haydenflinner commented on issue #2040:
URL: https://github.com/apache/iceberg/issues/2040#issuecomment-1414446899
Same thing here, happening whether I use INSERT INTO or the DataFrame API.
How annoying. Is there really no solution besides massaging the DataFrame
schema so that it has the same columns as the Iceberg table? Evolving the
table that way seems awkward.
```python
spark.sql(
    f"""CREATE OR REPLACE TEMPORARY VIEW myview USING parquet
    OPTIONS (path "{path}")"""
)
log.info("calling-insert")
spark.sql(f"INSERT INTO {tablename}({', '.join(df.columns)}) SELECT * FROM myview")

# ---> leads to:
# Table columns: 'server_name', 'backed_up_path', 'backed_up_filesize',
#                'num_lines', 'backed_up_ts', 'start_ts', 'end_ts',
#                'first_x_ts', 'last_x_ts'
# Data columns:  'server_name', 'backed_up_path', 'backed_up_filesize'
```
or
```python
df.to_parquet(path, index=False, allow_truncated_timestamps=True,
              coerce_timestamps='us')
spark = _get_spark()
spark.sql("use dev_catalog")
sdf = spark.read.parquet(path)
sdf.writeTo(f"dev_catalog.{tablename}").append()

# --> leads to:
# AnalysisException: Cannot write incompatible data to table 'dev_catalog.logfiles':
# - Cannot write nullable values to non-null column 'backed_up_path'
# - Cannot find data for output column 'num_lines'
# - Cannot find data for output column 'backed_up_ts'
# - Cannot find data for output column 'start_ts'
# - Cannot find data for output column 'end_ts'
# - Cannot find data for output column 'first_x_ts'
# - Cannot find data for output column 'last_x_ts'
```
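For what it's worth, here is a sketch of the schema-massaging workaround I was alluding to: figure out which table columns are missing from the incoming DataFrame, then pad them with typed NULLs before appending. The helper below is plain Python so it's easy to sanity-check against the column lists from the error above; the Spark portion at the bottom is an untested sketch using the standard `withColumn`/`lit`/`cast` calls, and `spark`, `sdf`, and `tablename` are assumed to exist as in my snippets.

```python
def missing_columns(table_columns, df_columns):
    """Return the table columns absent from the DataFrame, in table order."""
    present = set(df_columns)
    return [c for c in table_columns if c not in present]

# The schemas from the error message above:
table_cols = ['server_name', 'backed_up_path', 'backed_up_filesize',
              'num_lines', 'backed_up_ts', 'start_ts', 'end_ts',
              'first_x_ts', 'last_x_ts']
data_cols = ['server_name', 'backed_up_path', 'backed_up_filesize']

print(missing_columns(table_cols, data_cols))
# -> ['num_lines', 'backed_up_ts', 'start_ts', 'end_ts', 'first_x_ts', 'last_x_ts']

# With Spark, padding would then look roughly like (untested sketch):
#   from pyspark.sql import functions as F
#   schema = spark.table(f"dev_catalog.{tablename}").schema
#   for field in schema:
#       if field.name not in sdf.columns:
#           # cast a NULL literal to the table column's type
#           sdf = sdf.withColumn(field.name, F.lit(None).cast(field.dataType))
#   # select in table-column order so the columns line up
#   sdf = sdf.select([f.name for f in schema])
#   sdf.writeTo(f"dev_catalog.{tablename}").append()
```

Note this only papers over the missing-column errors; it won't fix writing nullable data into a non-null column like 'backed_up_path'.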
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]