vikrambohra edited a comment on issue #2456:
URL: https://github.com/apache/iceberg/issues/2456#issuecomment-1049275374
java.lang.IllegalArgumentException: Cannot write incompatible dataset to
table with schema:
table {
1: header: required struct<11: memberId: required int (The LinkedIn member
ID of the user initiating the action. LinkedIn member IDs are integers greater
than zero. Guests are represented either as zero or a negative number.), 12:
viewerUrn: optional string (The LinkedIn URN of the user initiating the action.
For other applications like Slideshare, this should be filled ...
2: requestHeader: required struct<59: browserId: optional string (The
browserId stored within the user's bcookie. For information on the bcookie
format from which browserId is derived, see: ...
3: mobileHeader: optional struct<73: osName: optional string (The name of
the operating system.), 74: osVersion: optional string (The version of the
operating system.), 75: deviceModel: optional string (The model of the
device.), ..
4: pageType: required string (A flag which specifies what type of page
this is.)
5: errorMessageKey: optional string (A unique identifier for the error
message shown.)
6: trackingCode: optional string (DEPRECATED. A key for the linkedin page
that referred this view)
7: trackingInfo: required map<string, string> (DEPRECATED. Misc fields
supplied by the page)
8: totalTime: required int (The total server-side time required to render
the page in ms)
9: datepartition: optional string
10: late: optional int
}
write schema:table {
1: header: optional struct<11: memberId: optional int (The LinkedIn member
ID of the user initiating the action. LinkedIn member IDs are integers greater
than zero. Guests are represented either as zero or a negative number.), 12:
viewerUrn: optional string (The LinkedIn URN of the user initiating the action.
For other applications like Slideshare, this should be filled ...
2: requestHeader: optional struct<59: browserId: optional string (The
browserId stored within the user's bcookie. For information on the bcookie
format from which browserId is derived, see: ...
3: mobileHeader: optional struct<73: osName: optional string (The name of
the operating system.), 74: osVersion: optional string (The version of the
operating system.), 75: deviceModel: optional string (The model of the
device.), ...
4: pageType: optional string (A flag which specifies what type of page
this is.)
5: errorMessageKey: optional string (A unique identifier for the error
message shown.)
6: trackingCode: optional string (DEPRECATED. A key for the linkedin page
that referred this view)
7: trackingInfo: optional map<string, string> (DEPRECATED. Misc fields
supplied by the page)
8: totalTime: optional int (The total server-side time required to render
the page in ms)
9: datepartition: optional string
10: late: optional int
}
Problems:
* header.traceData.context: values should be required, but are optional
* trackingInfo: values should be required, but are optional
at org.apache.iceberg.types.TypeUtil.validateWriteSchema(TypeUtil.java:263)
at org.apache.iceberg.spark.source.IcebergSource.createWriter(IcebergSource.java:95)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:255)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:226)
Context:
I have a source Iceberg table whose schema has both optional and required
fields. I read the source table incrementally in Spark and dedupe the data
using some columns as keys. I need to write the data back to another Iceberg
table, but I want to reduce the number of output files. So I write the data to
HDFS using the Spark ORC writer (df.write.format("orc").save(path)), read it
back using spark.read.format("orc").load(path) with some filter, and try to
write to the destination Iceberg table, which has the same schema as the source
Iceberg table. This is where it fails with the above exception. I checked the
DataFrame schema after reading back from HDFS and I see all fields as
optional/nullable.
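
To pin down which fields lost their required flag in the ORC round trip, one
option is to diff the two DataFrame schemas. Below is a minimal sketch in plain
Python, assuming the schemas are passed in the dict form that PySpark's
df.schema.jsonValue() returns. The helper name nullability_diff and the sample
schemas are hypothetical. The messages only mimic the wording of the
"Problems" list above; this is not Iceberg's validation code.

```python
def nullability_diff(expected, actual, path=""):
    """List places where `expected` is non-nullable but `actual` is nullable.

    Both arguments are Spark schema dicts (the shape produced by
    df.schema.jsonValue() in PySpark). Hypothetical helper, not an Iceberg API.
    """
    problems = []
    # Primitive types appear as plain strings in the dict form: nothing to compare.
    if not isinstance(expected, dict) or not isinstance(actual, dict):
        return problems
    kind = expected.get("type")
    if kind != actual.get("type"):
        return problems  # type mismatches are out of scope for this sketch

    if kind == "struct":
        actual_fields = {f["name"]: f for f in actual["fields"]}
        for exp in expected["fields"]:
            act = actual_fields.get(exp["name"])
            if act is None:
                continue  # missing fields are out of scope for this sketch
            name = f"{path}.{exp['name']}" if path else exp["name"]
            if not exp["nullable"] and act["nullable"]:
                problems.append(f"{name}: should be required, but is optional")
            problems.extend(nullability_diff(exp["type"], act["type"], name))
    elif kind == "map":
        if not expected["valueContainsNull"] and actual["valueContainsNull"]:
            problems.append(f"{path}: values should be required, but are optional")
        problems.extend(
            nullability_diff(expected["valueType"], actual["valueType"], path + ".value")
        )
    elif kind == "array":
        if not expected["containsNull"] and actual["containsNull"]:
            problems.append(f"{path}: elements should be required, but are optional")
        problems.extend(
            nullability_diff(expected["elementType"], actual["elementType"], path + ".element")
        )
    return problems


# Illustrative schemas only, loosely modeled on two columns of the table above.
source_schema = {
    "type": "struct",
    "fields": [
        {"name": "trackingInfo", "nullable": False, "metadata": {},
         "type": {"type": "map", "keyType": "string", "valueType": "string",
                  "valueContainsNull": False}},
        {"name": "late", "type": "integer", "nullable": True, "metadata": {}},
    ],
}
orc_schema = {
    "type": "struct",
    "fields": [
        {"name": "trackingInfo", "nullable": True, "metadata": {},
         "type": {"type": "map", "keyType": "string", "valueType": "string",
                  "valueContainsNull": True}},
        {"name": "late", "type": "integer", "nullable": True, "metadata": {}},
    ],
}

for problem in nullability_diff(source_schema, orc_schema):
    print(problem)
# trackingInfo: should be required, but is optional
# trackingInfo: values should be required, but are optional
```

With real DataFrames (hypothetical names source_df and orc_df) the call would
be nullability_diff(source_df.schema.jsonValue(), orc_df.schema.jsonValue()).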