[GitHub] [iceberg] vikrambohra edited a comment on issue #2456: Cannot write nullable values to non-null column

GitBox Wed, 02 Mar 2022 13:29:58 -0800


vikrambohra edited a comment on issue #2456:
URL: https://github.com/apache/iceberg/issues/2456#issuecomment-1049275374



   java.lang.IllegalArgumentException: Cannot write incompatible dataset to table with schema:
   table {
     1: header: required struct<11: memberId: required int (The LinkedIn member ID of the user initiating the action.  LinkedIn member IDs are integers greater than zero.  Guests are represented either as zero or a negative number.), 12: viewerUrn: optional string (The LinkedIn URN of the user initiating the action.  For other applications like Slideshare, this should be filled ...
     2: requestHeader: required struct<59: browserId: optional string (The browserId stored within the user's bcookie.  For information on the bcookie format from which browserId is derived, see: ...
     3: mobileHeader: optional struct<73: osName: optional string (The name of the operating system.), 74: osVersion: optional string (The version of the operating system.), 75: deviceModel: optional string (The model of the device.), ..
     4: pageType: required string (A flag which specifies what type of page this is.)
     5: errorMessageKey: optional string (A unique identifier for the error message shown.)
     6: trackingCode: optional string (DEPRECATED. A key for the linkedin page that referred this view)
     7: trackingInfo: required map<string, string> (DEPRECATED. Misc fields supplied by the page)
     8: totalTime: required int (The total server-side time required to render the page in ms)
     9: datepartition: optional string
     10: late: optional int
   }
   write schema:table {
     1: header: optional struct<11: memberId: optional int (The LinkedIn member ID of the user initiating the action.  LinkedIn member IDs are integers greater than zero.  Guests are represented either as zero or a negative number.), 12: viewerUrn: optional string (The LinkedIn URN of the user initiating the action.  For other applications like Slideshare, this should be filled ...
     2: requestHeader: optional struct<59: browserId: optional string (The browserId stored within the user's bcookie.  For information on the bcookie format from which browserId is derived, see: ...
     3: mobileHeader: optional struct<73: osName: optional string (The name of the operating system.), 74: osVersion: optional string (The version of the operating system.), 75: deviceModel: optional string (The model of the device.),  ...
     4: pageType: optional string (A flag which specifies what type of page this is.)
     5: errorMessageKey: optional string (A unique identifier for the error message shown.)
     6: trackingCode: optional string (DEPRECATED. A key for the linkedin page that referred this view)
     7: trackingInfo: optional map<string, string> (DEPRECATED. Misc fields supplied by the page)
     8: totalTime: optional int (The total server-side time required to render the page in ms)
     9: datepartition: optional string
     10: late: optional int
   }
   Problems:
   * header.traceData.context: values should be required, but are optional
   * trackingInfo: values should be required, but are optional
   at org.apache.iceberg.types.TypeUtil.validateWriteSchema(TypeUtil.java:263)
        at org.apache.iceberg.spark.source.IcebergSource.createWriter(IcebergSource.java:95)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:255)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:226)
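The rule behind the "Problems" lines above can be illustrated with a simplified model: the table side declares a value required, while the incoming write schema only promises an optional (nullable) one. The Java below is a hypothetical sketch, not Iceberg's TypeUtil.validateWriteSchema; the NullabilityCheck class, its Column type, and the exact message wording are assumptions made for illustration. It flags both a required column that arrives as optional and a map column (like trackingInfo) whose values arrive as optional.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a write-schema nullability check.
// This is NOT Iceberg's TypeUtil.validateWriteSchema; names are illustrative.
class NullabilityCheck {

  // One top-level column. mapValuesRequired is null for non-map columns.
  static final class Column {
    final String name;
    final boolean required;
    final Boolean mapValuesRequired;

    Column(String name, boolean required, Boolean mapValuesRequired) {
      this.name = name;
      this.required = required;
      this.mapValuesRequired = mapValuesRequired;
    }
  }

  // Collects one message for each place where the table requires a value
  // but the write schema only guarantees an optional (nullable) one.
  static List<String> problems(List<Column> table, List<Column> write) {
    List<String> out = new ArrayList<>();
    for (Column t : table) {
      Column w = find(write, t.name);
      if (w == null) {
        if (t.required) {
          out.add(t.name + ": is required, but is missing");
        }
        continue;
      }
      if (t.required && !w.required) {
        out.add(t.name + ": should be required, but is optional");
      }
      if (Boolean.TRUE.equals(t.mapValuesRequired)
          && Boolean.FALSE.equals(w.mapValuesRequired)) {
        out.add(t.name + ": values should be required, but are optional");
      }
    }
    return out;
  }

  // Linear lookup by column name; returns null when the column is absent.
  private static Column find(List<Column> columns, String name) {
    for (Column c : columns) {
      if (c.name.equals(name)) {
        return c;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Table side: pageType and trackingInfo (with required map values) are required.
    List<Column> table = Arrays.asList(
        new Column("pageType", true, null),
        new Column("trackingInfo", true, true),
        new Column("late", false, null));
    // Write side after the ORC round trip: everything is optional.
    List<Column> write = Arrays.asList(
        new Column("pageType", false, null),
        new Column("trackingInfo", false, false),
        new Column("late", false, null));
    for (String p : problems(table, write)) {
      System.out.println("* " + p);
    }
  }
}
```

Nested fields such as header.traceData.context would need the same check applied recursively through structs, which this flat sketch leaves out.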
   
   Context:
   I have a source Iceberg table with both optional and required fields in its schema. I read the source table incrementally in Spark and dedupe the data using some columns as keys. I need to write the data back to another Iceberg table, but I want to reduce the number of output files, so I first write the data to HDFS using the Spark ORC writer (df.write.format("orc").save(path)). I then read it back using spark.read.format("orc").load(path) with some filter and try to write it to the destination Iceberg table, which has the same schema as the source Iceberg table. This is where it fails with the above exception. I checked the DataFrame schema after reading back from HDFS, and all fields are optional/nullable.
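In the round trip described above, the required flags are lost because, as observed here, the schema read back from ORC marks every field nullable. One way to picture a workaround is to re-apply the destination table's declared nullability, by column name, to the read-back schema before writing. The Java below is a hypothetical sketch of only that schema step; RestoreNullability and its Column type are illustrative names, not Iceberg or Spark APIs. In Spark the analogous step would be rebuilding the DataFrame against a StructType with the desired nullable flags (SparkSession.createDataFrame accepts an RDD of Row together with a StructType), but whether that suits this pipeline is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: copy the destination table's declared nullability onto
// a schema that came back all-nullable from an intermediate ORC copy.
// Only schema metadata is rewritten; the data itself is not inspected.
class RestoreNullability {

  static final class Column {
    final String name;
    final boolean nullable;

    Column(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }

    @Override
    public String toString() {
      return name + (nullable ? " (nullable)" : " (not null)");
    }
  }

  // Returns the read-back columns, in their original order, with each
  // column's nullable flag taken from the table schema when the table
  // declares that column; unknown columns keep their read-back flag.
  static List<Column> restore(List<Column> table, List<Column> readBack) {
    Map<String, Boolean> declared = new HashMap<>();
    for (Column c : table) {
      declared.put(c.name, c.nullable);
    }
    List<Column> out = new ArrayList<>();
    for (Column c : readBack) {
      Boolean tableNullable = declared.get(c.name);
      out.add(new Column(c.name, tableNullable != null ? tableNullable : c.nullable));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Column> table = Arrays.asList(
        new Column("pageType", false), new Column("totalTime", false), new Column("late", true));
    // After the ORC round trip every column is reported as nullable.
    List<Column> readBack = Arrays.asList(
        new Column("pageType", true), new Column("totalTime", true), new Column("late", true));
    // prints [pageType (not null), totalTime (not null), late (nullable)]
    System.out.println(restore(table, readBack));
  }
}
```

Marking a column not-null only changes metadata, so the data must actually be null-free in those columns; this sketch does not verify that. Map value nullability (as for trackingInfo) and nested struct fields would also need to be restored, which this flat sketch omits.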


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


