RussellSpitzer opened a new issue #2244: URL: https://github.com/apache/iceberg/issues/2244
Spark currently possesses no readers for the `Timestamp.withoutZone()` type, but it is able to create tables with this schema. Attempting to read such a table from Spark will error out. Currently in master, the only reader for this type that I can see is for the generic case: https://github.com/apache/iceberg/blob/631efec2f9ce1f0526d6613c81ac8fd0ccb95b5e/data/src/main/java/org/apache/iceberg/data/orc/GenericOrcReader.java#L116-L117

Other systems that hit a table with this type will fail immediately, since they do not have valid readers. This is a bit troubling because this column type is used by default when non-Iceberg ORC writers make new files. For example:

```scala
spark.sql("CREATE EXTERNAL TABLE mytable (foo timestamp) location '/Users/russellspitzer/Temp/foo'")
spark.sql("INSERT INTO mytable VALUES (now())")
```

creates files with:

```
File Version: 0.12 with ORC_135
Rows: 1
Compression: SNAPPY
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<foo:timestamp>
```

The non-Iceberg Spark and Hive ORC readers and writers have no problem dealing with these files, but if an Iceberg table is created and these files are added to it, they are unreadable by Iceberg's ORC readers and writers.

There is also a related problem with Migrate -- @RussellSpitzer Add Link Here
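For clarity, the end-to-end failure mode can be sketched roughly as follows. This is an illustrative sketch, not code from the issue: it assumes a Spark session with an Iceberg catalog named `iceberg` already configured, and the exact migration step and catalog/table names are assumptions.

```scala
// Hypothetical reproduction sketch (assumes a running Spark session
// with an Iceberg catalog named `iceberg` configured).

// 1. A plain (non-Iceberg) Spark table writes ORC files whose
//    timestamp column carries no zone, matching the file dump above.
spark.sql("CREATE EXTERNAL TABLE mytable (foo timestamp) location '/Users/russellspitzer/Temp/foo'")
spark.sql("INSERT INTO mytable VALUES (now())")

// 2. Suppose that table (or its files) is then migrated into the
//    Iceberg catalog; the migration itself succeeds, since nothing
//    validates that readers exist for every column type in the schema.

// 3. Reading the resulting Iceberg table from Spark then fails,
//    because Iceberg's Spark ORC readers have no reader registered
//    for the timestamp-without-zone type.
spark.table("iceberg.default.mytable").show()  // errors out
```

The asymmetry is the point: table creation and file addition are permitted for a type that the Spark read path cannot handle, so the problem only surfaces at query time.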
