RussellSpitzer opened a new issue #2244:
URL: https://github.com/apache/iceberg/issues/2244


   Spark currently has no reader for the `Timestamp.withoutZone()` type but is 
able to create tables with this schema. If you then attempt to read such a table 
from Spark, the query errors out.
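   To illustrate why the two types need separate read paths: a timestamp with zone is a fixed instant, while a timestamp without zone is a wall-clock value that maps to different instants depending on which zone is applied to it. A quick sketch with plain `java.time` (illustration only, not Iceberg code):
   
   ```scala
   import java.time.{Instant, LocalDateTime, ZoneOffset}
   
   // timestamp with zone: a fixed point on the timeline
   val withZone: Instant = Instant.parse("2021-02-10T12:00:00Z")
   
   // timestamp without zone: a wall-clock value with no offset attached
   val withoutZone: LocalDateTime = LocalDateTime.of(2021, 2, 10, 12, 0, 0)
   
   // the same local value becomes different instants under different zones
   val asUtc = withoutZone.toInstant(ZoneOffset.UTC)
   val asEst = withoutZone.toInstant(ZoneOffset.ofHours(-5))
   ```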
   
   Currently, the only reader for this type that I can see in master is the 
generic one:
   
https://github.com/apache/iceberg/blob/631efec2f9ce1f0526d6613c81ac8fd0ccb95b5e/data/src/main/java/org/apache/iceberg/data/orc/GenericOrcReader.java#L116-L117
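   For reference, Iceberg stores both timestamp types as microseconds since 1970-01-01; only the interpretation differs. A reader for the without-zone type would materialize a local (wall-clock) value rather than an instant, roughly like the following sketch (the helper name is hypothetical, not an Iceberg API):
   
   ```scala
   import java.time.{LocalDateTime, ZoneOffset}
   import java.time.temporal.ChronoUnit
   
   // Hypothetical helper: turn Iceberg's stored microseconds into a
   // local (zone-less) value instead of an instant.
   def readTimestampWithoutZone(micros: Long): LocalDateTime =
     LocalDateTime.ofEpochSecond(0, 0, ZoneOffset.UTC).plus(micros, ChronoUnit.MICROS)
   ```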
   
   Other systems will fail immediately if they hit a table with this type, 
since they have no valid readers for it.
   
   This is a bit troubling because this column type is used by default when 
non-Iceberg ORC writers create new files, for example:
   
   ```scala
   spark.sql("CREATE EXTERNAL TABLE mytable (foo timestamp) LOCATION '/Users/russellspitzer/Temp/foo'")
   
   spark.sql("INSERT INTO mytable VALUES (now())")
   ```
   
   This creates files like:
   ```
   File Version: 0.12 with ORC_135
   Rows: 1
   Compression: SNAPPY
   Compression size: 262144
   Calendar: Julian/Gregorian
   Type: struct<foo:timestamp>
   ```
   
   The non-Iceberg Spark and Hive ORC readers and writers have no problem 
dealing with these files, but if an Iceberg table is created and these files 
are added to it, they become unreadable by Iceberg's ORC readers and writers.
   
   There is also a related problem with Migrate -- @RussellSpitzer Add Link Here
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
