rdblue commented on issue #1257:
URL: https://github.com/apache/iceberg/issues/1257#issuecomment-665932497


   I think @shardulm94 is right. Iceberg writes all data columns into every 
file, unlike Hive, which leaves partition columns out of the data files. The 
reason we write these columns is that we may want to move the files to a 
different partition spec later (e.g., drop a categorical partition column, 
move the files, then compact).
   
   So this is probably working only because the data files in the tests 
actually contain the identity-partitioned columns. To make the tests fail, 
generate data files without those columns, add them to a table, then validate 
that you still get the column values when you read. That's basically what 
happens when we import Hive data into Iceberg tables: the columns aren't 
present in the data files, so we use the values from each file's partition 
tuple.
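   
   A rough sketch of that scenario, assuming a Hadoop-backed table; the table location, file path, size, record count, and partition value below are all hypothetical, and the Parquet file at `dataPath` is assumed to have been written without the "category" column. The test would then check that reads still return "category" filled in from the partition tuple:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.iceberg.DataFile;
   import org.apache.iceberg.DataFiles;
   import org.apache.iceberg.FileFormat;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.data.IcebergGenerics;
   import org.apache.iceberg.data.Record;
   import org.apache.iceberg.hadoop.HadoopTables;
   import org.apache.iceberg.io.CloseableIterable;
   
   public class MissingIdentityColumnCheck {
     public static void main(String[] args) throws Exception {
       HadoopTables tables = new HadoopTables(new Configuration());
       Table table = tables.load("file:///tmp/warehouse/test_table"); // hypothetical location
   
       // A file written with only the non-partition columns (e.g. just "id").
       String dataPath = "file:///tmp/warehouse/test_table/data/no-category.parquet";
   
       DataFile file = DataFiles.builder(table.spec())
           .withPath(dataPath)
           .withFormat(FileFormat.PARQUET)
           .withFileSizeInBytes(1024)       // placeholder size
           .withRecordCount(100)            // placeholder count
           .withPartitionPath("category=a") // value carried by the partition tuple
           .build();
   
       table.newAppend().appendFile(file).commit();
   
       // Reads should still return "category" = "a" for every row, taken from
       // the partition metadata rather than the data file itself.
       try (CloseableIterable<Record> rows = IcebergGenerics.read(table).build()) {
         for (Record row : rows) {
           System.out.println(row.getField("category"));
         }
       }
     }
   }
   ```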

