ZhendongBai opened a new issue, #7775: URL: https://github.com/apache/iceberg/issues/7775
### Query engine

Spark SQL 3.3.2

### Question

When I use Spark 3.3.2 to write the same data into a Hive partitioned table (table A) and an Iceberg partitioned table (table B, with metadata stored in Hive), where both tables use ORC format and the same compression strategy, I ran the following test (sketched in the SQL after the image below):

1. First, create an Iceberg table (table C) matching the Hive table's schema, and add table A's data files into table C (via the Spark `add_files` procedure).
2. Compare table C and table B by selecting from the `.data_files` metadata table (extracting each field's size from the `column_sizes` map by field id). The result is shown below:

<img width="790" alt="image" src="https://github.com/apache/iceberg/assets/18043146/03311e28-e311-47fb-85d3-930b0f5bf435">

Why are the Iceberg table's string field byte sizes larger than the Spark SQL (Hive) table's, especially for the `map<string, string>` and `string` types?
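For reference, a minimal sketch of the two steps above. The catalog name (`spark_catalog`), database (`db`), table names (`table_a`, `table_b`, `table_c`), and the field id used for the lookup are all hypothetical; the issue does not show the original identifiers.

```sql
-- Step 1 (sketch): register table A's existing ORC files into Iceberg
-- table C using the add_files procedure. Names are hypothetical.
CALL spark_catalog.system.add_files(
  table => 'db.table_c',
  source_table => 'db.table_a'
);

-- Step 2 (sketch): read per-column byte sizes from the data_files metadata
-- table. column_sizes is a map keyed by Iceberg field id, so
-- column_sizes[1] is the on-disk size of field id 1 (id chosen for illustration).
SELECT file_path,
       column_sizes[1] AS field_1_bytes
FROM db.table_c.data_files;

-- Compare total bytes per field between table C (populated via add_files)
-- and table B (written directly by Spark).
SELECT sum(column_sizes[1]) AS field_1_total_bytes
FROM db.table_b.data_files;
```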
