zhangjun0x01 opened a new issue #1666:
URL: https://github.com/apache/iceberg/issues/1666


   I found that the value stored in the fileSizeInBytes field of DataFile is
inconsistent between the ORC and Parquet formats: for ORC it holds the
deserialized data size, while for Parquet it holds the file size.
   
   This causes a problem. In RewriteDataFilesAction, the default value of
targetSizeInBytes is 128M. For ORC tables, the data files produced by the
rewrite action are only about 10M. This is because RewriteDataFilesAction
groups the ORC files by their deserialized data size rather than their file
size, so the newly generated data files never come close to 128M.
   
   Parquet behaves correctly and produces files of the expected size.
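   To make the arithmetic concrete, below is a hypothetical Java sketch. It is
not Iceberg's actual planning code: it greedily groups files until the sum of
their reported sizes would exceed the target, and it assumes each rewritten
file ends up roughly as large as the sum of its inputs' on-disk sizes. The
per-file sizes (1.25M on disk, 16M deserialized) are made-up numbers chosen to
reproduce the 128M/10M figures above, and RewriteSizeSketch and
estimateOutputSizes are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RewriteSizeSketch {
    static final long MB = 1024L * 1024L;

    // Hypothetical greedy grouping: start a new group when adding the next file's
    // *reported* size would exceed the target. Returns the estimated on-disk size of
    // each rewritten file, assumed (for illustration) to be the sum of the inputs'
    // on-disk sizes, i.e. the same compression ratio after the rewrite.
    static List<Long> estimateOutputSizes(long[] reportedSizes, long[] onDiskSizes, long targetBytes) {
        List<Long> outputs = new ArrayList<>();
        long groupReported = 0;
        long groupOnDisk = 0;
        int groupCount = 0;
        for (int i = 0; i < reportedSizes.length; i++) {
            if (groupCount > 0 && groupReported + reportedSizes[i] > targetBytes) {
                outputs.add(groupOnDisk); // close the current group
                groupReported = 0;
                groupOnDisk = 0;
                groupCount = 0;
            }
            groupReported += reportedSizes[i];
            groupOnDisk += onDiskSizes[i];
            groupCount++;
        }
        if (groupCount > 0) {
            outputs.add(groupOnDisk);
        }
        return outputs;
    }

    public static void main(String[] args) {
        long[] onDisk = new long[16];
        long[] deserialized = new long[16];
        Arrays.fill(onDisk, 1280L * 1024L);     // hypothetical: 1.25M compressed on disk
        Arrays.fill(deserialized, 16L * MB);    // hypothetical: 16M deserialized
        long target = 128L * MB;

        // ORC as described in the issue: fileSizeInBytes is the deserialized size.
        System.out.println(estimateOutputSizes(deserialized, onDisk, target)); // [10485760, 10485760]
        // Parquet-like: fileSizeInBytes is the on-disk size.
        System.out.println(estimateOutputSizes(onDisk, onDisk, target));       // [20971520]
    }
}
```

   With 16 such files, grouping by the deserialized size closes a group after
every 8 files (8 x 16M = 128M), so each rewritten file is only about 10M on
disk. Grouping by on-disk size puts all 16 files (20M in total) into a single
group, because the 128M target is never reached.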

