[I] How does hive.exec.orc.default.buffer.size affect the file size? [orc]

via GitHub Wed, 17 Jul 2024 04:36:09 -0700


loukey-lj opened a new issue, #1985:
URL: https://github.com/apache/orc/issues/1985


   I write ORC files with Spark SQL 3.3.
   I noticed that in the production environment, many ORC files had small 
stripe sizes. So I lowered the value of 
hive.exec.orc.default.buffer.size from 256K to 1K. The stripe size increased 
significantly, and the number of stripes in a single file decreased 
significantly. Unexpectedly, the same dataset produced files of different 
sizes under the two settings: the file written with 
hive.exec.orc.default.buffer.size set to 1K was twice the size of the file 
written with 256K.
   
   Generally, when the stripe size increases, we would expect a better 
compression ratio, so it is surprising that reducing the buffer size made the 
final file larger.
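
   One guess worth checking is that the buffer size also bounds how much data 
the compressor sees at once, so smaller buffers compress worse. The sketch 
below is not ORC code: it is a rough stand-in that compresses synthetic data in 
independent chunks with raw DEFLATE and adds a 3-byte header per chunk. The 
function name `compressed_size`, the zlib codec, and the test data are all 
assumptions made for illustration; the codec actually used in production is 
not stated above.

```python
import random
import zlib


def compressed_size(data: bytes, chunk_size: int) -> int:
    """Total bytes when each chunk is compressed independently.

    Assumption: every chunk is a separate raw DEFLATE stream preceded by a
    3-byte header, which only approximates how a chunked writer behaves.
    """
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # wbits=-15 selects raw DEFLATE (no zlib header or checksum).
        compressor = zlib.compressobj(wbits=-15)
        body = compressor.compress(chunk) + compressor.flush()
        total += 3 + len(body)
    return total


# Synthetic, repetitive data (hypothetical, not the production dataset).
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
data = b",".join(rng.choice(words) for _ in range(200_000))

small = compressed_size(data, 1024)        # 1K chunks
large = compressed_size(data, 256 * 1024)  # 256K chunks
print(f"1K chunks:   {small} bytes")
print(f"256K chunks: {large} bytes")
```

   If the 1K result comes out clearly larger on data that resembles the real 
table, that would be consistent with the file size difference described above, 
although it would not by itself confirm what the ORC writer does.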
   


