loukey-lj opened a new issue, #1985: URL: https://github.com/apache/orc/issues/1985
I write ORC files with Spark SQL 3.3. In our production environment, I noticed that many ORC files had small stripes, so I changed hive.exec.orc.default.buffer.size from 256K to 1K. After the change, the stripe size increased significantly and the number of stripes per file dropped significantly.

Unexpectedly, the same dataset produced files of different sizes under the two settings: the file written with hive.exec.orc.default.buffer.size set to 1K was twice the size of the file written with 256K. Generally, when the stripe size increases, we would expect the compression ratio to be higher, so it is surprising that reducing the buffer size makes the final file larger.
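One possible contributing factor, offered as an assumption rather than a confirmed diagnosis: ORC compresses each stream in independent chunks of at most the buffer size, so a very small buffer gives the codec little context per chunk and adds per-chunk overhead. The sketch below is not ORC code. It uses Python's standard `zlib` module on synthetic data, and the helper `compressed_size`, the sample rows, and the 3-byte header constant are illustrative assumptions. It only compares the total compressed size for 1K chunks against 256K chunks.

```python
import zlib

# Assumed per-chunk header size, modeled on ORC's compression chunk header.
CHUNK_HEADER_BYTES = 3


def compressed_size(data: bytes, chunk_size: int) -> int:
    """Compress `data` in independent chunks and return the total byte count."""
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is compressed on its own, so no history carries over
        # from one chunk to the next.
        total += len(zlib.compress(chunk)) + CHUNK_HEADER_BYTES
    return total


# Synthetic, repetitive rows standing in for a real column stream.
rows = [
    f"user_{i % 5000:05d},event_{i % 37},2024-01-{i % 28 + 1:02d}\n"
    for i in range(50_000)
]
data = "".join(rows).encode("utf-8")

small = compressed_size(data, 1024)        # 1K chunks
large = compressed_size(data, 256 * 1024)  # 256K chunks

print(f"raw bytes:          {len(data)}")
print(f"1K chunks total:    {small}")
print(f"256K chunks total:  {large}")
print(f"ratio (1K / 256K):  {small / large:.2f}")
```

The exact ratio depends on the data and the codec, so this does not reproduce the 2x difference reported above. It only shows the direction of the effect that chunk size can have on compressed output.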
