dushyantk1509 commented on PR #2127:
URL: https://github.com/apache/orc/pull/2127#issuecomment-3381939703

   Thanks @cxzl25 for the quick response!
   
   Are you aware of any other scenario in which this could happen? In our case, the Spark app wrote 4000 files and only one is corrupted. I tried reading the file with a plain reading loop (sketched below): I could read up to 350208 records, but the reader got stuck after that. According to the file stats, the file has 3M+ rows. IMO this looks like a hardware failure similar to your scenario. BTW, how did you confirm it was a hardware failure?
   > For example, it may be some data corruption caused by the HDFS EC storage 
policy, causing it to fail to decompress.
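
   For context, this is roughly the kind of loop I used to read the file (a minimal sketch with the ORC Java reader; the path is a placeholder, not the real file):

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
   import org.apache.orc.OrcFile;
   import org.apache.orc.Reader;
   import org.apache.orc.RecordReader;

   public class ReadOrc {
     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       // Placeholder path; the real file is one of the 4000 written by our Spark app.
       Reader reader = OrcFile.createReader(new Path("/tmp/suspect.orc"),
           OrcFile.readerOptions(conf));
       long count = 0;
       try (RecordReader rows = reader.rows()) {
         VectorizedRowBatch batch = reader.getSchema().createRowBatch();
         // On the corrupted file this loop makes progress up to ~350208 rows
         // and then hangs inside nextBatch().
         while (rows.nextBatch(batch)) {
           count += batch.size;
         }
       }
       System.out.println("read " + count + " rows");
     }
   }
   ```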
   
   Could you share a code snippet showing how you adjusted the buffer size? Any code reference would help; my own guess is sketched below.
   > I created this damaged file because I adjusted the buffer size when writing
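
   I'm guessing it is something along these lines, via the public `WriterOptions.bufferSize` knob (the schema, path, and 4 KiB value below are just placeholders on my side)?

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.orc.OrcFile;
   import org.apache.orc.TypeDescription;
   import org.apache.orc.Writer;

   public class WriteSmallBuffer {
     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       TypeDescription schema = TypeDescription.fromString("struct<x:int>");
       // Guess: shrink the compression buffer from the 256 KiB default to 4 KiB
       // when writing, so the data is spread across many more compression chunks.
       Writer writer = OrcFile.createWriter(new Path("/tmp/small-buffer.orc"),
           OrcFile.writerOptions(conf)
               .setSchema(schema)
               .bufferSize(4 * 1024));
       writer.close();
     }
   }
   ```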

