cxzl25 commented on issue #1939:
URL: https://github.com/apache/orc/issues/1939#issuecomment-2658155742

   
   I discussed this offline with @tomscut yesterday, and we confirmed that the 
problem occurs when reading ROW_INDEX. We also tried `orc-tools meta 
--rowindex X`, and the tool reports an error as well.
   
   `StripePlanner.readRowIndex(StripePlanner.java:404)`
   
   
https://github.com/apache/orc/blob/310fb43fc69a0942f46dd4de3f648727d984e535/java/core/src/java/org/apache/orc/impl/reader/StripePlanner.java#L403-L404
   
   ---
   
   As a workaround, we may set `spark.sql.orc.filterPushdown=false` in Spark to 
avoid reading `ROW_INDEX`, which should allow the data to be recovered.
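
   A minimal PySpark sketch of that recovery, assuming the unreadable part is 
only the row index and the data itself is intact. The helper names 
(`recovery_conf`, `recover`), the application name, and the paths are 
hypothetical; only the `spark.sql.orc.filterPushdown` setting comes from the 
comment above.

```python
from typing import Dict


def recovery_conf() -> Dict[str, str]:
    """Spark SQL settings for the recovery job (hypothetical helper)."""
    # Disabling ORC predicate pushdown is the workaround named in the comment.
    return {"spark.sql.orc.filterPushdown": "false"}


def recover(src: str, dst: str) -> None:
    """Rewrite the ORC data at `src` into `dst` with pushdown disabled."""
    # Imported lazily so recovery_conf() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("orc-recovery")  # name is arbitrary
    for key, value in recovery_conf().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    try:
        # No filter is applied: the data is read in full and written back out.
        # "errorifexists" refuses to overwrite an existing destination.
        spark.read.orc(src).write.mode("errorifexists").orc(dst)
    finally:
        spark.stop()
```

   It could be called as `recover("hdfs:///path/to/corrupted", 
"hdfs:///path/to/recovered")`, where both paths are placeholders. Writing to a 
new location keeps the original files untouched for further investigation.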
   
   As for why the data was corrupted, we still suspect that it was caused by 
HDFS EC (erasure coding).
   
   cc @dongjoon-hyun 


