omalley opened a new pull request #582:
URL: https://github.com/apache/orc/pull/582


   ### What changes were proposed in this pull request?
   
   This PR updates the scan tool to print information about where a file is corrupted. It
   * reads data in batches until a read fails
   * re-reads the failing batch column by column to find which column is corrupted
   * determines the next location that the reader can seek to
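
   The three steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `BatchSource`, `FakeSource`, and `scan` are hypothetical names rather than ORC APIs, and probing forward one batch at a time to find the resume point is an assumption about how the "next location" might be found.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ScanSketch {
  /** Hypothetical stand-in for a file reader; this is NOT the ORC reader API. */
  interface BatchSource {
    long rowCount();
    int columnCount();
    /** Reads rows [startRow, startRow + size) of the given columns; throws on corrupt data. */
    void read(long startRow, int size, int[] columns) throws IOException;
  }

  /** Fake source with one corrupted column over rows [badStart, badEnd), for illustration. */
  static final class FakeSource implements BatchSource {
    private final long rows;
    private final int cols;
    private final int badCol;
    private final long badStart;
    private final long badEnd;

    FakeSource(long rows, int cols, int badCol, long badStart, long badEnd) {
      this.rows = rows;
      this.cols = cols;
      this.badCol = badCol;
      this.badStart = badStart;
      this.badEnd = badEnd;
    }

    public long rowCount() { return rows; }

    public int columnCount() { return cols; }

    public void read(long startRow, int size, int[] columns) throws IOException {
      long end = Math.min(startRow + size, rows);
      for (int c : columns) {
        // Fail if the requested row range overlaps the corrupted range of the bad column.
        if (c == badCol && startRow < badEnd && end > badStart) {
          throw new IOException("corrupt data in column " + c);
        }
      }
    }
  }

  /** Scans the source batch by batch and returns a report of the problems found. */
  static List<String> scan(BatchSource src, int batchSize) {
    List<String> report = new ArrayList<>();
    int[] all = new int[src.columnCount()];
    for (int i = 0; i < all.length; i++) {
      all[i] = i;
    }
    long row = 0;
    while (row < src.rowCount()) {
      try {
        // Step 1: read all columns in batches until a read fails.
        src.read(row, batchSize, all);
        row += batchSize;
      } catch (IOException e) {
        // Step 2: re-read the failing batch one column at a time.
        for (int c = 0; c < src.columnCount(); c++) {
          try {
            src.read(row, batchSize, new int[] {c});
          } catch (IOException columnFailure) {
            report.add("batch at row " + row + ": column " + c);
          }
        }
        // Step 3 (assumed strategy): probe forward until a batch reads cleanly.
        long next = row + batchSize;
        while (next < src.rowCount()) {
          try {
            src.read(next, batchSize, all);
            break;
          } catch (IOException stillCorrupt) {
            next += batchSize;
          }
        }
        report.add("resume at row " + next);
        row = next;
      }
    }
    return report;
  }
}
```

   For example, `ScanSketch.scan(new ScanSketch.FakeSource(100, 3, 1, 25, 45), 10)` reports column 1 as corrupted in the batch starting at row 20 and resumes at row 50, the first batch boundary past the damaged range.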
   
   ### Why are the changes needed?
   
   It helps diagnose where (row & column) an ORC file is corrupted.
   
   ### How was this patch tested?
   
   It was tested on ORC files that were corrupted by bad machines.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

