nsivabalan commented on issue #4835:
URL: https://github.com/apache/hudi/issues/4835#issuecomment-1047299256


   Reading the block footer should be safe, from what I can infer from the code:
   ```
   private boolean isBlockCorrupt(int blocksize) throws IOException {
       long currentPos = inputStream.getPos();
       try {
         inputStream.seek(currentPos + blocksize);
       } catch (EOFException e) {
         LOG.info("Found corrupted block in file " + logFile + " with block size(" + blocksize + ") running past EOF");
         // this is corrupt
         // This seek is required because contract of seek() is different for naked DFSInputStream vs BufferedFSInputStream
         // release-3.1.0-RC1/DFSInputStream.java#L1455
         // release-3.1.0-RC1/BufferedFSInputStream.java#L73
         inputStream.seek(currentPos);
         return true;
       }
   
       // check if the blocksize mentioned in the footer is the same as the header; by seeking back the length of a long
       // the backward seek does not incur additional IO as {@link org.apache.hadoop.hdfs.DFSInputStream#seek()}
       // only moves the index. actual IO happens on the next read operation
       inputStream.seek(inputStream.getPos() - Long.BYTES);
       // Block size in the footer includes the magic header, which the header does not include.
       // So we have to shorten the footer block size by the size of magic hash
       long blockSizeFromFooter = inputStream.readLong() - magicBuffer.length;
       if (blocksize != blockSizeFromFooter) {
         LOG.info("Found corrupted block in file " + logFile + ". Header block size(" + blocksize
                 + ") did not match the footer block size(" + blockSizeFromFooter + ")");
         inputStream.seek(currentPos);
         return true;
       }
       // ... (rest of the method not shown)
   ```
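For reference, here is a minimal, self-contained sketch of the same header/footer consistency check, run against an in-memory buffer instead of an `FSDataInputStream`. Everything in it is an assumption for illustration: the class name `FooterCheckSketch`, the `#MAGIC#` placeholder standing in for `magicBuffer`, the simplified block layout (the last `Long.BYTES` bytes of the block hold `blocksize + MAGIC.length`), and the small `blocksize < Long.BYTES` guard, which the snippet above does not have. `ByteBuffer.position(int)` rejecting a target past the limit plays the role of the `EOFException` from a strict `seek()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified mirror of the footer check in isBlockCorrupt above.
// ASSUMED layout (NOT the real Hudi log block format): starting at the current
// position there are `blocksize` bytes, the last Long.BYTES of which hold
// (blocksize + MAGIC.length).
class FooterCheckSketch {
  // Placeholder for magicBuffer; the real magic bytes are not assumed here.
  static final byte[] MAGIC = "#MAGIC#".getBytes(StandardCharsets.US_ASCII);

  static boolean isBlockCorrupt(ByteBuffer buf, int blocksize) {
    int currentPos = buf.position();
    // Guard added only in this sketch: the footer long must fit inside the block.
    if (blocksize < Long.BYTES) {
      return true;
    }
    try {
      // Step 1: "seek" to the end of the block. position() throws for targets
      // beyond the limit, standing in for the EOFException of a strict seek().
      buf.position(currentPos + blocksize);
    } catch (IllegalArgumentException e) {
      buf.position(currentPos);
      return true; // block size runs past EOF
    }
    // Step 2: step back over the footer long and read it. Because step 1
    // succeeded, these Long.BYTES bytes lie inside the buffer.
    buf.position(buf.position() - Long.BYTES);
    long blockSizeFromFooter = buf.getLong() - MAGIC.length;
    // Step 3: restore the original position before returning.
    buf.position(currentPos);
    return blocksize != blockSizeFromFooter;
  }
}
```

In this in-memory model, once the step 1 "seek" succeeds the footer read in step 2 cannot run past the end, which is exactly the invariant the snippet relies on.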
   
   Basically, we first try to seek to the end of the block:
   `inputStream.seek(currentPos + blocksize);`
   
   Only if that succeeds do we go on to read the block size from the footer. Since the footer read only covers the `Long.BYTES` bytes immediately before that position, I don't see how `inputStream.seek(currentPos + blocksize)` can complete without an exception while the subsequent footer read throws one.
   Very strange.
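The comments in the snippet itself note that the `seek()` contract differs between stream implementations, and that a seek may "only move the index" with the actual I/O deferred to the next read. The sketch below illustrates, purely hypothetically, how "seek succeeded but the footer read failed" can look when a stream's `seek()` does not validate the target against the file length. `SeekableBytes` and `SeekThenReadDemo` are made-up classes, not Hadoop or Hudi APIs, and nothing here establishes which real stream, if any, behaves this way.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical seekable byte source (NOT a Hadoop class) with a switchable seek contract.
class SeekableBytes {
  private final byte[] data;
  private final boolean strictSeek; // true: seek() checks EOF eagerly; false: only records the position
  private long pos;

  SeekableBytes(byte[] data, boolean strictSeek) {
    this.data = data;
    this.strictSeek = strictSeek;
    this.pos = 0;
  }

  long getPos() {
    return pos;
  }

  void seek(long target) throws IOException {
    if (target < 0) {
      throw new EOFException("Cannot seek to a negative offset");
    }
    if (strictSeek && target > data.length) {
      throw new EOFException("Cannot seek after EOF");
    }
    pos = target; // lazy mode: just move the index; any EOF surfaces on the next read
  }

  long readLong() throws IOException {
    if (pos + Long.BYTES > data.length) {
      throw new EOFException("Read past EOF at offset " + pos);
    }
    // Big-endian, like DataInputStream.readLong()
    long value = ByteBuffer.wrap(data, (int) pos, Long.BYTES).getLong();
    pos += Long.BYTES;
    return value;
  }
}

class SeekThenReadDemo {
  // Mirrors the two steps discussed above: seek to the end of the block, then step
  // back Long.BYTES and read the footer. Reports which step failed, or "ok".
  static String probe(SeekableBytes in, int blocksize) {
    long currentPos = in.getPos();
    try {
      in.seek(currentPos + blocksize);
    } catch (IOException e) {
      return "seek";
    }
    try {
      in.seek(in.getPos() - Long.BYTES);
      in.readLong();
      return "ok";
    } catch (IOException e) {
      return "read";
    }
  }
}
```

With a strict seek, an oversized `blocksize` fails at the first seek, which is the path the snippet already catches. With a lazy seek, the same `blocksize` only fails at `readLong()`, which in the snippet sits outside the `try`/`catch`.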
   
   @danny0405 @bhasudha @bvaradar @n3nash: can you folks think of a reason for this exception?
   
   
   
   

