[ 
https://issues.apache.org/jira/browse/HUDI-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921920#comment-16921920
 ] 

leesf commented on HUDI-140:
----------------------------

Fixed via master: 0b451b3a58cabe25c0cecd3fd8847a8597e2313a

> GCS: Log File Reading not working due to difference in seek() behavior for EOF
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-140
>                 URL: https://issues.apache.org/jira/browse/HUDI-140
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Realtime View
>            Reporter: BALAJI VARADARAJAN
>            Assignee: BALAJI VARADARAJAN
>            Priority: Major
>              Labels: gcs-parity, pull-request-available, usability
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Issue:
> Caused by: java.io.EOFException: Invalid seek offset: position value (1370518) must be between 0 and 1370518
>     at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.validatePosition(GoogleCloudStorageReadChannel.java:644)
>     at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.position(GoogleCloudStorageReadChannel.java:558)
>     at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.seek(GoogleHadoopFSInputStream.java:309)
>     at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>     at com.uber.hoodie.common.table.log.block.HoodieLogBlock.readOrSkipContent(HoodieLogBlock.java:234)
>     at com.uber.hoodie.common.table.log.HoodieLogFileReader.createCorruptBlock(HoodieLogFileReader.java:230)
>     at com.uber.hoodie.common.table.log.HoodieLogFileReader.readBlock(HoodieLogFileReader.java:149)
>     at com.uber.hoodie.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:352)
>  
> _Status_: The issue turned out to be caused by a difference between
> GCSHadoopFileSystem's and HDFSFileSystem's implementations of
> Inputstream.seek when handling EOF. As a result, log block reading on GCS
> treats a valid last block as corrupt. A quick fix has been given to Alex to
> try out. This needs discussion with the Hudi devs to figure out a proper solution.
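
To illustrate the behavior described in the status note, here is a minimal sketch in plain Java. It does not use Hadoop or the GCS connector: `ModelStream` and `seekAllowingEof` are hypothetical names introduced only for this example, and the workaround shown (seek to one byte before EOF, then read that byte) is one possible approach, not necessarily what the referenced commit does. The model assumes one stream type accepts a seek to `position == length` while the other rejects it, as the exception message above suggests.

```java
import java.io.EOFException;
import java.io.IOException;

public class SeekAtEofSketch {

    /** Hypothetical stand-in for a seekable input stream (not a Hadoop class). */
    static class ModelStream {
        private final long length;
        private final boolean allowSeekToEof; // true: HDFS-like, false: GCS-like (assumption)
        private long pos = 0;

        ModelStream(long length, boolean allowSeekToEof) {
            this.length = length;
            this.allowSeekToEof = allowSeekToEof;
        }

        void seek(long target) throws EOFException {
            // The strict variant treats position == length as out of range.
            long max = allowSeekToEof ? length : length - 1;
            if (target < 0 || target > max) {
                throw new EOFException("Invalid seek offset: position value ("
                        + target + ") must be between 0 and " + length);
            }
            pos = target;
        }

        /** Consumes one byte; returns -1 at EOF, otherwise a dummy value. */
        int read() {
            if (pos >= length) {
                return -1;
            }
            pos++;
            return 0;
        }

        long getPos() {
            return pos;
        }

        long length() {
            return length;
        }
    }

    /**
     * Seeks to target, tolerating streams that reject a seek to exactly EOF.
     * A target beyond EOF still fails, so truly truncated blocks are detected.
     */
    static void seekAllowingEof(ModelStream in, long target) throws IOException {
        if (target == in.getPos()) {
            return; // already there, nothing to do
        }
        try {
            in.seek(target);
        } catch (EOFException e) {
            if (target != in.length() || target == 0) {
                throw e; // genuinely out of range
            }
            // Land on EOF by seeking to the last byte and consuming it.
            in.seek(target - 1);
            in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        long size = 1370518L; // file size taken from the exception message

        ModelStream lenient = new ModelStream(size, true);
        lenient.seek(size);
        System.out.println("lenient position: " + lenient.getPos());

        ModelStream strict = new ModelStream(size, false);
        try {
            strict.seek(size);
        } catch (EOFException e) {
            System.out.println("strict seek failed: " + e.getMessage());
        }

        seekAllowingEof(strict, size);
        System.out.println("strict position after guarded seek: " + strict.getPos());
    }
}
```

In this model, a reader that skips a block whose content ends exactly at the end of the file would hit the exception on the strict stream and could misclassify the block as corrupt; the guarded seek avoids that while still failing for offsets past EOF.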



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
