[ https://issues.apache.org/jira/browse/HUDI-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921920#comment-16921920 ]
leesf commented on HUDI-140:
----------------------------
Fixed via master: 0b451b3a58cabe25c0cecd3fd8847a8597e2313a
> GCS: Log File Reading not working due to difference in seek() behavior for EOF
> ------------------------------------------------------------------------------
>
> Key: HUDI-140
> URL: https://issues.apache.org/jira/browse/HUDI-140
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Realtime View
> Reporter: BALAJI VARADARAJAN
> Assignee: BALAJI VARADARAJAN
> Priority: Major
> Labels: gcs-parity, pull-request-available, usability
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Issue:
> Caused by: java.io.EOFException: Invalid seek offset: position value (1370518) must be between 0 and 1370518
> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.validatePosition(GoogleCloudStorageReadChannel.java:644)
> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.position(GoogleCloudStorageReadChannel.java:558)
> at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.seek(GoogleHadoopFSInputStream.java:309)
> at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
> at com.uber.hoodie.common.table.log.block.HoodieLogBlock.readOrSkipContent(HoodieLogBlock.java:234)
> at com.uber.hoodie.common.table.log.HoodieLogFileReader.createCorruptBlock(HoodieLogFileReader.java:230)
> at com.uber.hoodie.common.table.log.HoodieLogFileReader.readBlock(HoodieLogFileReader.java:149)
> at com.uber.hoodie.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:352)
>
> _Status_: The issue turned out to be caused by a difference in how
> GCSHadoopFileSystem and HDFSFileSystem implement InputStream.seek when the
> target position is at EOF. As a result, log block reading on GCS treats a
> valid last block as corrupt. A quick fix has been given to Alex to try out.
> A proper solution still needs discussion with the Hudi developers.
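
The EOF difference described in the status note can be illustrated with a minimal Java sketch. Every type and method here (Seekable, StrictEofStream, LenientEofStream, seekTreatingEndAsEof) is a hypothetical stand-in, not a Hadoop, GCS connector, or Hudi class, and this is not necessarily how the referenced commit fixes the problem. It assumes, based on the exception message above, that one stream rejects a seek to exactly the file length while the other accepts it.

```java
import java.io.EOFException;
import java.io.IOException;

final class EofSeekSketch {

    /** Hypothetical minimal seekable stream; stands in for a real input stream. */
    interface Seekable {
        void seek(long pos) throws IOException;
        long getPos();
    }

    /** Assumed GCS-like behavior: valid positions are 0 <= pos < length. */
    static final class StrictEofStream implements Seekable {
        private final long length;
        private long pos;

        StrictEofStream(long length) { this.length = length; }

        @Override public void seek(long newPos) throws IOException {
            if (newPos < 0 || newPos >= length) {
                throw new EOFException("Invalid seek offset: " + newPos);
            }
            pos = newPos;
        }

        @Override public long getPos() { return pos; }
    }

    /** Assumed HDFS-like behavior: a seek to pos == length is also accepted. */
    static final class LenientEofStream implements Seekable {
        private final long length;
        private long pos;

        LenientEofStream(long length) { this.length = length; }

        @Override public void seek(long newPos) throws IOException {
            if (newPos < 0 || newPos > length) {
                throw new EOFException("Cannot seek after EOF: " + newPos);
            }
            pos = newPos;
        }

        @Override public long getPos() { return pos; }
    }

    /**
     * Seeks to pos. Returns true if the stream accepted the seek, and false if
     * the stream refused it but pos is exactly the file length, which the
     * caller can treat as a clean end of file rather than corruption.
     * Any other EOFException is rethrown.
     */
    static boolean seekTreatingEndAsEof(Seekable in, long pos, long length) throws IOException {
        try {
            in.seek(pos);
            return true;
        } catch (EOFException e) {
            if (pos == length) {
                return false; // end of the last block, not a corrupt block
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        long length = 1370518L; // file length taken from the exception message
        System.out.println(seekTreatingEndAsEof(new LenientEofStream(length), length, length));
        System.out.println(seekTreatingEndAsEof(new StrictEofStream(length), length, length));
    }
}
```

With a helper along these lines, a reader that skips past the last block would see the same outcome on both kinds of stream: either it is positioned at the end, or it is told that the end has been reached, and in neither case does an exception at pos == length get mistaken for a corrupt block.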
--
This message was sent by Atlassian Jira
(v8.3.2#803003)