[
https://issues.apache.org/jira/browse/BEAM-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kyoungha Min updated BEAM-9743:
-------------------------------
Component/s: io-java-tfrecord
> TFRecordCodec not attempt to fully read/write
> ---------------------------------------------
>
> Key: BEAM-9743
> URL: https://issues.apache.org/jira/browse/BEAM-9743
> Project: Beam
> Issue Type: Bug
> Components: io-java-tfrecord, sdk-java-core
> Reporter: Kyoungha Min
> Assignee: Kyoungha Min
> Priority: Critical
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The same issue was reported before and marked resolved, but parts of it
> still remain:
> https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22
>
> Issue #1: TFRecordCodec tries only once to read the header/footer. This is
> likely to fail near the end of the channel buffer.
> Issue #2: (minor) TFRecordCodec currently does not check how much it
> writes.
>
> This seems to happen only with Zstd compression (or any other picky input
> stream that refuses to read fully). ZstdInputStream seems very picky about
> handing out data.
> The affected parts are
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
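For illustration, Issue #1 could be addressed with a read loop along these lines. This is a minimal sketch, not the actual Beam patch: `ReadFullySketch`, `readFully`, and the `trickle` test channel are hypothetical names, and the loop assumes a blocking channel (a non-blocking channel that keeps returning 0 would make it spin).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical helper, not part of Beam. */
final class ReadFullySketch {

  /**
   * Keeps calling read() until dst is full or the channel reports end of
   * stream. Returns the number of bytes actually read, so the caller can
   * tell a complete header/footer from a truncated one.
   */
  static int readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int n = in.read(dst);
      if (n < 0) {
        break; // end of stream before the buffer was filled
      }
      total += n;
    }
    return total;
  }

  /**
   * Test double standing in for a "picky" source: hands out at most
   * maxPerRead bytes per read() call.
   */
  static ReadableByteChannel trickle(byte[] data, int maxPerRead) {
    ByteBuffer src = ByteBuffer.wrap(data);
    return new ReadableByteChannel() {
      @Override
      public int read(ByteBuffer dst) {
        if (!src.hasRemaining()) {
          return -1;
        }
        int n = Math.min(maxPerRead, Math.min(src.remaining(), dst.remaining()));
        for (int i = 0; i < n; i++) {
          dst.put(src.get());
        }
        return n;
      }

      @Override
      public boolean isOpen() {
        return true;
      }

      @Override
      public void close() {}
    };
  }
}
```

With a 12-byte buffer and a channel that yields 5 bytes per call, a single `read()` leaves the buffer partly empty, while `readFully` fills it. The returned count lets the caller treat a short result at end of stream as a truncated record rather than silently parsing garbage.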
>
> This is less of a problem within Beam applications (as all, or most,
> WritableByteChannels in beam-java-sdk-core are backed by some OutputStream),
> but it still does not follow the WritableByteChannel specification:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that they are
> not required to read/write fully, and may transfer fewer bytes than
> requested (or none at all) from time to time.
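The write side (Issue #2) could be handled the same way. Again a sketch under assumptions rather than the real fix: `WriteFullySketch`, `writeFully`, and `PickyChannel` are hypothetical names, and a blocking channel is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical helper, not part of Beam. */
final class WriteFullySketch {

  /** Keeps calling write() until every remaining byte of src is consumed. */
  static void writeFully(WritableByteChannel out, ByteBuffer src) throws IOException {
    while (src.hasRemaining()) {
      out.write(src); // may consume only part of src; loop until drained
    }
  }

  /** Test double: accepts at most maxPerWrite bytes per write() call. */
  static final class PickyChannel implements WritableByteChannel {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final int maxPerWrite;

    PickyChannel(int maxPerWrite) {
      this.maxPerWrite = maxPerWrite;
    }

    @Override
    public int write(ByteBuffer src) {
      int n = Math.min(maxPerWrite, src.remaining());
      for (int i = 0; i < n; i++) {
        sink.write(src.get());
      }
      return n;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {}
  }
}
```

Against a channel that accepts 3 bytes per call, a single `write()` of a 10-byte buffer leaves 7 bytes unwritten, whereas `writeFully` drains the buffer. For channels backed by an OutputStream the loop normally runs once, so the extra check costs little.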
--
This message was sent by Atlassian Jira
(v8.3.4#803005)