[ 
https://issues.apache.org/jira/browse/BEAM-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9743:
-------------------------------
    Status: Open  (was: Triage Needed)

> TFRecordCodec does not attempt to fully read/write
> --------------------------------------------------
>
>                 Key: BEAM-9743
>                 URL: https://issues.apache.org/jira/browse/BEAM-9743
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-tfrecord, sdk-java-core
>            Reporter: Kyoungha Min
>            Assignee: Kyoungha Min
>            Priority: Critical
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The same problem has been reported before and those issues were marked 
> resolved, but parts of it remain unfixed.
> https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22
>  
> Issue #1: TFRecordCodec tries only once to read the header/footer. This is 
> likely to fail near the end of the channel buffer.
> Issue #2: (minor) TFRecordCodec currently does not check how much it 
> writes.
>  
> This seems to happen only with Zstd compression (or any other picky input 
> stream that refuses to read fully). ZstdInputStream seems very reluctant to 
> hand out data.
> The affected code is at:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
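
To illustrate Issue #1, here is a minimal sketch of a read loop that keeps calling read() until the buffer is full or the channel ends. It is not the Beam code: readFully and PickyChannel are hypothetical names made up for this example, and PickyChannel only imitates a stream that hands out one byte per call. The 12-byte size corresponds to the TFRecord header (8-byte length plus 4-byte masked CRC).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

class ReadFullySketch {

  /** Hypothetical test channel: hands out at most one byte per read() call. */
  static final class PickyChannel implements ReadableByteChannel {
    private final ByteArrayInputStream in;

    PickyChannel(byte[] data) {
      this.in = new ByteArrayInputStream(data);
    }

    @Override
    public int read(ByteBuffer dst) {
      if (!dst.hasRemaining()) {
        return 0;
      }
      int b = in.read();
      if (b < 0) {
        return -1; // end of stream
      }
      dst.put((byte) b);
      return 1;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {}
  }

  /**
   * Hypothetical helper: reads until dst is full or the channel reports end of
   * stream, and returns the number of bytes read. Assumes a blocking channel;
   * a non-blocking channel that keeps returning 0 would make this loop spin.
   */
  static int readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int n = in.read(dst);
      if (n < 0) {
        break; // end of stream before the buffer was filled
      }
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    int headerLen = 12; // TFRecord header: 8-byte length + 4-byte masked CRC

    // A single read() call may return fewer bytes than requested.
    ByteBuffer once = ByteBuffer.allocate(headerLen);
    int single = new PickyChannel(new byte[headerLen]).read(once);
    System.out.println("single read(): " + single + " of " + headerLen + " bytes");

    // Looping until the buffer is full gets the whole header.
    ByteBuffer full = ByteBuffer.allocate(headerLen);
    int looped = readFully(new PickyChannel(new byte[headerLen]), full);
    System.out.println("readFully():   " + looped + " of " + headerLen + " bytes");
  }
}
```

With a helper like this, the caller can compare the returned count against the expected header or footer length and treat a short count as a truncated record rather than as a failed single read.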
>  
> The write path is not much of a problem within Beam itself (all, or most, 
> WritableByteChannels in beam-java-sdk-core are backed by an OutputStream), 
> but the code still does not follow the WritableByteChannel specification: 
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>  
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels 
> are not required to read/write fully and may read or write fewer bytes than 
> requested from time to time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
