[
https://issues.apache.org/jira/browse/BEAM-9743?focusedWorklogId=423874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423874
]
ASF GitHub Bot logged work on BEAM-9743:
----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Apr/20 02:15
Start Date: 17/Apr/20 02:15
Worklog Time Spent: 10m
Work Description: lukemin89 commented on pull request #11397: [BEAM-9743]
Fix TFRecordCodec to try harder to read/write
URL: https://github.com/apache/beam/pull/11397#discussion_r409952691
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java
##########
@@ -717,14 +715,38 @@ public void write(WritableByteChannel outChannel, byte[] data) throws IOExceptio
header.clear();
header.putLong(data.length).putInt(maskedCrc32OfLength);
header.rewind();
- outChannel.write(header);
+ writeFully(outChannel, header);
- outChannel.write(ByteBuffer.wrap(data));
+ writeFully(outChannel, ByteBuffer.wrap(data));
footer.clear();
footer.putInt(maskedCrc32OfData);
footer.rewind();
- outChannel.write(footer);
+ writeFully(outChannel, footer);
+ }
+
+ @VisibleForTesting
+ static void readFully(ReadableByteChannel in, ByteBuffer bb) throws IOException {
+ int expected = bb.remaining();
+ int actual = read(in, bb);
+ if (expected != actual) {
+ throw new IOException(String.format("expected %d, but got %d", expected, actual));
+ }
+ }
+
+ private static int read(ReadableByteChannel in, ByteBuffer bb) throws IOException {
+ int n, read = 0;
+ while (bb.hasRemaining() && (n = in.read(bb)) >= 0) {
+ read += n;
+ }
+ return read;
+ }
+
+ @VisibleForTesting
+ static void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
+ while (buffer.hasRemaining()) {
+ channel.write(buffer);
+ }
Review comment:
Thanks for the confirmation :)
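The helpers in the hunk above can be exercised without Zstd by using channels that deliberately transfer only a few bytes per call. The following is a minimal sketch, not Beam's actual test: `ShortIoDemo`, `throttledWriter` and `throttledReader` are hypothetical names, the CRC field is a placeholder value, and `readFully` mirrors the PR's `readFully`/`read` pair with the error message reporting the actual byte count.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class ShortIoDemo {

  // Same logic as the PR's readFully/read pair: loop until the buffer is full or EOF,
  // then compare the bytes actually read against the bytes expected.
  static void readFully(ReadableByteChannel in, ByteBuffer bb) throws IOException {
    int expected = bb.remaining();
    int actual = 0;
    int n;
    while (bb.hasRemaining() && (n = in.read(bb)) >= 0) {
      actual += n;
    }
    if (expected != actual) {
      throw new IOException(String.format("expected %d, but got %d", expected, actual));
    }
  }

  // Same logic as the PR's writeFully: keep calling write() until the buffer is drained.
  static void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
    while (buffer.hasRemaining()) {
      channel.write(buffer);
    }
  }

  // Hypothetical test channel: accepts at most maxPerCall bytes per write() call.
  static WritableByteChannel throttledWriter(ByteArrayOutputStream sink, int maxPerCall) {
    return new WritableByteChannel() {
      @Override public int write(ByteBuffer src) {
        int n = Math.min(maxPerCall, src.remaining());
        byte[] chunk = new byte[n];
        src.get(chunk);
        sink.write(chunk, 0, n);
        return n;
      }
      @Override public boolean isOpen() { return true; }
      @Override public void close() {}
    };
  }

  // Hypothetical test channel: returns at most maxPerCall bytes per read(), -1 at EOF.
  static ReadableByteChannel throttledReader(byte[] data, int maxPerCall) {
    ByteBuffer source = ByteBuffer.wrap(data);
    return new ReadableByteChannel() {
      @Override public int read(ByteBuffer dst) {
        if (!source.hasRemaining()) {
          return -1;
        }
        int n = Math.min(maxPerCall, Math.min(source.remaining(), dst.remaining()));
        for (int i = 0; i < n; i++) {
          dst.put(source.get());
        }
        return n;
      }
      @Override public boolean isOpen() { return true; }
      @Override public void close() {}
    };
  }

  public static void main(String[] args) throws IOException {
    // 12-byte header shaped like TFRecordCodec's: 8-byte length + 4-byte CRC (placeholder).
    ByteBuffer header = ByteBuffer.allocate(12);
    header.putLong(5L).putInt(0xCAFEBABE);
    header.flip();

    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    writeFully(throttledWriter(sink, 3), header); // four write() calls of 3 bytes each
    System.out.println("written=" + sink.size()); // written=12

    ByteBuffer readBack = ByteBuffer.allocate(12);
    readFully(throttledReader(sink.toByteArray(), 5), readBack); // reads of 5, 5, 2 bytes
    readBack.flip();
    System.out.println("length=" + readBack.getLong()); // length=5

    // A truncated stream is reported with the actual byte count.
    try {
      readFully(throttledReader(new byte[5], 5), ByteBuffer.allocate(12));
    } catch (IOException e) {
      System.out.println(e.getMessage()); // expected 12, but got 5
    }
  }
}
```

One design note: a `writeFully` loop like this would busy-spin on a non-blocking channel that keeps returning 0; that is presumably acceptable here because, as the issue notes, the WritableByteChannels in beam-java-sdk-core are mostly backed by an OutputStream.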
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 423874)
Time Spent: 2h 50m (was: 2h 40m)
> TFRecordCodec not attempt to fully read/write
> ---------------------------------------------
>
> Key: BEAM-9743
> URL: https://issues.apache.org/jira/browse/BEAM-9743
> Project: Beam
> Issue Type: Bug
> Components: io-java-tfrecord, sdk-java-core
> Reporter: Kyoungha Min
> Assignee: Kyoungha Min
> Priority: Critical
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The same issue has been pointed out before and those issues were marked
> resolved, but some parts still remain:
> https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22
>
> Issue #1: TFRecordCodec tries only once to read the header/footer. This is
> likely to fail near the end of the channel's buffer.
> Issue #2: (minor) TFRecordCodec currently does not check how much it
> writes.
>
> It seems to happen only with Zstd compression (or any other picky input
> stream that refuses to read fully); ZstdInputStream seems very reluctant to
> hand out data in full.
> The affected parts are:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
>
> The write side is less of a problem within Beam applications (all, or most,
> WritableByteChannels in beam-java-sdk-core are backed by some OutputStream),
> but it still does not follow the WritableByteChannel specification:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that a single
> read/write call is not required to transfer all requested bytes and may
> transfer fewer (or none) from time to time.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)