[
https://issues.apache.org/jira/browse/BEAM-9743?focusedWorklogId=420723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420723
]
ASF GitHub Bot logged work on BEAM-9743:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Apr/20 09:38
Start Date: 11/Apr/20 09:38
Worklog Time Spent: 10m
Work Description: lukemin89 commented on pull request #11397: [BEAM-9743]
Fix TFRecordCodec to try harder to read/write
URL: https://github.com/apache/beam/pull/11397#discussion_r407041552
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java
##########
@@ -717,14 +715,38 @@ public void write(WritableByteChannel outChannel, byte[] data) throws IOExceptio
header.clear();
header.putLong(data.length).putInt(maskedCrc32OfLength);
header.rewind();
- outChannel.write(header);
+ writeFully(outChannel, header);
- outChannel.write(ByteBuffer.wrap(data));
+ writeFully(outChannel, ByteBuffer.wrap(data));
footer.clear();
footer.putInt(maskedCrc32OfData);
footer.rewind();
- outChannel.write(footer);
+ writeFully(outChannel, footer);
+ }
+
+ @VisibleForTesting
+ static void readFully(ReadableByteChannel in, ByteBuffer bb) throws IOException {
+ int expected = bb.remaining();
+ int actual = read(in, bb);
+ if (expected != actual) {
+ throw new IOException(String.format("expected %d, but got %d", expected, actual));
+ }
+ }
+
+ private static int read(ReadableByteChannel in, ByteBuffer bb) throws IOException {
+ int n, read = 0;
+ while (bb.hasRemaining() && (n = in.read(bb)) >= 0) {
+ read += n;
+ }
+ return read;
+ }
+
+ @VisibleForTesting
+ static void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
+ while (buffer.hasRemaining()) {
+ channel.write(buffer);
+ }
Review comment:
I'm not sure if I can/should make these better.
If the channel follows the Javadoc description and keeps returning 0 without
throwing, this loop could run forever.
That might be the channel's problem, but I'm not sure whether I should add
something like a hard limit on the number of retries.
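
For illustration, the hard limit mentioned above could look roughly like the sketch below. This is not the code in the pull request: the class name BoundedWrite and the maxZeroWrites parameter are hypothetical, and the sketch assumes that only consecutive zero-byte writes should count against the limit.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of Beam: a writeFully variant that gives up
// when the channel stops making progress.
final class BoundedWrite {
  private BoundedWrite() {}

  /**
   * Writes all remaining bytes of buffer to channel. Throws if the channel
   * returns 0 for maxZeroWrites consecutive calls (maxZeroWrites is an
   * assumed knob for this sketch, not an existing Beam option).
   */
  static void writeFully(WritableByteChannel channel, ByteBuffer buffer, int maxZeroWrites)
      throws IOException {
    int zeroWrites = 0;
    while (buffer.hasRemaining()) {
      int n = channel.write(buffer);
      if (n > 0) {
        // Progress was made, so start counting stalls from scratch.
        zeroWrites = 0;
      } else if (++zeroWrites >= maxZeroWrites) {
        throw new IOException(
            String.format(
                "no progress after %d writes, %d bytes remaining",
                zeroWrites, buffer.remaining()));
      }
    }
  }
}
```

Counting only consecutive stalls keeps a slow but healthy channel working, while a channel that returns 0 forever fails with an IOException instead of spinning.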
Issue Time Tracking
-------------------
Worklog Id: (was: 420723)
Time Spent: 1h (was: 50m)
> TFRecordCodec not attempt to fully read/write
> ---------------------------------------------
>
> Key: BEAM-9743
> URL: https://issues.apache.org/jira/browse/BEAM-9743
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Kyoungha Min
> Assignee: Kyoungha Min
> Priority: Critical
> Time Spent: 1h
> Remaining Estimate: 0h
>
> This seems to happen only with Zstd compression (or any other picky input
> stream that declines to read fully). Zstd seems very picky about giving out data.
> The parts with the issue are
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
>
> The write path is less of a problem within Beam itself (all WritableByteChannels
> in beam-java-sdk-core are backed by some OutputStream), but it still does not
> follow the WritableByteChannel specification:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that implementations
> are not required to read/write fully, and may decline to read/write from time to time.
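
To make the partial-read behaviour described above concrete, here is a small self-contained sketch. TrickleChannel is a hypothetical stand-in for a picky stream (it is not how Zstd or Beam is implemented) that hands out at most one byte per call, and readFully is an illustrative loop in the spirit of the fix, not the exact code from the pull request.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical demo class, not part of Beam.
final class PartialReadDemo {
  private PartialReadDemo() {}

  /** Test channel that returns at most one byte per read call. */
  static final class TrickleChannel implements ReadableByteChannel {
    private final ByteBuffer source;

    TrickleChannel(byte[] data) {
      this.source = ByteBuffer.wrap(data);
    }

    @Override
    public int read(ByteBuffer dst) {
      if (!source.hasRemaining()) {
        return -1; // end of stream
      }
      if (!dst.hasRemaining()) {
        return 0; // nothing requested
      }
      dst.put(source.get());
      return 1;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {}
  }

  /** Keeps calling read until dst is full; throws if the stream ends first. */
  static void readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int expected = dst.remaining();
    while (dst.hasRemaining()) {
      if (in.read(dst) < 0) {
        throw new EOFException(
            String.format(
                "expected %d bytes, but got %d", expected, expected - dst.remaining()));
      }
    }
  }
}
```

A single read call on TrickleChannel fills only one byte of a four-byte buffer, which is the situation a lone in.read(buffer) mishandles. The loop in readFully fills the buffer completely or reports how many bytes actually arrived.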