[ https://issues.apache.org/jira/browse/BEAM-5412?focusedWorklogId=145903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145903 ]
ASF GitHub Bot logged work on BEAM-5412:
----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Sep/18 01:46
Start Date: 20/Sep/18 01:46
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request
#6440: [BEAM-5412][BEAM-5408] Fixes a bug that limited the size of TFRecords
URL: https://github.com/apache/beam/pull/6440#discussion_r219006087
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java
##########
@@ -625,7 +625,19 @@ public int recordLength(byte[] data) {
checkState(hashLong(length) == maskedCrc32OfLength, "Mismatch of length mask");
ByteBuffer data = ByteBuffer.allocate((int) length);
- checkState(inChannel.read(data) == length, "Invalid data");
+ long totalRead = 0;
+ while (true) {
+ long read = inChannel.read(data);
+ if (read == 0) {
Review comment:
This check is no longer needed once the read happens in the loop condition.
https://docs.oracle.com/javase/7/docs/api/java/nio/channels/ReadableByteChannel.html
guarantees that, for a channel in blocking mode with room left in the buffer,
a read() call blocks until at least one byte is read.
But I guess there's no harm in using ">= 0" in the condition just in case.
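
For reference, the kind of loop being discussed could look roughly like the following. This is a minimal sketch, not the code merged in the pull request: the class name `ReadFullySketch` and the helper `readFully` are hypothetical, and a blocking channel is assumed (a non-blocking channel that keeps returning 0 would make this loop spin).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ReadFullySketch {
  /**
   * Hypothetical helper: keeps calling read() until the buffer is full or the
   * channel reports end-of-stream. Returns the total number of bytes read.
   */
  static int readFully(ReadableByteChannel in, ByteBuffer buf) throws IOException {
    int total = 0;
    while (buf.hasRemaining()) {
      int n = in.read(buf);
      if (n < 0) {
        break; // -1 signals end-of-stream; the caller decides whether that is an error
      }
      total += n; // n == 0 is tolerated and simply retried
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = new byte[20000]; // larger than 8K, the size named in the issue title
    ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(payload));
    ByteBuffer buf = ByteBuffer.allocate(payload.length);

    int read = readFully(ch, buf);
    if (read != payload.length) {
      throw new IllegalStateException("Invalid data: expected " + payload.length + ", got " + read);
    }
    System.out.println("read " + read + " bytes");
  }
}
```

The length comparison is done once, after the loop, so a short individual read no longer trips the check.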
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 145903)
Time Spent: 2h (was: 1h 50m)
> TFRecordIO fails with records larger than 8K
> --------------------------------------------
>
> Key: BEAM-5412
> URL: https://issues.apache.org/jira/browse/BEAM-5412
> Project: Beam
> Issue Type: Bug
> Components: io-java-text
> Affects Versions: 2.4.0
> Reporter: Raghu Angadi
> Assignee: Chamikara Jayalath
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> This was reported on
> [Stackoverflow|https://stackoverflow.com/questions/52284639/beam-java-sdk-with-tfrecord-and-compression-gzip].
> The TFRecordIO reader assumes that a single call to {{channel.read()}} returns
> as much as can fit in the input buffer, but {{read()}} can return fewer bytes
> than requested. The assertion fails at:
> https://github.com/apache/beam/blob/release-2.4.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L642
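
The short-read behaviour described in the issue can be reproduced in isolation. The sketch below is illustrative only: `ShortReadDemo` and `ChunkedChannel` are hypothetical names, and the 8192-byte cap is an assumption chosen to mirror the 8K figure in the issue title, not something taken from the Beam code.

```java
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ShortReadDemo {
  /** Hypothetical channel that hands out at most maxPerRead bytes per read() call. */
  static final class ChunkedChannel implements ReadableByteChannel {
    private final ByteBuffer source;
    private final int maxPerRead;
    private boolean open = true;

    ChunkedChannel(byte[] bytes, int maxPerRead) {
      this.source = ByteBuffer.wrap(bytes);
      this.maxPerRead = maxPerRead;
    }

    @Override
    public int read(ByteBuffer dst) {
      if (!source.hasRemaining()) {
        return -1; // end-of-stream
      }
      // Transfer no more than the cap, the bytes left, or the room in dst.
      int n = Math.min(maxPerRead, Math.min(source.remaining(), dst.remaining()));
      ByteBuffer slice = source.duplicate();
      slice.limit(slice.position() + n);
      dst.put(slice);
      source.position(source.position() + n);
      return n;
    }

    @Override
    public boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  public static void main(String[] args) {
    int length = 20000; // record length, larger than the per-read cap
    ChunkedChannel ch = new ChunkedChannel(new byte[length], 8192);
    ByteBuffer data = ByteBuffer.allocate(length);

    int first = ch.read(data);
    // A check of the form read(data) == length fails here even though the data is valid.
    System.out.println("single read returned " + first + " of " + length + " bytes");
    System.out.println("bytes still missing: " + data.remaining());
  }
}
```

A single `read()` here is a legal partial read under the `ReadableByteChannel` contract, which is why an equality check on one call rejects any record larger than the cap.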