[
https://issues.apache.org/jira/browse/BEAM-5412?focusedWorklogId=145875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145875
]
ASF GitHub Bot logged work on BEAM-5412:
----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Sep/18 00:06
Start Date: 20/Sep/18 00:06
Worklog Time Spent: 10m
Work Description: rangadi commented on a change in pull request #6440:
[BEAM-5412][BEAM-5408] Fixes a bug that limited the size of TFRecords
URL: https://github.com/apache/beam/pull/6440#discussion_r218999001
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java
##########
@@ -625,7 +625,19 @@ public int recordLength(byte[] data) {
checkState(hashLong(length) == maskedCrc32OfLength, "Mismatch of length mask");
ByteBuffer data = ByteBuffer.allocate((int) length);
- checkState(inChannel.read(data) == length, "Invalid data");
+ long totalRead = 0;
+ while (true) {
+ long read = inChannel.read(data);
+ if (read == 0) {
Review comment:
Did you mean `read < 0`? Otherwise this will spin in an infinite loop when it
encounters EOF (`read()` returns -1 at end of stream, not 0).
Also, what should we do when read returns 0? I think the correct thing to
do is to continue reading (unless there is some expectation that the underlying
channel should never return zero). Most blocking channels don't return zero;
they just wait until EOF or data arrives. Better to be explicit here about the
policy.
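To make the suggested policy concrete, here is a minimal sketch, not the code
in this PR. It assumes the channel is a blocking `ReadableByteChannel`; the
class name `ReadFullySketch` and the method `readFully` are hypothetical names
chosen for illustration. It treats a negative return as EOF and fails, and it
treats a zero return as "no data yet" and keeps reading.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical helper (not part of TFRecordIO): fill a buffer or fail on EOF. */
final class ReadFullySketch {
  private ReadFullySketch() {}

  /**
   * Reads from {@code in} until {@code data} has no space remaining.
   *
   * Policy, stated explicitly:
   *  - read < 0  : EOF before the record was complete, so fail.
   *  - read == 0 : no bytes available right now, so try again. This assumes a
   *                blocking channel; a non-blocking channel would busy-wait here.
   *  - read > 0  : partial progress, keep going until the buffer is full.
   */
  static void readFully(ReadableByteChannel in, ByteBuffer data) throws IOException {
    while (data.hasRemaining()) {
      int read = in.read(data); // ReadableByteChannel.read returns an int, -1 at EOF
      if (read < 0) {
        throw new EOFException(
            "Invalid data: channel ended with " + data.remaining() + " bytes still expected");
      }
    }
  }
}
```

With a helper like this, the call site would allocate the buffer for `length`
bytes, call `readFully(inChannel, data)`, and no longer need to compare a single
`read()` result against `length`.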
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 145875)
Time Spent: 0.5h (was: 20m)
> TFRecordIO fails with records larger than 8K
> --------------------------------------------
>
> Key: BEAM-5412
> URL: https://issues.apache.org/jira/browse/BEAM-5412
> Project: Beam
> Issue Type: Bug
> Components: io-java-text
> Affects Versions: 2.4.0
> Reporter: Raghu Angadi
> Assignee: Chamikara Jayalath
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> This was reported on
> [Stack Overflow|https://stackoverflow.com/questions/52284639/beam-java-sdk-with-tfrecord-and-compression-gzip].
> The TFRecordIO reader assumes that a single call to {{channel.read()}} returns as
> much as can fit in the input buffer, but {{read()}} can return fewer bytes than
> requested. Assertion failure:
> https://github.com/apache/beam/blob/release-2.4.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L642