[
https://issues.apache.org/jira/browse/BEAM-8871?focusedWorklogId=426733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-426733
]
ASF GitHub Bot logged work on BEAM-8871:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Apr/20 20:38
Start Date: 23/Apr/20 20:38
Worklog Time Spent: 10m
Work Description: lukecwik edited a comment on pull request #11454:
URL: https://github.com/apache/beam/pull/11454#issuecomment-618654517
> (Sorry for adding this but I cannot find a better moment to ask). Curious
question related to fractional splits (but not to this PR). Would testing
trySplit for a RestrictionTracker be enough to test that an SDF-based IO
that uses this kind of Restriction supports DWR correctly?
>
> I am asking because in the SDF-based HBaseIO it is one of the missing tests
compared with the tests we had in the past for `splitAtFraction`
>
>
https://github.com/apache/beam/blob/ec67a9374671ea9ae670fb0f3935ead2ebed7981/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java#L359-L384
>
> If it is not the case, any suggestions on how to test this?
Testing the restriction tracker is almost all that is needed for testing
DWR.
The last bit is to make sure that the DoFn doesn't assume it's
processing the initial restriction and that it interacts with the restriction
tracker correctly. An example of a splittable DoFn that would work a lot of the
time but be incorrect when splitting occurs would be something like:
```
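@ProcessElement
public void process(RestrictionTracker<OffsetRange, Long> tracker) {
  // Buggy on purpose: the range is read once and output is produced
  // before anything is claimed, so a split during the loop is ignored.
  long start = tracker.currentRestriction().getFrom();
  long end = tracker.currentRestriction().getTo();
  for (long i = start; i < end; ++i) {
    // ... produce output ...
  }
  tracker.tryClaim(end);
}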
```
All the examples I know of stem from producing output before claiming. Also,
moving the tryClaim before the loop would make the splittable DoFn correct but
would prevent it from being able to split effectively.
Testing that output is only produced after the DoFn has made the claim call
makes a lot of sense, and ensuring that only the part that has been claimed was
produced as output would cover correctness.
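As a rough sketch of that kind of test, the snippet below splits a range in the
middle of processing and records what each variant outputs. It is an
illustration only: `SimpleOffsetTracker`, `splitAt`, `processClaimFirst`,
`processOutputFirst` and `runWithSplit` are hypothetical names made up for this
example, and the tracker is a plain-Java stand-in, not Beam's
`OffsetRangeTracker`. A real test would drive the actual DoFn and tracker.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

class ClaimBeforeOutputSketch {

  /** Hypothetical stand-in for an offset-range tracker; not a Beam class. */
  static final class SimpleOffsetTracker {
    private final long from;
    private long to; // exclusive; shrinks when a split happens
    private long lastClaimed;

    SimpleOffsetTracker(long from, long to) {
      this.from = from;
      this.to = to;
      this.lastClaimed = from - 1;
    }

    long getFrom() {
      return from;
    }

    long getTo() {
      return to;
    }

    /** Claims offset i; returns false once i is outside the current restriction. */
    boolean tryClaim(long i) {
      if (i >= to) {
        return false;
      }
      lastClaimed = i;
      return true;
    }

    /** Shrinks the restriction to [from, splitPos) if splitPos is still unclaimed. */
    boolean splitAt(long splitPos) {
      if (splitPos <= lastClaimed || splitPos >= to) {
        return false;
      }
      to = splitPos;
      return true;
    }
  }

  /** Claims each offset before producing its output, so a split is honored. */
  static void processClaimFirst(SimpleOffsetTracker tracker, LongConsumer output) {
    for (long i = tracker.getFrom(); tracker.tryClaim(i); ++i) {
      output.accept(i);
    }
  }

  /** Mirrors the flawed pattern: reads the range once and outputs without claiming. */
  static void processOutputFirst(SimpleOffsetTracker tracker, LongConsumer output) {
    long start = tracker.getFrom();
    long end = tracker.getTo();
    for (long i = start; i < end; ++i) {
      output.accept(i);
    }
    tracker.tryClaim(end);
  }

  /** Runs one variant over [0, 10) and splits at 5 right after the third output. */
  static List<Long> runWithSplit(boolean claimFirst) {
    SimpleOffsetTracker tracker = new SimpleOffsetTracker(0, 10);
    List<Long> produced = new ArrayList<>();
    LongConsumer output = value -> {
      produced.add(value);
      if (produced.size() == 3) {
        tracker.splitAt(5); // simulate a split arriving mid-processing
      }
    };
    if (claimFirst) {
      processClaimFirst(tracker, output);
    } else {
      processOutputFirst(tracker, output);
    }
    return produced;
  }
}
```
With the split at 5, `runWithSplit(true)` produces offsets 0 through 4, which
is exactly the claimed part. `runWithSplit(false)` produces all ten offsets,
so anything processing the residual [5, 10) would duplicate 5 through 9.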
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 426733)
Time Spent: 4h (was: 3h 50m)
> Add support for splitting at fractions > 0 to
> org.apache.beam.sdk.transforms.splittabledofn.ByteKeyRangeTracker
> ---------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-8871
> URL: https://issues.apache.org/jira/browse/BEAM-8871
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Luke Cwik
> Assignee: Boyuan Zhang
> Priority: Major
> Time Spent: 4h
> Remaining Estimate: 0h
>
> org.apache.beam.sdk.transforms.splittabledofn.ByteKeyRangeTracker only
> supports checkpointing
--
This message was sent by Atlassian Jira
(v8.3.4#803005)