GitHub user dhalperi opened a pull request:

    https://github.com/apache/incubator-beam/pull/353

    Add BoundedReader APIs for expressing remaining and consumed parallelism

    These are useful for dynamic work rebalancing and autoscaling.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhalperi/incubator-beam limited-parallelism

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/353.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #353
    
----
commit dfeecdbd6a751f0bac1f398dd1a86040a5c5166e
Author: Dan Halperin <[email protected]>
Date:   2016-05-04T00:53:48Z

    BoundedReader: add getParallelism{Consumed,Remaining}
    
    And implement it for common sources

commit 837a42dcce1d14a45ff7c8b7c3b4efdbbf98ef82
Author: Dan Halperin <[email protected]>
Date:   2016-05-15T20:54:57Z

    OffsetBasedReader: test limited parallelism signals

commit 32894f8e4f7b68b940e71dcc52202bf2216914aa
Author: Dan Halperin <[email protected]>
Date:   2016-05-16T22:33:11Z

    CompressedSource: add tests of parallelism and progress
    
    *) empty file
    *) non-empty compressed file
    *) non-empty not-compressed file

commit ca29728dbff52a796279753c5b3efc3659f9ba06
Author: Dan Halperin <[email protected]>
Date:   2016-05-17T03:04:52Z

    TextIO: implement and test parallelism
    
    *) empty file
    *) non-empty file

commit b866541f71df59f278e17b2895e7b412cbc7734f
Author: Dan Halperin <[email protected]>
Date:   2016-05-17T03:48:33Z

    CountingSource: test limited parallelism

commit 1a4a0d99049871fe58f5194c59e0bf646894fae7
Author: Dan Halperin <[email protected]>
Date:   2016-05-17T05:33:07Z

    AvroSource: rewrite to support remaining parallelism
    
    *) Make the start of a block match Avro's definition: the first byte after 
the previous sync marker.
       This enables detecting the last block in the file.
    *) This change enables us to unify currentOffset and currentBlockOffset, as 
all records are emitted
       at the start of the block that contains them.
    *) Simplify block header reading to have fewer object allocations and 
buffers using a direct
       reader and a (allocated once only) CountingInputStream to measure the 
size of that header.
    *) Add tests for consumed and remaining parallelism
    *) Let BlockBasedSource detect the end of the file in remaining parallelism.
    *) Verify in more places that the correct number of bytes is read from
       the input Avro file.

commit 4c775be82ccf68cdc221242e9cdfd4d0796a13e7
Author: Dan Halperin <[email protected]>
Date:   2016-05-17T08:47:54Z

    CompressedSource: implement currentOffset based on bytes decompressed
    
    This is not a very good offset because it is an upper bound, but it is
    likely better than not reporting any progress at all.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to