[ https://issues.apache.org/jira/browse/BEAM-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609942#comment-15609942 ]
ASF GitHub Bot commented on BEAM-840:
-------------------------------------

GitHub user mizitch opened a pull request:

https://github.com/apache/incubator-beam/pull/1199

[BEAM-840] Add Java SDK extension to support non-distributed sorting

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

- [x] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
- [x] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). (Covered by Google's existing agreement with Apache.)

---

Add an extension that provides a PTransform which performs local (non-distributed) sorting. It sorts in memory until the buffer is full, then flushes to disk and switches to external sorting.

The transform consumes a PCollection of KVs that maps each primary key to an iterable of (secondary key, value) KVs, i.e. a PCollection<KV<K1, Iterable<KV<K2, V>>>>, and sorts each iterable by secondary key. It would typically be applied after a GroupByKey. It uses the coders to encode secondary keys and values as byte arrays and compares the encoded secondary keys lexicographically. Hadoop is used as the external sorting library.

Hi @dhalperi, can you please take a look?
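To illustrate what "lexicographical comparison on the encoded secondary keys" means, here is a minimal, self-contained Java sketch that does not use Beam. The names `EncodedKeySort`, `encode`, `compareUnsigned` and `sortValues` are hypothetical, and `encode` is only a stand-in for a coder that writes an int as 4 big-endian bytes. It sorts the (secondary key, value) pairs of a single primary key by comparing the encoded keys byte by byte as unsigned values.

```java
import java.nio.ByteBuffer;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sort one primary key's values by the encoded bytes of their secondary keys. */
public class EncodedKeySort {
  // Stand-in for a coder (assumption): 4-byte big-endian two's-complement encoding of an int.
  public static byte[] encode(int secondaryKey) {
    return ByteBuffer.allocate(4).putInt(secondaryKey).array();
  }

  // Lexicographic comparison treating each byte as unsigned (0x00..0xFF).
  public static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (c != 0) return c;
    }
    return Integer.compare(a.length, b.length); // a shorter prefix sorts first
  }

  // Returns a copy of the (secondaryKey, value) pairs ordered by encoded secondary key.
  public static List<Map.Entry<Integer, String>> sortValues(List<Map.Entry<Integer, String>> values) {
    List<Map.Entry<Integer, String>> sorted = new ArrayList<>(values);
    sorted.sort((x, y) -> compareUnsigned(encode(x.getKey()), encode(y.getKey())));
    return sorted;
  }

  public static void main(String[] args) {
    // The grouped values for one primary key, as they might arrive after a GroupByKey.
    List<Map.Entry<Integer, String>> grouped = new ArrayList<>();
    grouped.add(new AbstractMap.SimpleEntry<>(7, "c"));
    grouped.add(new AbstractMap.SimpleEntry<>(-1, "x"));
    grouped.add(new AbstractMap.SimpleEntry<>(2, "a"));
    // prints "2 -> a", then "7 -> c", then "-1 -> x", one per line
    for (Map.Entry<Integer, String> e : sortValues(grouped)) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

Note that the resulting order follows the encoded bytes rather than the natural order of the key type: with this two's-complement encoding, -1 becomes `FF FF FF FF` and therefore sorts after 7. Whether that matters depends on the coder chosen for the secondary key.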
https://issues.apache.org/jira/browse/BEAM-840

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mizitch/incubator-beam sorter-extension

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1199.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1199

----
commit e18fd4080b9331953c6cd8ba1fa509cfe56c787b
Author: Mitch Shanklin <mshank...@google.com>
Date:   2016-10-25T23:17:01Z

----

> Add Java SDK extension to support non-distributed sorting
> ---------------------------------------------------------
>
>                 Key: BEAM-840
>                 URL: https://issues.apache.org/jira/browse/BEAM-840
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>    Affects Versions: 0.4.0-incubating
>            Reporter: Mitch Shanklin
>            Assignee: Mitch Shanklin
>            Priority: Minor
>
> Add an extension that provides a PTransform which performs
> local (non-distributed) sorting. It will sort in memory until the buffer is
> full, then flush to disk and use external sorting.
>
> Consumes a PCollection of KVs from primary key to iterable of secondary key
> and value KVs and sorts the iterables. Would probably be called after a
> GroupByKey. Uses coders to convert secondary keys and values into byte arrays
> and does a lexicographical comparison on the secondary keys.
> Uses Hadoop as an external sorting library.
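The buffer-then-spill strategy described in this issue can be sketched with the JDK alone. This is not the extension's code: the extension delegates the on-disk phase to Hadoop, whereas the hypothetical `SpillingSorter` below writes each full buffer to a temporary file as a sorted run and then performs a k-way merge of the runs with a `PriorityQueue`. The buffer limit is counted in records rather than bytes purely for simplicity.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: in-memory sort that spills sorted runs to disk, then merges them. */
public class SpillingSorter {
  // Unsigned lexicographic order over encoded bytes; a shorter prefix sorts first.
  static final Comparator<byte[]> BYTES = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (c != 0) return c;
    }
    return Integer.compare(a.length, b.length);
  };

  private final int maxBufferedRecords; // assumed limit, counted in records for simplicity
  private final List<byte[]> buffer = new ArrayList<>();
  private final List<Path> runs = new ArrayList<>();

  public SpillingSorter(int maxBufferedRecords) {
    this.maxBufferedRecords = Math.max(1, maxBufferedRecords);
  }

  public void add(byte[] record) throws IOException {
    buffer.add(record);
    if (buffer.size() >= maxBufferedRecords) {
      spill(); // buffer is full: flush it to disk as a sorted run
    }
  }

  // Sorts the buffer and writes it to a temp file as length-prefixed records.
  private void spill() throws IOException {
    buffer.sort(BYTES);
    Path run = Files.createTempFile("sorted-run", ".bin");
    run.toFile().deleteOnExit();
    try (DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(run)))) {
      for (byte[] r : buffer) {
        out.writeInt(r.length);
        out.write(r);
      }
    }
    runs.add(run);
    buffer.clear();
  }

  // Reads the next length-prefixed record, or returns null at the end of the run.
  private static byte[] readRecord(DataInputStream in) throws IOException {
    int len;
    try {
      len = in.readInt();
    } catch (EOFException e) {
      return null;
    }
    byte[] r = new byte[len];
    in.readFully(r);
    return r;
  }

  // Current smallest unread record of one run, plus the stream it came from.
  private static final class Head {
    final byte[] record;
    final DataInputStream in;
    Head(byte[] record, DataInputStream in) { this.record = record; this.in = in; }
  }

  /** Returns every added record in sorted order. */
  public List<byte[]> sort() throws IOException {
    if (runs.isEmpty()) {
      buffer.sort(BYTES); // never spilled: a purely in-memory sort suffices
      return new ArrayList<>(buffer);
    }
    if (!buffer.isEmpty()) spill(); // make sure every record lives in some run
    PriorityQueue<Head> heap = new PriorityQueue<>((x, y) -> BYTES.compare(x.record, y.record));
    List<DataInputStream> inputs = new ArrayList<>();
    try {
      for (Path run : runs) {
        DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(run)));
        inputs.add(in);
        byte[] first = readRecord(in);
        if (first != null) heap.add(new Head(first, in));
      }
      List<byte[]> result = new ArrayList<>();
      while (!heap.isEmpty()) { // k-way merge: always emit the smallest head
        Head h = heap.poll();
        result.add(h.record);
        byte[] next = readRecord(h.in);
        if (next != null) heap.add(new Head(next, h.in));
      }
      return result;
    } finally {
      for (DataInputStream in : inputs) in.close();
    }
  }
}
```

Collecting the merged output into a `List` keeps the example short but would defeat the purpose for data that does not fit in memory; a real implementation would hand back an iterator over the merge instead.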