[
https://issues.apache.org/jira/browse/BEAM-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300705#comment-16300705
]
ASF GitHub Bot commented on BEAM-3154:
--------------------------------------
rniemo-g opened a new pull request #4312: [BEAM-3154] Support Multiple
KeyRanges when reading from BigTable
URL: https://github.com/apache/beam/pull/4312
Follow this checklist to help us incorporate your contribution quickly and
easily:
- [x] Make sure there is a [JIRA
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the
change (usually before you start working on it). Trivial changes like typos do
not require a JIRA issue. Your pull request should address just this issue,
without pulling in other changes.
- [x] Each commit in the pull request should have a meaningful subject line
and body.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue.
- [x] Write a pull request description that is detailed enough to
understand what the pull request does, how, and why.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more
thorough check will be performed on your pull request automatically.
- [x] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
---
Support using multiple key ranges when reading from BigTable. This is useful
for applications that want to read non-contiguous keys in one BigTable query.
First, this PR adapts BigTableSource's existing methods for splitting and for
estimating size from samples so that they work with multiple key ranges. These
are relatively simple extensions that apply the previous single-range logic to
each key range, with some edge cases. For example, when estimating the size of
the ranges from samples, a sample range that overlaps two key ranges must have
its size added to the estimate only once.
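The "count each sample once" edge case can be illustrated with a minimal
sketch. This is not the code in the PR: the `KeyRange` and `Sample` classes,
the use of `String` keys instead of Beam's byte keys, and the method name
`estimateSizeBytes` are all assumptions made for illustration, and unbounded
ranges are not handled.

```java
import java.util.List;

final class MultiRangeSizeEstimate {
    /** Hypothetical half-open key range [start, end); keys compare lexicographically. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        boolean overlaps(KeyRange other) {
            return start.compareTo(other.end) < 0 && other.start.compareTo(end) < 0;
        }
    }

    /** Hypothetical sampled slice of the table with its approximate size in bytes. */
    static final class Sample {
        final KeyRange range;
        final long sizeBytes;

        Sample(KeyRange range, long sizeBytes) {
            this.range = range;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Sums the sizes of all samples touched by at least one requested range. */
    static long estimateSizeBytes(List<KeyRange> ranges, List<Sample> samples) {
        long total = 0;
        for (Sample sample : samples) {
            boolean touched = false;
            for (KeyRange range : ranges) {
                if (range.overlaps(sample.range)) {
                    touched = true;
                    break; // stop at the first hit so the sample is counted only once
                }
            }
            if (touched) {
                total += sample.sizeBytes;
            }
        }
        return total;
    }
}
```

Iterating over samples in the outer loop, rather than over key ranges, is what
keeps a sample from being added twice when several ranges overlap it.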
Then, a range tracker for multiple key ranges is defined as
ByteKeyRangesTracker. This tracker sorts the list of key ranges on
instantiation and operates on them similarly to ByteKeyRangeTracker. The two
classes are similar enough that an abstract base class was created to share
functionality. A key difference is that the multi-range tracker has a method to
interpolate a key across its multiple ranges. The BigTableReader uses this
multi-range tracker when splitting its source into primary and residual parts.
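One way to picture "sort on instantiation, then interpolate across ranges" is
the sketch below. It is an illustration under simplifying assumptions, not the
ByteKeyRangesTracker implementation: keys are `long` values rather than byte
keys, ranges are assumed to be non-empty and non-overlapping, and the names
`LongRange`, `sortByStart`, and `interpolateKey` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MultiRangeInterpolation {
    /** Hypothetical half-open range [start, end) over long keys. */
    static final class LongRange {
        final long start;
        final long end;

        LongRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long width() {
            return end - start;
        }
    }

    /** Returns a copy of the ranges ordered by start key. */
    static List<LongRange> sortByStart(List<LongRange> ranges) {
        List<LongRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((LongRange r) -> r.start));
        return sorted;
    }

    /**
     * Maps a fraction in [0, 1] of the combined key space to a concrete key,
     * skipping the gaps between ranges.
     */
    static long interpolateKey(List<LongRange> sortedRanges, double fraction) {
        if (sortedRanges.isEmpty() || fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("need at least one range and 0 <= fraction <= 1");
        }
        long totalWidth = 0;
        for (LongRange r : sortedRanges) {
            totalWidth += r.width();
        }
        // Offset into the concatenation of all ranges, rounded down.
        long offset = (long) (fraction * totalWidth);
        for (LongRange r : sortedRanges) {
            if (offset < r.width()) {
                return r.start + offset;
            }
            offset -= r.width();
        }
        // fraction == 1.0 lands on the exclusive end of the last range.
        return sortedRanges.get(sortedRanges.size() - 1).end;
    }
}
```

A split point chosen this way always falls inside one of the requested ranges
(or at the end of the last one), so the keys before it could form the primary
part and the keys from it onward the residual part.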
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Support multiple KeyRanges when reading from BigTable
> -----------------------------------------------------
>
> Key: BEAM-3154
> URL: https://issues.apache.org/jira/browse/BEAM-3154
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Reporter: Ryan Niemocienski
> Assignee: Solomon Duskis
> Priority: Minor
>
> BigTableIO.Read currently only supports reading one KeyRange from BT. It
> would be nice to read multiple ranges from BigTable in one read. Thoughts on
> the feasibility of this before I dig into it?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)