[ https://issues.apache.org/jira/browse/BEAM-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300705#comment-16300705 ]

ASF GitHub Bot commented on BEAM-3154:
--------------------------------------

rniemo-g opened a new pull request #4312: [BEAM-3154] Support Multiple 
KeyRanges when reading from BigTable
URL: https://github.com/apache/beam/pull/4312
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
    - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
    - [x] Each commit in the pull request should have a meaningful subject line 
and body.
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
    - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
    - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
    - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   
   Support using multiple key ranges when reading from BigTable. This is useful 
for applications that want to read non-contiguous keys in a single BigTable query.
   
   
   First, this PR extends BigTableSource's existing methods that split the 
source and estimate its size from row-key samples so that they work with 
multiple key ranges. These are relatively simple extensions: the previous logic 
for splitting one key range is applied to each key range, with some edge cases 
to handle. For example, when estimating the size of the ranges from samples, a 
sample range that overlaps two key ranges should contribute its size to the 
estimate only once.
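   The sample-based size estimate and split described above can be sketched in 
plain Java. This is a minimal illustration, not the PR's code: keys are 
simplified to Strings compared lexicographically, and each sample is assumed to 
look like a Bigtable SampleRowKeys entry (a row key plus the cumulative byte 
offset up to that key), with an empty key standing for the end of the table. 
The class and method names (`MultiRangeSizeEstimate`, `estimateSize`, 
`splitAtSamples`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MultiRangeSizeEstimate {
  /** A sample: the end key of a sample range and the cumulative bytes up to it. */
  static final class Sample {
    final String key;       // "" is assumed to mean "end of table"
    final long offsetBytes;
    Sample(String key, long offsetBytes) { this.key = key; this.offsetBytes = offsetBytes; }
  }

  /** Half-open key range [start, end); an empty end means "to the end of the table". */
  static final class Range {
    final String start;
    final String end;
    Range(String start, String end) { this.start = start; this.end = end; }

    /** True if this range intersects the sample range [sStart, sEnd). */
    boolean overlaps(String sStart, String sEnd) {
      boolean beginsBeforeSampleEnd = sEnd.isEmpty() || start.compareTo(sEnd) < 0;
      boolean endsAfterSampleStart = end.isEmpty() || end.compareTo(sStart) > 0;
      return beginsBeforeSampleEnd && endsAfterSampleStart;
    }
  }

  /** Sums each sample range that overlaps at least one requested range, once. */
  static long estimateSize(List<Sample> samples, List<Range> ranges) {
    long total = 0;
    String prevKey = "";  // the first sample range starts at the beginning of the table
    long prevOffset = 0;
    for (Sample s : samples) {
      long sampleSize = s.offsetBytes - prevOffset;
      for (Range r : ranges) {
        if (r.overlaps(prevKey, s.key)) {
          total += sampleSize;
          break;  // count this sample range once, not once per overlapping range
        }
      }
      prevKey = s.key;
      prevOffset = s.offsetBytes;
    }
    return total;
  }

  /** Applies the single-range split to each range: cut at every sample key inside it. */
  static List<Range> splitAtSamples(List<Sample> samples, List<Range> ranges) {
    List<Range> out = new ArrayList<>();
    for (Range r : ranges) {
      String start = r.start;
      for (Sample s : samples) {  // samples are assumed sorted by key
        boolean inside = !s.key.isEmpty() && s.key.compareTo(start) > 0
            && (r.end.isEmpty() || s.key.compareTo(r.end) < 0);
        if (inside) {
          out.add(new Range(start, s.key));
          start = s.key;
        }
      }
      out.add(new Range(start, r.end));
    }
    return out;
  }
}
```

With samples at `b`, `d`, `f`, and the end of the table, the ranges `[b, c)` and 
`[c, d)` both fall inside the sample range `[b, d)`, so that sample range's 
bytes are counted once rather than twice.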
   
   Then, a range tracker for multiple key ranges is defined as 
ByteKeyRangesTracker. This tracker sorts the list of key ranges on 
instantiation and operates on them similarly to ByteKeyRangeTracker. The two 
classes behave similarly enough that an abstract base class was created to 
share their common functionality. A key difference is that the multi-range 
tracker has a method to interpolate a key across its multiple ranges. The 
BigTableReader uses this multi-range tracker when splitting its source into 
primary and residual parts.
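   A minimal sketch of how sorting on construction and interpolating a key 
across several ranges might look, in plain Java. The names 
(`MultiRangeTrackerSketch`, `interpolateKey`, `fractionForKey`) are 
hypothetical and this is not the PR's ByteKeyRangesTracker API. It assumes 
every range carries equal weight regardless of its size, and that all bounds 
are non-empty keys. Within one range, keys are right-padded with zero bytes and 
read as unsigned big-endian integers, a simple way to interpolate between byte 
keys.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiRangeTrackerSketch {
  /** Half-open range [start, end) of unsigned byte keys. */
  static final class Range {
    final byte[] start;
    final byte[] end;
    Range(byte[] start, byte[] end) { this.start = start; this.end = end; }
  }

  private final List<Range> ranges;

  MultiRangeTrackerSketch(List<Range> unsorted) {
    ranges = new ArrayList<>(unsorted);
    // Sort once on construction so positions can be computed in key order.
    ranges.sort((a, b) -> compareUnsigned(a.start, b.start));
  }

  /** Lexicographic comparison treating bytes as unsigned, like Bigtable row keys. */
  static int compareUnsigned(byte[] a, byte[] b) {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) return cmp;
    }
    return Integer.compare(a.length, b.length);
  }

  /** Right-pads a key with zero bytes to len and reads it as an unsigned integer. */
  private static BigInteger toInt(byte[] key, int len) {
    return new BigInteger(1, Arrays.copyOf(key, len));
  }

  /** Converts a non-negative integer back to a big-endian key of exactly len bytes. */
  private static byte[] toKey(BigInteger v, int len) {
    byte[] raw = v.toByteArray();  // may include a leading 0x00 sign byte
    byte[] out = new byte[len];
    int n = Math.min(raw.length, len);
    System.arraycopy(raw, raw.length - n, out, len - n, n);
    return out;
  }

  /** Returns the key lying the given fraction of the way through all ranges. */
  byte[] interpolateKey(double fraction) {
    int n = ranges.size();
    int idx = Math.min((int) (fraction * n), n - 1);  // range the fraction lands in
    double local = fraction * n - idx;                 // fraction within that range
    Range r = ranges.get(idx);
    int len = Math.max(r.start.length, r.end.length);
    BigInteger lo = toInt(r.start, len);
    BigInteger width = toInt(r.end, len).subtract(lo);
    BigInteger scale = BigInteger.valueOf(1_000_000);
    BigInteger offset = width.multiply(BigInteger.valueOf(Math.round(local * 1_000_000))).divide(scale);
    return toKey(lo.add(offset), len);
  }

  /** Fraction of all ranges that precedes key, or -1 if key is outside every range. */
  double fractionForKey(byte[] key) {
    int n = ranges.size();
    for (int i = 0; i < n; i++) {
      Range r = ranges.get(i);
      if (compareUnsigned(key, r.start) >= 0 && compareUnsigned(key, r.end) < 0) {
        int len = Math.max(key.length, Math.max(r.start.length, r.end.length));
        BigInteger lo = toInt(r.start, len);
        double local = toInt(key, len).subtract(lo).doubleValue()
            / toInt(r.end, len).subtract(lo).doubleValue();
        return (i + local) / n;
      }
    }
    return -1;
  }
}
```

Weighting each range by its estimated size (for example, from the samples) 
instead of equally would likely give more even splits; equal weighting only 
keeps the sketch short.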
   



> Support multiple KeyRanges when reading from BigTable
> -----------------------------------------------------
>
>                 Key: BEAM-3154
>                 URL: https://issues.apache.org/jira/browse/BEAM-3154
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-gcp
>            Reporter: Ryan Niemocienski
>            Assignee: Solomon Duskis
>            Priority: Minor
>
> BigTableIO.Read currently only supports reading one KeyRange from BT. It 
> would be nice to read multiple ranges from BigTable in one read. Thoughts on 
> the feasibility of this before I dig into it?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
