[ https://issues.apache.org/jira/browse/BEAM-3246?focusedWorklogId=81656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81656 ]
ASF GitHub Bot logged work on BEAM-3246:
----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Mar/18 15:24
Start Date: 18/Mar/18 15:24
Worklog Time Spent: 10m
Work Description: sduskis commented on a change in pull request #4517:
[BEAM-3246] Bigtable: Merge splits if they exceed 15K
URL: https://github.com/apache/beam/pull/4517#discussion_r175292694
##########
File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java
##########
@@ -595,6 +596,180 @@ public void testReadingWithSplits() throws Exception {
     assertSourcesEqualReferenceSource(source, splits, null /* options */);
   }
+  private void assertAllSourcesHaveSingleAdjacentRanges(List<BigtableSource> sources) {
+    if (sources.size() > 0) {
+      assertThat(sources.get(0).getRanges(), hasSize(1));
+      for (int i = 1; i < sources.size(); i++) {
+        assertThat(sources.get(i).getRanges(), hasSize(1));
+        ByteKey lastEndKey = sources.get(i - 1).getRanges().get(0).getEndKey();
+        ByteKey currentStartKey = sources.get(i).getRanges().get(0).getStartKey();
+        assertEquals(lastEndKey, currentStartKey);
+      }
+    }
+  }
+
+  private void assertAllSourcesHaveSingleRanges(List<BigtableSource> sources) {
+    for (BigtableSource source : sources) {
+      assertThat(source.getRanges(), hasSize(1));
+    }
+  }
+
+  private ByteKey createByteKey(int key) {
+    return ByteKey.copyFrom(String.format("key%09d", key).getBytes());
+  }
+
+  /** Tests reduce splits with few non adjacent ranges. */
+  @Test
+  public void testReduceSplitsWithSomeNonAdjacentRanges() throws Exception {
+    final String table = "TEST-MANY-ROWS-SPLITS-TABLE";
+    final int numRows = 10;
+    final int numSamples = 10;
+    final long bytesPerRow = 100L;
+    final int maxSplit = 3;
+
+    // Set up test table data and sample row keys for size estimation and splitting.
+    makeTableData(table, numRows);
Review comment:
I don't see any uses of the following lines:
```
makeTableData(table, numRows);
service.setupSampleRowKeys(table, numSamples, bytesPerRow);
```
If they are no longer used, please remove them along with any variables
that become unused. If they are still used, can you please explain where?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 81656)
Time Spent: 4h 20m (was: 4h 10m)
> BigtableIO should merge splits if they exceed 15K
> -------------------------------------------------
>
> Key: BEAM-3246
> URL: https://issues.apache.org/jira/browse/BEAM-3246
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Solomon Duskis
> Assignee: Solomon Duskis
> Priority: Major
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> A customer hit a problem with a large number of splits. CloudBigtableIO fixes
> that here:
> https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241
> BigtableIO should have similar logic.
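
For illustration only, the kind of merge logic the issue asks for could be
sketched roughly as below. This is not the code from CloudBigtableIO or from
pull request #4517. The class SplitMerger, the Range type, and the method
reduceSplits are hypothetical names, and plain String keys stand in for
ByteKey. The sketch assumes the input ranges are already sorted by start key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: caps the number of splits by grouping
// consecutive ranges and merging the ones that touch.
class SplitMerger {

    /** Half-open key range [start, end); a simplified stand-in for a Bigtable key range. */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Reduces a sorted list of ranges to at most maxSplits groups. Inside a group,
     * two consecutive ranges are merged only when the first one ends exactly where
     * the next one starts; non-adjacent ranges stay separate within the group.
     */
    static List<List<Range>> reduceSplits(List<Range> sorted, int maxSplits) {
        if (maxSplits <= 0) {
            throw new IllegalArgumentException("maxSplits must be positive");
        }
        List<List<Range>> result = new ArrayList<>();
        if (sorted.isEmpty()) {
            return result;
        }
        // Ceiling division: number of input ranges absorbed by each output group.
        int perGroup = (sorted.size() + maxSplits - 1) / maxSplits;
        for (int i = 0; i < sorted.size(); i += perGroup) {
            int groupEnd = Math.min(i + perGroup, sorted.size());
            List<Range> group = new ArrayList<>();
            for (int j = i; j < groupEnd; j++) {
                Range next = sorted.get(j);
                int lastIndex = group.size() - 1;
                if (lastIndex >= 0 && group.get(lastIndex).end.equals(next.start)) {
                    // Adjacent: extend the previous range instead of adding a new one.
                    Range last = group.remove(lastIndex);
                    group.add(new Range(last.start, next.end));
                } else {
                    group.add(next);
                }
            }
            result.add(group);
        }
        return result;
    }
}
```

With 10 adjacent ranges and maxSplits = 3, this sketch yields 3 groups of one
merged range each, and each group ends where the next begins. That is the
shape that assertAllSourcesHaveSingleAdjacentRanges in the diff above checks
for.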
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)