[ 
https://issues.apache.org/jira/browse/BEAM-3246?focusedWorklogId=81651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81651
 ]

ASF GitHub Bot logged work on BEAM-3246:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Mar/18 12:33
            Start Date: 18/Mar/18 12:33
    Worklog Time Spent: 10m 
      Work Description: sduskis commented on a change in pull request #4517: [BEAM-3246] Bigtable: Merge splits if they exceed 15K
URL: https://github.com/apache/beam/pull/4517#discussion_r175287451
 
 

 ##########
 File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java
 ##########
 @@ -595,6 +596,183 @@ public void testReadingWithSplits() throws Exception {
     assertSourcesEqualReferenceSource(source, splits, null /* options */);
   }
 
+  private void assertAllSourcesHaveSingleAdjacentRanges(List<BigtableSource> sources) {
+    if (sources.size() > 0) {
+      assertThat(sources.get(0).getRanges(), hasSize(1));
+      for (int i = 1; i < sources.size(); i++) {
+        assertThat(sources.get(i).getRanges(), hasSize(1));
+        ByteKey lastEndKey = sources.get(i - 1).getRanges().get(0).getEndKey();
+        ByteKey currentStartKey = sources.get(i).getRanges().get(0).getStartKey();
+        assertEquals(lastEndKey, currentStartKey);
+      }
+    }
+  }
+
+  private void assertAllSourcesHaveSingleRanges(List<BigtableSource> sources) {
+    for (BigtableSource source : sources) {
+      assertThat(source.getRanges(), hasSize(1));
+    }
+  }
+
+  private ByteKey createByteKey(int key) {
+    return ByteKey.copyFrom(String.format("key%09d", key).getBytes());
+  }
+
+  /** Tests reduce splits with few non adjacent ranges. */
+  @Test
+  public void testReduceSplitsWithSomeNonAdjacentRanges() throws Exception {
+    final String table = "TEST-MANY-ROWS-SPLITS-TABLE";
+    final int numRows = 10;
+    final int numSamples = 10;
+    final long bytesPerRow = 100L;
+    final int maxSplit = 3;
+
+    // Set up test table data and sample row keys for size estimation and splitting.
+    makeTableData(table, numRows);
+    service.setupSampleRowKeys(table, numSamples, bytesPerRow);
+
+    ByteKeyRange tableRange = service.getTableRange(table);
+    //Construct few non contiguous key ranges [..1][1..2][3..4][4..5][6..7][8..]
+    List<ByteKeyRange> keyRanges = Arrays.asList(
+        tableRange.withEndKey(createByteKey(1)),
 
 Review comment:
   Please use `ByteKeyRange.of(start, end)` instead of `tableRange.withEndKey()`.
There's no need to involve `tableRange` here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 81651)
    Time Spent: 4h  (was: 3h 50m)

> BigtableIO should merge splits if they exceed 15K
> -------------------------------------------------
>
>                 Key: BEAM-3246
>                 URL: https://issues.apache.org/jira/browse/BEAM-3246
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
>            Priority: Major
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> A customer hit a problem with a large number of splits.  CloudBigtableIO fixes
> that here:
> https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241
> BigtableIO should have similar logic.
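
For readers following the issue, the idea of merging splits down to a cap can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the code from the pull request or from CloudBigtableIO: `SplitMerger`, `KeyRange`, `reduceSplits`, and `mergeAdjacent` are hypothetical names, keys are plain strings rather than `ByteKey`, an empty string stands for an open bound, and the input is assumed to be sorted by start key. Consecutive ranges are grouped so that at most `maxSplits` groups remain, and ranges within a group are joined only when one ends exactly where the next begins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SplitMerger {
  /** Hypothetical half-open key range [start, end); "" marks an open bound. */
  public static final class KeyRange {
    public final String start;
    public final String end;

    public KeyRange(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  /**
   * Groups sorted ranges into at most maxSplits groups of consecutive ranges.
   * Each returned group stands for one merged split and may still hold
   * several ranges when its members are not adjacent.
   */
  public static List<List<KeyRange>> reduceSplits(List<KeyRange> sorted, int maxSplits) {
    if (maxSplits <= 0) {
      throw new IllegalArgumentException("maxSplits must be positive");
    }
    List<List<KeyRange>> groups = new ArrayList<>();
    if (sorted.size() <= maxSplits) {
      // Already under the cap: keep every range as its own split.
      for (KeyRange range : sorted) {
        groups.add(Collections.singletonList(range));
      }
      return groups;
    }
    // Ceiling division, so the number of groups never exceeds maxSplits.
    int groupSize = (sorted.size() + maxSplits - 1) / maxSplits;
    for (int i = 0; i < sorted.size(); i += groupSize) {
      int to = Math.min(i + groupSize, sorted.size());
      groups.add(mergeAdjacent(sorted.subList(i, to)));
    }
    return groups;
  }

  /** Joins neighbouring ranges whose end key equals the next start key. */
  public static List<KeyRange> mergeAdjacent(List<KeyRange> chunk) {
    List<KeyRange> merged = new ArrayList<>();
    for (KeyRange range : chunk) {
      int last = merged.size() - 1;
      if (last >= 0 && merged.get(last).end.equals(range.start)) {
        merged.set(last, new KeyRange(merged.get(last).start, range.end));
      } else {
        merged.add(range);
      }
    }
    return merged;
  }
}
```

With the six ranges from the test comment in the diff, [..1][1..2][3..4][4..5][6..7][8..], and a cap of 3, this sketch yields three groups: [..2], [3..5], and a last group that keeps [6..7] and [8..] as two separate ranges because they are not adjacent.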



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
