[ https://issues.apache.org/jira/browse/BEAM-3246?focusedWorklogId=81669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81669 ]

ASF GitHub Bot logged work on BEAM-3246:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Mar/18 17:18
            Start Date: 18/Mar/18 17:18
    Worklog Time Spent: 10m 
      Work Description: arkash commented on a change in pull request #4517: 
[BEAM-3246] Bigtable: Merge splits if they exceed 15K
URL: https://github.com/apache/beam/pull/4517#discussion_r175296227
 
 

 ##########
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java
 ##########
 @@ -595,6 +596,180 @@ public void testReadingWithSplits() throws Exception {
     assertSourcesEqualReferenceSource(source, splits, null /* options */);
   }
 
+  private void assertAllSourcesHaveSingleAdjacentRanges(List<BigtableSource> sources) {
+    if (sources.size() > 0) {
+      assertThat(sources.get(0).getRanges(), hasSize(1));
+      for (int i = 1; i < sources.size(); i++) {
+        assertThat(sources.get(i).getRanges(), hasSize(1));
+        ByteKey lastEndKey = sources.get(i - 1).getRanges().get(0).getEndKey();
+        ByteKey currentStartKey = sources.get(i).getRanges().get(0).getStartKey();
+        assertEquals(lastEndKey, currentStartKey);
+      }
+    }
+  }
+
+  private void assertAllSourcesHaveSingleRanges(List<BigtableSource> sources) {
+    for (BigtableSource source : sources) {
+      assertThat(source.getRanges(), hasSize(1));
+    }
+  }
+
+  private ByteKey createByteKey(int key) {
+    return ByteKey.copyFrom(String.format("key%09d", key).getBytes());
+  }
+
+  /** Tests reduce splits with few non adjacent ranges. */
+  @Test
+  public void testReduceSplitsWithSomeNonAdjacentRanges() throws Exception {
+    final String table = "TEST-MANY-ROWS-SPLITS-TABLE";
+    final int numRows = 10;
+    final int numSamples = 10;
+    final long bytesPerRow = 100L;
+    final int maxSplit = 3;
+
+    // Set up test table data and sample row keys for size estimation and splitting.
+    makeTableData(table, numRows);
 
 Review comment:
   The method below is called internally by split() in BigtableSource:
   
       private List<BigtableSource> splitBasedOnSamples(
           long desiredBundleSizeBytes, List<SampleRowKeysResponse> sampleRowKeys) {
         // There are no regions, or no samples available. Just scan the entire range.
         if (sampleRowKeys.isEmpty()) {
           LOG.info("Not splitting source {} because no sample row keys are available.", this);
           return Collections.singletonList(this);
         }
         LOG.info(
             "About to split into bundles of size {} with sampleRowKeys length {} first element {}",
             desiredBundleSizeBytes,
             sampleRowKeys.size(),
             sampleRowKeys.get(0));
         ImmutableList.Builder<BigtableSource> splits = ImmutableList.builder();
         for (ByteKeyRange range : ranges) {
           splits.addAll(splitRangeBasedOnSamples(desiredBundleSizeBytes, sampleRowKeys, range));
         }
         return splits.build();
       }
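
The loop above splits every input range on its own, so no resulting source spans two input ranges, and splits derived from non-adjacent input ranges stay non-adjacent. The following is a simplified sketch of that behavior, not the BigtableSource code: `SampleSplitSketch`, `Range`, `splitRange` and `splitAll` are hypothetical names, keys are plain Strings instead of `ByteKey`, and the byte-size estimation driven by `desiredBundleSizeBytes` is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split each key range independently at the sampled row keys that fall
 * strictly inside it. Size-based bundling (desiredBundleSizeBytes) is ignored.
 */
final class SampleSplitSketch {

  /** Half-open key range [start, end) over lexicographically ordered String keys. */
  static final class Range {
    final String start;
    final String end;

    Range(String start, String end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /** Cuts one range at every sample key strictly between its start and end keys. */
  static List<Range> splitRange(Range range, List<String> sortedSampleKeys) {
    List<Range> pieces = new ArrayList<>();
    String pieceStart = range.start;
    for (String key : sortedSampleKeys) {
      if (key.compareTo(range.start) > 0 && key.compareTo(range.end) < 0) {
        pieces.add(new Range(pieceStart, key));
        pieceStart = key;
      }
    }
    // The last piece always ends at the range's own end key.
    pieces.add(new Range(pieceStart, range.end));
    return pieces;
  }

  /** Like the quoted loop: each input range is split separately and the results concatenated. */
  static List<Range> splitAll(List<Range> ranges, List<String> sortedSampleKeys) {
    List<Range> splits = new ArrayList<>();
    for (Range range : ranges) {
      splits.addAll(splitRange(range, sortedSampleKeys));
    }
    return splits;
  }
}
```

For ranges [a, d) and [f, j) with samples b, e and h, `splitAll` returns [a, b), [b, d), [f, h), [h, j): the gap between d and f survives the split, which is the situation the new test sets up before reducing.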



Issue Time Tracking
-------------------

    Worklog Id:     (was: 81669)
    Time Spent: 4h 40m  (was: 4.5h)

> BigtableIO should merge splits if they exceed 15K
> -------------------------------------------------
>
>                 Key: BEAM-3246
>                 URL: https://issues.apache.org/jira/browse/BEAM-3246
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
>            Priority: Major
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> A customer hit a problem with a large number of splits. CloudBigtableIO fixes
> that here:
> https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241
> BigtableIO should have similar logic.
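
A minimal sketch of the merging idea the issue asks for, assuming the splits are sorted by start key: when there are more than `maxSplits` splits (15K per the issue title), consecutive adjacent splits are merged in groups of ceil(n / maxSplits). This is not the CloudBigtableIO or BigtableIO implementation; `SplitMergeSketch`, `Range` and `reduceSplits` are hypothetical names, and keys are Strings instead of `ByteKey`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reducing the split count by merging adjacent ranges. */
final class SplitMergeSketch {

  /** Half-open key range [start, end) over lexicographically ordered String keys. */
  static final class Range {
    final String start;
    final String end;

    Range(String start, String end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /**
   * Merges consecutive ranges in groups of ceil(n / maxSplits). A group is closed early when the
   * next range does not start where the previous one ended, so non-adjacent ranges are never
   * merged. Assumes the input is sorted by start key.
   */
  static List<Range> reduceSplits(List<Range> splits, int maxSplits) {
    if (splits.size() <= maxSplits) {
      return splits; // Already small enough; nothing to merge.
    }
    int groupSize = (splits.size() + maxSplits - 1) / maxSplits; // ceil(n / maxSplits)
    List<Range> merged = new ArrayList<>();
    Range groupStart = null;
    Range previous = null;
    int count = 0;
    for (Range current : splits) {
      boolean adjacent = previous != null && previous.end.equals(current.start);
      // Close the open group when it is full or when the key space has a gap.
      if (groupStart != null && (count == groupSize || !adjacent)) {
        merged.add(new Range(groupStart.start, previous.end));
        groupStart = null;
        count = 0;
      }
      if (groupStart == null) {
        groupStart = current;
      }
      count++;
      previous = current;
    }
    if (groupStart != null) {
      merged.add(new Range(groupStart.start, previous.end));
    }
    return merged;
  }
}
```

With ten adjacent single-letter ranges from [a, b) to [j, k) and `maxSplits = 3`, the group size is 4 and the result is [a, e), [e, i), [i, k). Because a group is always closed at a gap, the result can still exceed `maxSplits` when there are many non-adjacent ranges: [a, b), [b, c), [d, e), [e, f) with `maxSplits = 1` reduces to [a, c), [d, f).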



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
