[GitHub] [hbase] bharathv commented on a change in pull request #2591: HBASE-24859: Optimize in-memory representation of HBase map reduce table splits

GitBox Thu, 29 Oct 2020 10:12:26 -0700


bharathv commented on a change in pull request #2591:
URL: https://github.com/apache/hbase/pull/2591#discussion_r514426736




##########
File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
##########
@@ -271,6 +272,11 @@ public void testNumOfSplits(int splitsPerRegion, int 
expectedNumOfSplits) throws
     TableInputFormat tif = new TableInputFormat();
     tif.setConf(job.getConfiguration());
     List<InputSplit> splits = tif.getSplits(job);
+    for (InputSplit split : splits) {
+      TableSplit tableSplit = (TableSplit) split;
+      Assert.assertEquals(tableSplit.getScan().getStartRow(), 
HConstants.EMPTY_START_ROW);

Review comment:
       Ya my point is, that is a weak check. You wouldn't know if that scan is 
an empty scan or a default scan. Change the code to something like this and you 
can see there is a bug ...
   
    ```
   for (InputSplit split : splits) {
         TableSplit tableSplit = (TableSplit) split;
         assertTrue(tableSplit.getScanAsString().isEmpty());
     }
   
   In table split code, add this...
   
    public String getScanAsString() {
       return scan;
     }
   ```
   
   Run this test
   
   ```
     @Test
     public void testGetSplits() throws IOException, InterruptedException, 
ClassNotFoundException {
       testNumOfSplits(1, 26);
       testNumOfSplits(3, 78);  <====
     }
   ```
   
   If number of splits per region is > 1, your patch didn't handle the 
TableSplit constructor in this call createNInputSplitsUniform(). Your test 
still passes because you are asserting on a default Scan object and not the 
fact that it shouldn't be set in the first place.

##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
##########
@@ -184,9 +184,24 @@ public TableSplit(final byte [] tableName, byte[] 
startRow, byte[] endRow,
    * @param startRow The start row of the split.
    * @param endRow The end row of the split.
    * @param location The location of the region.
+   * @param encodedRegionName The region ID.
+   * @param length Size of region in bytes
    */
   public TableSplit(TableName tableName, byte[] startRow, byte[] endRow,
-      final String location) {
+    final String location, final String encodedRegionName, long length) {
+    this(tableName, null, startRow, endRow, location, encodedRegionName, 
length);

Review comment:
       > I think the separate constructor is more intuitive that it is okay to 
not have scan object? What do you think?
   
   IMHO more constructors is less readability. In this case we have 8 
constructors for TableSplit. If I were to use it, I'd confused unless I see the 
usages of each of these. Either we should have a [fluent 
interface](https://en.wikipedia.org/wiki/Fluent_interface) or fewer 
constructors is what I think. Subjective of course.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hbase] bharathv commented on a change in pull request #2591: HBASE-24859: Optimize in-memory representation of HBase map reduce table splits

Reply via email to