bharathv commented on a change in pull request #2591:
URL: https://github.com/apache/hbase/pull/2591#discussion_r514426736
##########
File path:
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
##########
@@ -271,6 +272,11 @@ public void testNumOfSplits(int splitsPerRegion, int
expectedNumOfSplits) throws
TableInputFormat tif = new TableInputFormat();
tif.setConf(job.getConfiguration());
List<InputSplit> splits = tif.getSplits(job);
+ for (InputSplit split : splits) {
+ TableSplit tableSplit = (TableSplit) split;
+ Assert.assertEquals(tableSplit.getScan().getStartRow(),
HConstants.EMPTY_START_ROW);
Review comment:
Ya my point is, that is a weak check. You wouldn't know if that scan is
an empty scan or a default scan. Change the code to something like this and you
can see there is a bug ...
```
for (InputSplit split : splits) {
TableSplit tableSplit = (TableSplit) split;
assertTrue(tableSplit.getScanAsString().isEmpty());
}
In table split code, add this...
public String getScanAsString() {
return scan;
}
```
Run this test
```
@Test
public void testGetSplits() throws IOException, InterruptedException,
ClassNotFoundException {
testNumOfSplits(1, 26);
testNumOfSplits(3, 78); <====
}
```
If number of splits per region is > 1, your patch didn't handle the
TableSplit constructor in this call createNInputSplitsUniform(). Your test
still passes because you are asserting on a default Scan object and not the
fact that it shouldn't be set in the first place.
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
##########
@@ -184,9 +184,24 @@ public TableSplit(final byte [] tableName, byte[]
startRow, byte[] endRow,
* @param startRow The start row of the split.
* @param endRow The end row of the split.
* @param location The location of the region.
+ * @param encodedRegionName The region ID.
+ * @param length Size of region in bytes
*/
public TableSplit(TableName tableName, byte[] startRow, byte[] endRow,
- final String location) {
+ final String location, final String encodedRegionName, long length) {
+ this(tableName, null, startRow, endRow, location, encodedRegionName,
length);
Review comment:
> I think the separate constructor is more intuitive that it is okay to
not have scan object? What do you think?
IMHO more constructors is less readability. In this case we have 8
constructors for TableSplit. If I were to use it, I'd confused unless I see the
usages of each of these. Either we should have a [fluent
interface](https://en.wikipedia.org/wiki/Fluent_interface) or fewer
constructors is what I think. Subjective of course.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]