bbeaudreault commented on code in PR #4770:
URL: https://github.com/apache/hbase/pull/4770#discussion_r1060973517


##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:
##########
@@ -254,16 +255,19 @@ public void write(ImmutableBytesWritable row, V cell) 
throws IOException {
         byte[] tableNameBytes = null;
         if (writeMultipleTables) {
           tableNameBytes = MultiTableHFileOutputFormat.getTableName(row.get());
-          tableNameBytes = TableName.valueOf(tableNameBytes).toBytes();
+          tableNameBytes = 
TableName.valueOf(tableNameBytes).getNameWithNamespaceInclAsString()

Review Comment:
   I've been looking at this. I think the assumptions in HBASE-20530 are 
incorrect. The multi-table support in HFileOutputFormat2 was introduced 
separately from backups and already exists in branch-2. As such we can't break 
them like this. I think changing this path would fall under "File format 
compatibility" in our [compatibility 
rules](https://hbase.apache.org/book.html#hbase.versioning.post10) which 
requires a major version to change.
   
   It seems like the problem in HBASE-20530 was due to a mismatch between the 
format returned by `HBackupFileSystem#getTableBackupDir` and this code here. 
The decision in that jira was to change this code here, rather than 
getTableBackupDir. I think that was a reasonable decision for two reasons:
   1. it was a major version upgrade at the time.
   2. this code here in HFileOutputFormat2 is susceptible to a bug where 
multiple tables with the same name but in separate namespaces might collide.
   
   That bug seems problematic, but no one has reported it. We have two options:
   
   1. Leave this code alone here, and instead change 
`HBackupFileSystem#getTableBackupDir`.
   2. Add a config in HFileOutputFormat2 like 
`hbase.mapreduce.hfileoutputformat.multi.table.include.namespace`. We can have 
B&R always set this to true, and the config would be removed as part of 3.x 
upgrade. We'd use this to gate the logic above.
   
   I don't love adding more config params, but I might suggest choosing option 
2. This way if anyone stumbles across this bug in their own use-case, they 
actually have a way to get around it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to