Re: [PR] HBASE-28456 HBase Restore restores old data if data for the same timestamp is in different hfiles [hbase]

via GitHub Tue, 26 Mar 2024 05:55:42 -0700


bbeaudreault commented on code in PR #5775:
URL: https://github.com/apache/hbase/pull/5775#discussion_r1539170198



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileInfo.java:
##########
@@ -423,6 +424,54 @@ public String toString() {
      + (isReference() ? "->" + getReferredToFile(this.getPath()) + "-" + reference : "");
   }
 
+  /**
+   * Cells in a bulkloaded file don't have a sequenceId since they don't go through memstore. When a
+   * bulkload file is committed, the current memstore ts is stamped onto the file name as the

Review Comment:
   I think the reason we don't do this already is that a bulkloaded HFile is
   typically created outside the cluster. At that point, we don't know what the
   memstore seq id is. Only at bulkload commit time, once we've locked and
   flushed the memstore, do we know the memstore seq id.

   Since HFiles are immutable, we can't open the file at that point and change
   its metadata, so I guess we need to add it to the file name. I agree that
   relying on the file name seems brittle. A more robust solution might be
   tricky given the above, though.
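
   To make the file-name approach concrete, here is a minimal sketch of stamping a seq id onto a name at commit time and reading it back later. It assumes the `<name>_SeqId_<id>_` suffix convention; the class `BulkloadSeqIdSketch` and its methods are hypothetical names for illustration and are not the code in this PR or in `StoreFileInfo`.

```java
import java.util.OptionalLong;

// Hypothetical helper, for illustration only.
final class BulkloadSeqIdSketch {
  // Assumed marker, following a <name>_SeqId_<id>_ naming convention.
  private static final String MARKER = "_SeqId_";

  // Append the seq id known only at commit time to the (immutable) file's name.
  static String stampSeqId(String fileName, long seqId) {
    return fileName + MARKER + seqId + "_";
  }

  // Recover the seq id from the name, or empty if the name carries none.
  static OptionalLong parseSeqId(String fileName) {
    // lastIndexOf picks the most recent stamp if the name was stamped twice.
    int start = fileName.lastIndexOf(MARKER);
    if (start < 0) {
      return OptionalLong.empty();
    }
    int from = start + MARKER.length();
    int end = fileName.indexOf('_', from);
    if (end < 0) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(fileName.substring(from, end)));
    } catch (NumberFormatException e) {
      // A malformed suffix is treated the same as a missing one.
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    String committed = stampSeqId("3f2a9c", 42L);
    System.out.println(committed + " -> " + parseSeqId(committed));
    System.out.println("3f2a9c -> " + parseSeqId("3f2a9c"));
  }
}
```

   The brittleness is visible here: anything that renames or copies the file without preserving the suffix silently loses the seq id, which is the concern raised above.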



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
