wchevreuil commented on PR #6661:
URL: https://github.com/apache/hbase/pull/6661#issuecomment-2772967762

   > > I have a few points on this:
   > > 
   > > 1. Class names should follow @anmolnar suggestions;
   > > 2. Doing bulkload at the mapper writer would be extremely costly for 
medium to large bulkloads. I wonder if that could cause wal player jobs to time 
out and retry, which would be a disaster. Rather than embedding this in the wal 
player, I would rather do it as a separate, independent tool that just scans 
the wals searching for bulkload markers, then maybe joins all related files and 
triggers a single, bigger bulkload operation?
   > 
   > 1. Addressed
   > 2. IMHO, restoring bulkloads along with Put/Delete mutations is essential 
to maintaining the original order of WAL entries. If an entry in a bulkloaded 
HFile is later modified or deleted, the restore process must follow the same 
sequence: first applying the bulkload, then executing Put/Delete mutations in 
their original order. However, a potential issue with this approach is that 
bulkload operations take time to complete, and during this period, incoming 
Put/Delete mutations might be ignored if the corresponding HFiles have not yet 
been fully loaded. @anmolnar @vinayakphegde @wchevreuil thoughts, please?
   
   This is the same as what currently happens for replication. Because we rely 
on cell timestamps, it's only a problem for DELETE operations, and only if a 
major compaction runs between the time a DELETE is applied and the time the 
bulkload completes. That's mitigated by enabling the KEEP_DELETED_CELLS flag.
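
   To make the timestamp argument concrete, here is a minimal toy model of a 
single cell. It is not HBase code: `ToyStore`, `replayOutOfOrder` and the 
timestamps are made up for illustration, and the delete is assumed to mask every 
version at or below its own timestamp. It only shows why the out-of-order 
bulkload is harmless unless a major compaction purges the delete marker first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one cell's versions; hypothetical, NOT the HBase implementation. */
class TombstoneDemo {

    /** A versioned entry: either a value or a delete marker (tombstone). */
    static final class Entry {
        final long ts;
        final boolean isDelete;
        final String value;

        Entry(long ts, boolean isDelete, String value) {
            this.ts = ts;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    static final class ToyStore {
        private final List<Entry> entries = new ArrayList<>();
        private final boolean keepDeletedCells;

        ToyStore(boolean keepDeletedCells) {
            this.keepDeletedCells = keepDeletedCells;
        }

        void put(long ts, String value) { entries.add(new Entry(ts, false, value)); }

        void delete(long ts) { entries.add(new Entry(ts, true, null)); }

        /** Highest tombstone timestamp, or Long.MIN_VALUE if there is none. */
        private long newestDeleteTs() {
            long max = Long.MIN_VALUE;
            for (Entry e : entries) {
                if (e.isDelete && e.ts > max) max = e.ts;
            }
            return max;
        }

        /** Toy major compaction: purges tombstones and masked cells unless they are kept. */
        void majorCompact() {
            if (keepDeletedCells) return;
            final long cutoff = newestDeleteTs();
            entries.removeIf(e -> e.isDelete || e.ts <= cutoff);
        }

        /** Returns the newest value not masked by a tombstone, or null. */
        String get() {
            long cutoff = newestDeleteTs();
            Entry newest = null;
            for (Entry e : entries) {
                if (!e.isDelete && e.ts > cutoff && (newest == null || e.ts > newest.ts)) {
                    newest = e;
                }
            }
            return newest == null ? null : newest.value;
        }
    }

    /** Applies DELETE(ts=20) first; the bulkloaded cell (ts=10) arrives late. */
    static String replayOutOfOrder(boolean keepDeletedCells, boolean compactInBetween) {
        ToyStore store = new ToyStore(keepDeletedCells);
        store.delete(20);
        if (compactInBetween) store.majorCompact();
        store.put(10, "bulkloaded");
        return store.get();
    }

    public static void main(String[] args) {
        System.out.println(replayOutOfOrder(false, false)); // null: tombstone still masks the late cell
        System.out.println(replayOutOfOrder(false, true));  // bulkloaded: tombstone was purged, cell resurfaces
        System.out.println(replayOutOfOrder(true, true));   // null: tombstone survived the compaction
    }
}
```

   In this model the late cell only resurfaces in the second case, which is the 
window described above.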
   
   IMO, bulkload should be done independently of the normal wal replay. Maybe 
it could also be made optional in PITR?
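
   A rough sketch of what the independent pass could look like, under heavy 
assumptions: `WalEntry`, `Kind` and `collectBulkloads` are hypothetical 
stand-ins, not HBase classes, and reading real WAL files is left out. The idea 
is only that bulkload markers are gathered per table in a separate scan, so the 
files can be loaded in one batch instead of inside the mapper.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a separate bulkload scan; not the HBase WAL API. */
class BulkloadScanSketch {

    enum Kind { PUT, DELETE, BULKLOAD_MARKER }

    /** Simplified stand-in for a WAL entry. */
    static final class WalEntry {
        final Kind kind;
        final String table;
        final List<String> hfiles; // only meaningful for BULKLOAD_MARKER

        WalEntry(Kind kind, String table, List<String> hfiles) {
            this.kind = kind;
            this.table = table;
            this.hfiles = hfiles;
        }
    }

    /** Collects HFile paths from bulkload markers, grouped by table, in WAL order, without duplicates. */
    static Map<String, Set<String>> collectBulkloads(List<WalEntry> wal) {
        Map<String, Set<String>> byTable = new LinkedHashMap<>();
        for (WalEntry e : wal) {
            if (e.kind != Kind.BULKLOAD_MARKER) continue; // Put/Delete are left to the normal replay
            byTable.computeIfAbsent(e.table, t -> new LinkedHashSet<>()).addAll(e.hfiles);
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<WalEntry> wal = Arrays.asList(
            new WalEntry(Kind.PUT, "t1", Collections.emptyList()),
            new WalEntry(Kind.BULKLOAD_MARKER, "t1", Arrays.asList("hfile-a", "hfile-b")),
            new WalEntry(Kind.DELETE, "t1", Collections.emptyList()),
            new WalEntry(Kind.BULKLOAD_MARKER, "t1", Arrays.asList("hfile-b", "hfile-c")));
        // One batch per table; a real tool would hand each batch to a single bulkload call.
        collectBulkloads(wal).forEach((table, files) -> System.out.println(table + " -> " + files));
    }
}
```

   Making the step optional would then amount to skipping this pass entirely.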

