cshannon commented on issue #608:
URL: https://github.com/apache/accumulo/issues/608#issuecomment-1418236284

   I did some more testing this past Friday/Saturday and still couldn't get any errors to show up. I did, however, see in the logs that the iterator actually detected inconsistencies in the metadata table and fixed itself (just like I said in my previous comment). I talked to @keith-turner about this a bit and he suggested I try testing against 1.10 (and not just 2.1 and main). I also realized I was mostly testing using the mini accumulo cluster, which does not use HDFS. So today I spent some time rerunning my tests against 1.10 using Uno (so it's a real cluster with HDFS), and I still didn't get any errors and everything worked as it should.
   
   One idea I had that could possibly help would be to require garbage collection to run more than once and compare results before actually removing files. For example, GC could run the scan to get the file candidates with references multiple times in the same GC run, then compare the results and take a superset, or fail if they're inconsistent. Another option is to require more than one GC run before actually deleting. Something like: when GC runs and comes up with the files it is about to delete, we just mark them (probably in metadata) as to-be-deleted, and a subsequent run does the actual delete if it finds a file already marked from a previous run, so we know we had multiple runs agreeing the file should be deleted. We could even make it configurable to require X positive hits before doing a deletion, or add a time delay, etc.
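   To make the shape of that idea concrete, here is a minimal sketch (plain Java, not Accumulo code; the class name, `requiredHits` parameter, and in-memory map standing in for metadata marks are all hypothetical) of deferring deletion until a file has been flagged as a candidate in a configurable number of runs, plus the "agree across multiple scans in one run" variant:

   ```java
   import java.util.*;

   /**
    * Hypothetical sketch of deferred GC deletion: a file is only deleted after
    * it has shown up as a delete candidate in `requiredHits` GC runs in a row.
    * In a real implementation the marks would live in the metadata table.
    */
   public class DeferredDeleteGc {
       private final int requiredHits;                           // configurable X positive hits
       private final Map<String, Integer> marks = new HashMap<>(); // file -> consecutive hits

       public DeferredDeleteGc(int requiredHits) {
           this.requiredHits = requiredHits;
       }

       /**
        * One GC run: candidates are the files this run believes are unreferenced.
        * Returns the files actually deleted in this run.
        */
       public Set<String> run(Set<String> candidates) {
           // If a file is no longer a candidate (a reference reappeared), drop its mark.
           marks.keySet().retainAll(candidates);
           Set<String> deleted = new HashSet<>();
           for (String file : candidates) {
               int hits = marks.merge(file, 1, Integer::sum);
               if (hits >= requiredHits) {
                   deleted.add(file);
                   marks.remove(file);
               }
           }
           return deleted;
       }

       /**
        * The "scan multiple times in one run" variant: only treat a file as
        * deletable if every scan agreed it was unreferenced (intersection).
        */
       public static Set<String> agreedCandidates(List<Set<String>> scans) {
           Set<String> agreed = new HashSet<>(scans.get(0));
           for (Set<String> scan : scans) {
               agreed.retainAll(scan);
           }
           return agreed;
       }
   }
   ```

   With `requiredHits = 2`, the first run that sees a candidate only marks it, and the second consecutive run that still sees it performs the delete; any run where the file stops being a candidate resets its mark.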
   
   Running the scan more than once to detect file references would only actually help if the problem is transient and non-deterministic, isolated to a single scan, and wouldn't just recur in a future scan, which is hard to say because we don't know the actual cause. There's also a bit of a chicken-and-egg problem: if we're writing the GC result back to metadata, but metadata scans are the thing that's inconsistent, that could itself be a problem. Also, this delayed-deletion behavior can sort of already be accomplished using HDFS trash, since files can be recovered from there, and there is already #3140 to help make the HDFS trash option better. Still, having it built into GC could be nice if it would prevent early deletion in the first place.

