ctubbsii commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1126585073


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", 
"1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", 
PropertyType.BOOLEAN,
+      "Skip Trash for Import files when the trash is in use.", "3.0.0"),

Review Comment:
   Having a per-table property might be useful, but the unintuitiveness of two 
separate properties to control three modes of operation would still be an issue 
that needs addressing. The per-table option wasn't what was being proposed, and 
it didn't occur to me. I can see the merits of it, but it's not so simple to 
implement, since the accumulo-gc operates asynchronously, and doesn't know the 
table from which a file came... it can only make educated guesses based on the 
file's path, but that's not reliable, since it could have been used by multiple 
tables, or manually placed through some import process that didn't include a 
table id in the path... which is only done by convention, not a guarantee.
   
   My view that Accumulo is managing the trash is based on our long-held desire 
to try to decouple ourselves from HDFS, and I'm recognizing the trash as an 
HDFS-specific feature. The more that filesystem-related behavior is handled 
within the filesystem config, and Accumulo-related behavior is handled within 
Accumulo's config, the better, I think. While this may be less convenient 
initially, I think it makes greater sense as the overall system architecture 
grows in complexity. Replication (number of block replicas) and WALOG 
durability (hsync, hflush) are also filesystem-specific behaviors, but those 
knobs already exist. I'm more inclined to scrutinize the knobs that further 
tightly couple us to HDFS when they are new, so we're not digging ourselves a 
deeper hole that is harder to get out of (to decouple). But this is just a 
heuristic I apply to help me think about the consequences of a change... it's 
not a fixed prerequisite for me to support a change or anything like that.
   
   I was also thinking about an "on-delete" hook or similar... but I don't 
think I'd be in favor of that, because I think if a user wanted that, they 
could just subclass an existing FileSystem and add that feature to its delete 
method. If every user did that, it would be a hassle... but making a new 
FineGrainedTrashPolicyFileSystem only needs to be done once. I'm not suggesting 
this is the best option... just one possible alternative solution to the 
property changes being proposed here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to