ctubbsii commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1126585073
##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
"The number of threads used to delete RFiles and write-ahead logs",
"1.3.5"),
GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
"Do not use the Trash, even if it is configured.", "1.5.0"),
+ GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false",
PropertyType.BOOLEAN,
+ "Skip Trash for Import files when the trash is in use.", "3.0.0"),
Review Comment:
Having a per-table property might be useful, but the unintuitiveness of two
separate properties to control three modes of operation would still be an issue
that needs addressing. The per-table option wasn't what was being proposed, and
it didn't occur to me. I can see the merits of it, but it's not so simple to
implement, since the accumulo-gc operates asynchronously, and doesn't know the
table from which a file came... it can only make educated guesses based on the
file's path, but that's not reliable, since it could have been used by multiple
tables, or manually placed through some import process that didn't include a
table id in the path... which is only done by convention, not a guarantee.
My view that Accumulo is managing the trash is based on our long-held desire
to try to decouple ourselves from HDFS, and I'm recognizing the trash as an
HDFS-specific feature. The more that filesystem-related behavior is handled
within the filesystem config, and Accumulo-related behavior is handled within
Accumulo's config, the better, I think. While this may be less convenient
initially, I think it makes greater sense as the overall system architecture
grows in complexity. Replication (number of block replicas) and WALOG
durability (hsync, hflush) are also filesystem-specific behaviors, but those
knobs already exist. I'm more inclined to scrutinize the knobs that further
tightly couple us to HDFS when they are new, so we're not digging ourselves a
deeper hole that is harder to get out of (to decouple). But this is just a
heuristic I apply to help me think about the consequences of a change... it's
not a fixed prerequisite for me to support a change or anything like that.
I was also thinking about an "on-delete" hook or similar... but I don't
think I'd be in favor of that, because I think if a user wanted that, they
could just subclass an existing FileSystem and add that feature to its delete
method. If every user did that, it would be a hassle... but making a new
FineGrainedTrashPolicyFileSystem only needs to be done once. I'm not suggesting
this is the best option... just one possible alternative solution to the
property changes being proposed here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]