I would like to improve the archive tool [see issue 3307].
----------------------------------------------------------
Key: HADOOP-3929
URL: https://issues.apache.org/jira/browse/HADOOP-3929
Project: Hadoop Core
Issue Type: Improvement
Reporter: Dick King
I have a tool written atop the libhdfs library that implements an archive
system. It is working, in C++.
JIRA #3307 documents a native DFS archive system, first available in 18.0.
I would like to port my code, and thereby extend that system in 3 directions:
1: Archives will be immutable in 18.0. I would like to provide an API to let
you add, delete, and modify files.
1a: You would want to be able to batch such operations and perform them all
at once when a batch is complete.
2: The tree to be archived must be in DFS in 18.0. I would like it to be
possible for the tree to contain some local filesystem files as well [think
org.apache.hadoop.fs.Path].
2a: I realize that this would preclude parallel modification when a local
filesystem is used.
2b: I don't have a convincing story for two processes simultaneously
modifying the same archive, even for disjoint sets of files, but I'm willing
to discuss this.
3: I would like it to be possible to batch the changes and make them all in one
operation, to reduce DFS activity.
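To make items 1, 1a, 2a and 3 concrete, here is a rough sketch of what a batched
edit API might look like. Everything in it is hypothetical: the class
ArchiveBatch, its method names, and the in-memory index (archive path mapped to
source URI) are stand-ins for illustration, not the 3307 code or my C++ tool.
A real version would copy file data and rewrite the archive index in DFS at
commit time.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical sketch: queues archive edits and applies them in one pass. */
class ArchiveBatch {
    enum Kind { ADD, DELETE, MODIFY }

    /** One queued edit; source is null for DELETE. */
    static final class Op {
        final Kind kind;
        final String archivePath;
        final URI source;

        Op(Kind kind, String archivePath, URI source) {
            this.kind = kind;
            this.archivePath = archivePath;
            this.source = source;
        }
    }

    private final List<Op> pending = new ArrayList<>();

    ArchiveBatch add(String archivePath, URI source) {
        pending.add(new Op(Kind.ADD, archivePath, Objects.requireNonNull(source)));
        return this;
    }

    ArchiveBatch modify(String archivePath, URI source) {
        pending.add(new Op(Kind.MODIFY, archivePath, Objects.requireNonNull(source)));
        return this;
    }

    ArchiveBatch delete(String archivePath) {
        pending.add(new Op(Kind.DELETE, archivePath, null));
        return this;
    }

    /** True if any queued source is a local file (scheme "file" or no scheme). */
    boolean hasLocalSources() {
        for (Op op : pending) {
            if (op.source == null) {
                continue; // deletes have no source
            }
            String scheme = op.source.getScheme();
            if (scheme == null || scheme.equals("file")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Applies every queued edit to a copy of the index and returns the copy.
     * If any edit is invalid, this throws and the caller's index is untouched.
     */
    Map<String, URI> commit(Map<String, URI> index) {
        Map<String, URI> next = new TreeMap<>(index);
        for (Op op : pending) {
            switch (op.kind) {
                case ADD:
                    if (next.containsKey(op.archivePath)) {
                        throw new IllegalStateException("already in archive: " + op.archivePath);
                    }
                    next.put(op.archivePath, op.source);
                    break;
                case MODIFY:
                    if (!next.containsKey(op.archivePath)) {
                        throw new IllegalStateException("not in archive: " + op.archivePath);
                    }
                    next.put(op.archivePath, op.source);
                    break;
                case DELETE:
                    if (next.remove(op.archivePath) == null) {
                        throw new IllegalStateException("not in archive: " + op.archivePath);
                    }
                    break;
            }
        }
        pending.clear();
        return next;
    }
}
```

A caller would queue edits with add, modify and delete, check hasLocalSources()
to decide whether the commit can run in parallel, and then call commit() once,
so the archive is touched a single time per batch.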
I had in-person discussions on this with user mahadev. He is encouraging me
to file this bug report so we can broaden the discussion.