I would like to improve the archive tool [see issue 3307].
----------------------------------------------------------

                 Key: HADOOP-3929
                 URL: https://issues.apache.org/jira/browse/HADOOP-3929
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: Dick King


I have a tool written atop the libhdfs library that implements an archive
system.  It is working, and it is written in C++.

HADOOP-3307 documents a native DFS archive system, first available in 18.0.

I would like to port my code, and thereby extend that system in three directions:

1: Archives will be immutable in 18.0.  I would like to provide an API that
lets you add, delete, and modify files.

   1a: You would want to be able to batch such operations and perform them all 
at once when a batch is complete.

2: The tree to be archived must be in DFS in 18.0.  I would like it to be
possible for the tree to contain some local filesystem files as well [think
org.apache.hadoop.fs.Path].

   2a: I realize that this would preclude parallel modification when a local
filesystem is used.

   2b: I don't have a convincing story for two processes simultaneously
modifying the same archive, even for disjoint sets of files, but I'm willing
to discuss this.

3: I would like it to be possible to batch the changes and make them all in one
operation, to reduce DFS activity.
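
To make items 1a and 3 concrete, here is a rough Java sketch of what a batch
interface might look like.  Everything in it is hypothetical: the class name
ArchiveBatch, its methods, and the in-memory map standing in for the archive
index are illustrative assumptions only, not part of Hadoop or of my existing
C++ tool.  A real version would presumably write to DFS once, at commit time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names exist in Hadoop.
final class ArchiveBatch {
    // PUT covers both "add" and "modify"; DELETE removes an entry.
    private enum Kind { PUT, DELETE }

    private static final class Op {
        final Kind kind;
        final String path;
        final byte[] data;   // null for DELETE

        Op(Kind kind, String path, byte[] data) {
            this.kind = kind;
            this.path = path;
            this.data = data;
        }
    }

    // Operations queued since the last commit, in call order.
    private final List<Op> pending = new ArrayList<>();

    ArchiveBatch put(String path, byte[] data) {
        pending.add(new Op(Kind.PUT, path, data.clone()));
        return this;
    }

    ArchiveBatch delete(String path) {
        pending.add(new Op(Kind.DELETE, path, null));
        return this;
    }

    int size() {
        return pending.size();
    }

    // Applies every queued operation in order to a copy of the given index
    // (a stand-in for the archive contents) and returns the copy.  The input
    // map is left untouched, and the queue is emptied afterwards.
    Map<String, byte[]> commit(Map<String, byte[]> index) {
        Map<String, byte[]> next = new TreeMap<>(index);
        for (Op op : pending) {
            if (op.kind == Kind.PUT) {
                next.put(op.path, op.data);
            } else {
                next.remove(op.path);
            }
        }
        pending.clear();
        return next;
    }
}
```

A caller would queue changes with something like
batch.put("/b", bytes).delete("/a") and then call commit once, so that the
archive is rewritten a single time per batch rather than once per file.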

I had in-person discussions on this with user mahadev.  He is encouraging me
to file this report so we can broaden the discussion.


