+1.

On Wed, Mar 9, 2016 at 8:38 PM, Yogi Devendra <[email protected]> wrote:
> Hi,
>
> As I mentioned earlier here,
>
> http://mail-archives.apache.org/mod_mbox/apex-dev/201602.mbox/%3CCAHekGF9xNa6qvvt4ySGBC4SmCN7_Hn2r9rj2SQSV%2BE1Cc5A0fQ%40mail.gmail.com%3E
>
> I am proposing an HDFS file copy module.
> The JIRA created for this work is available here:
> https://issues.apache.org/jira/browse/APEXMALHAR-2013
>
> Please note that this work is related to, but different from,
> https://issues.apache.org/jira/browse/APEXMALHAR-2009, which talks about a
> concrete operator for writing data to HDFS tuple by tuple.
>
> The main difference is that, in the file copy module, the block sequence of a
> file has to be retained. Thus, we need to pass on additional information
> such as FileMetaData and BlockMetaData from the upstream operator.
>
> Use case
> --------
> This module can be used with the HDFS input module to copy files from HDFS to
> HDFS.
> Large files will be copied block by block.
>
> Functionality
> -------------
>
> 1. Writing files to HDFS using the FileMetaData, BlockMetaData, and BlockData
> emitted by the HDFS input module.
> 2. Block data has to be synchronized to retain the original sequence from
> the source.
> 3. Support for copying multiple files, recursive copy of a directory
> structure, etc.
> 4. Metrics for summary information on the progress of the file copy.
>
> Let me know your thoughts on this. You may post your comments on the JIRA:
> https://issues.apache.org/jira/browse/APEXMALHAR-2013
>
> ~ Yogi
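
To make the block-ordering requirement (item 2 in the functionality list) concrete, here is a minimal sketch in plain Java of one way blocks arriving out of order could be written in source order. It is only an illustration under stated assumptions: BlockSequencer and its methods are hypothetical names, not Malhar APIs; the ordered list of block ids is assumed to come from something like FileMetaData; and a ByteArrayOutputStream stands in for the HDFS output stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Malhar): writes the blocks of one file in source order. */
public class BlockSequencer {
    // Block ids in their original order, assumed to be supplied by the file's metadata.
    private final long[] blockIds;
    // Blocks that arrived before their turn, keyed by block id.
    private final Map<Long, byte[]> pending = new HashMap<>();
    // Stand-in for the real HDFS output stream (assumption for this sketch).
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Index into blockIds of the next block that may be written.
    private int next = 0;

    public BlockSequencer(long[] blockIdsInOrder) {
        this.blockIds = blockIdsInOrder.clone();
    }

    /** Accepts a block in any arrival order and writes every block that is now contiguous. */
    public void onBlock(long blockId, byte[] data) {
        pending.put(blockId, data);
        while (next < blockIds.length) {
            byte[] ready = pending.remove(blockIds[next]);
            if (ready == null) {
                break; // the next block in sequence has not arrived yet
            }
            out.write(ready, 0, ready.length);
            next++;
        }
    }

    /** True once every block listed in the metadata has been written. */
    public boolean isComplete() {
        return next == blockIds.length;
    }

    /** Bytes written so far, in source order. */
    public byte[] contents() {
        return out.toByteArray();
    }
}
```

For example, with block ids {10, 11, 12}, a block 12 that arrives first is buffered; nothing is written until block 10 shows up, and block 12 is only flushed after block 11. A real operator would also have to bound the buffered data and handle recovery, which this sketch ignores.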
