Re: HDFS file copy module for Malhar

Chinmay Kolhatkar Wed, 09 Mar 2016 07:40:25 -0800

+1.

On Wed, Mar 9, 2016 at 8:38 PM, Yogi Devendra <[email protected]>
wrote:


> Hi,
>
> As I mentioned earlier here,
>
> http://mail-archives.apache.org/mod_mbox/apex-dev/201602.mbox/%3CCAHekGF9xNa6qvvt4ySGBC4SmCN7_Hn2r9rj2SQSV%2BE1Cc5A0fQ%40mail.gmail.com%3E
>
> I am proposing an HDFS file copy module.
> The JIRA created for this work is available here:
> https://issues.apache.org/jira/browse/APEXMALHAR-2013
>
> Please note that this work is related to, but different from,
> https://issues.apache.org/jira/browse/APEXMALHAR-2009, which covers a
> concrete operator for writing data to HDFS tuple by tuple.
>
> The main difference is that, in the file copy module, the block sequence of
> each file has to be retained. Thus, we need to pass additional information,
> such as FileMetaData and BlockMetaData, from the upstream operator.
>
> Use case
> ------------
> This module can be used with the HDFS input module to copy files from HDFS
> to HDFS.
> Large files will be copied block by block.
>
> Functionality
> -----------------
>
>    1. Write files to HDFS using the FileMetaData, BlockMetaData and
>    BlockData emitted by the HDFS input module.
>    2. Synchronize block data to retain the original block sequence from
>    the source.
>    3. Support copying multiple files, recursive copying of directory
>    structures, etc.
>    4. Provide metrics summarizing the progress of the file copy.
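
To make point 2 concrete, here is a minimal Java sketch of one way to retain block order, assuming blocks of a file can arrive out of order (e.g. from parallel readers) and that the total block count is known from the file metadata. Blocks are buffered by sequence index and written only once they are contiguous. The class and method names (OrderedBlockWriter, onBlock) are hypothetical, not Malhar APIs, and a ByteArrayOutputStream stands in for the HDFS output stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: writes the blocks of one file in sequence, buffering early arrivals. */
public class OrderedBlockWriter {
    private final int totalBlocks; // assumed to come from the file's metadata
    private final Map<Integer, byte[]> pending = new HashMap<>();
    private final ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for an HDFS stream
    private int nextIndex = 0;

    public OrderedBlockWriter(int totalBlocks) {
        this.totalBlocks = totalBlocks;
    }

    /** Accepts one block; writes it and any buffered successors once they are contiguous. */
    public void onBlock(int index, byte[] data) {
        if (index < nextIndex || index >= totalBlocks || pending.containsKey(index)) {
            return; // already written, out of range, or duplicate: ignore
        }
        pending.put(index, data);
        byte[] next;
        // Drain every block that now continues the sequence without a gap.
        while ((next = pending.remove(nextIndex)) != null) {
            out.write(next, 0, next.length);
            nextIndex++;
        }
    }

    public boolean isComplete() {
        return nextIndex == totalBlocks;
    }

    public byte[] contents() {
        return out.toByteArray();
    }

    public static void main(String[] args) {
        OrderedBlockWriter w = new OrderedBlockWriter(3);
        w.onBlock(2, "C".getBytes(StandardCharsets.UTF_8));
        w.onBlock(0, "A".getBytes(StandardCharsets.UTF_8));
        System.out.println(w.isComplete()); // prints "false": block 1 is still missing
        w.onBlock(1, "B".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(w.contents(), StandardCharsets.UTF_8) + " " + w.isComplete()); // prints "ABC true"
    }
}
```

Buffering in memory is only the simplest option; with large blocks, a real implementation would more likely write out-of-order blocks to temporary files and stitch them together at the end.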
>
> Let me know your thoughts on this. You may post your comments on the JIRA
> https://issues.apache.org/jira/browse/APEXMALHAR-2013
>
> ~ Yogi
>
