[ 
https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670862#comment-15670862
 ] 

Yongjun Zhang commented on HADOOP-8065:
---------------------------------------

Hi [~raviprak],

One thing that came to mind is whether we should automatically add a postfix 
(based on the codec) to the target file names:
# when either the distcp src or tgt is a directory, add the postfix automatically;
# when both src and tgt are files, the user should be responsible for adding the 
postfix.
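
As a rough sketch of the rule above (the class and method names are hypothetical and do not come from any attached patch; the extension table is hard-coded here only to keep the example self-contained, whereas a real implementation would more likely ask the codec via {{CompressionCodec#getDefaultExtension()}}):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of distcp.
class TargetNameSketch {
    // Assumed mapping from a codec short name to its usual file extension.
    private static final Map<String, String> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("gzip", ".gz");
        EXTENSIONS.put("bzip2", ".bz2");
        EXTENSIONS.put("snappy", ".snappy");
        EXTENSIONS.put("lz4", ".lz4");
        EXTENSIONS.put("deflate", ".deflate");
    }

    // Returns the target file name under the two cases listed above.
    static String targetName(String name, String codec,
                             boolean srcIsDir, boolean tgtIsDir) {
        if (!srcIsDir && !tgtIsDir) {
            // File-to-file copy: the user is responsible for the postfix.
            return name;
        }
        String ext = EXTENSIONS.get(codec);
        if (ext == null) {
            throw new IllegalArgumentException("unknown codec: " + codec);
        }
        // Directory involved: append the codec's postfix automatically.
        return name + ext;
    }

    public static void main(String[] args) {
        System.out.println(targetName("part-00000", "gzip", true, true));
        System.out.println(targetName("data.txt", "gzip", false, false));
    }
}
```

Even in this small form, the branching on src/tgt type shows where the extra complexity comes from.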

This adds some complexity. Another option is to leave the target file name 
unchanged and provide a command like "file" in Unix/Linux:
{code}
FILE(1)                   BSD General Commands Manual                  FILE(1)

NAME
     file - determine file type

SYNOPSIS
     file [-bchikLNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
          [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
     file -C [-m magicfiles]
     file [--help]
{code}

This alternative seems simpler, cleaner, and more robust.
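
A minimal sketch of what such a "file"-like check might look like, assuming detection by leading magic bytes (the class name and the set of formats are hypothetical; only the gzip and bzip2 signatures are checked here, and formats without a distinctive header would need another approach):

```java
// Hypothetical helper, for illustration only; not an existing Hadoop command.
class FileTypeSketch {
    // Guesses the compression type from the first bytes of a file.
    static String detect(byte[] head) {
        if (startsWith(head, 0x1f, 0x8b)) {
            return "gzip";   // gzip member header (RFC 1952)
        }
        if (startsWith(head, 'B', 'Z', 'h')) {
            return "bzip2";  // bzip2 stream signature
        }
        return "unknown";
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xff) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] gz = {(byte) 0x1f, (byte) 0x8b, 0x08};
        System.out.println(detect(gz));
    }
}
```

With something like this, the target file name stays untouched and the type is recovered from the content itself.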

Alternatively, we could extend
{code}
hadoop fs -getfattr [-R] -n name | -d [-e en] <path>
{code}
to report the file type, etc.

Thanks.



 

> distcp should have an option to compress data while copying.
> ------------------------------------------------------------
>
>                 Key: HADOOP-8065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.2
>            Reporter: Suresh Antony
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>             Fix For: 0.20.2
>
>         Attachments: HADOOP-8065-trunk_2015-11-03.patch, 
> HADOOP-8065-trunk_2015-11-04.patch, HADOOP-8065-trunk_2016-04-29-4.patch, 
> HADOOP-8065.005.patch, HADOOP-8065.006.patch, patch.distcp.2012-02-10
>
>
> We would like to compress the data while transferring it from our source 
> system to our target system. One way to do this is to write a map/reduce job 
> that compresses the data before or after the transfer. This looks inefficient. 
> Since distcp is already reading and writing the data, it would be better if it 
> could compress the data while doing so. 
> The flip side is that the distcp -update option cannot check the file size 
> before copying data. It can only check for the existence of the file. 
> So I propose that if the -compress option is given, the file size is not checked.
> Also, when we copy a file, the appropriate extension needs to be added to the 
> file name depending on the compression type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
