[ 
https://issues.apache.org/jira/browse/HADOOP-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-7519:
---------------------------------

    Attachment: hadoop-7519-0.20.2XX-1.patch

Some of my users have needed this in the past, so I wrote a short wrapper 
around org.apache.tools.tar.  I got the idea from reading 
http://stuartsierra.com/2008/04/24/a-million-little-files, where the author 
converted a tar file into a sequence file.
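
The idea from that post, treating each tar entry as a (file name, contents) 
record, can be sketched roughly as below.  This is only an illustration using 
the Python standard library; it is not code from the attached patch, which is 
Java and wraps org.apache.tools.tar.  The names tar_to_records and 
make_sample_tar are hypothetical, and a real conversion would write each 
record to a sequence file instead of collecting them in a list.

```python
import io
import tarfile


def tar_to_records(tar_bytes):
    """Yield (entry name, file contents) pairs for each regular file in a tar archive."""
    # "r:*" lets tarfile auto-detect a plain, gzip, or bzip2 archive.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            yield member.name, archive.extractfile(member).read()


def make_sample_tar(files):
    """Build a small gzip-compressed tar in memory from a {name: bytes} dict."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the entry
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical sample data, only to exercise the sketch.
sample = make_sample_tar({"a.txt": b"hello", "dir/b.txt": b"world"})
records = list(tar_to_records(sample))
```

Each pair in records corresponds to one key/value record, with the file name 
as the key and the raw file bytes as the value.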

The patch is nowhere near ready to commit, but I think it gives the general 
idea.  It needs ant.jar on the classpath.

{noformat}
% export HADOOP_CLASSPATH=./contrib/tar/lib/ant.jar
% hadoop jar contrib/tar/hadoop-tar.jar --help
usage: hadoop jar hadoop-tar.jar [options]
 -c,--create                 create a new archive
 -C,--directory <DIR>        Set the working directory to DIR
 -f,--file <FILE>            Use archive file (default '-' for
                             stdin/stdout)
    --help                   show help message
    --overwrite              overwrite existing directory
 -P,--absolute-names         don't strip leading / from file name
 -p,--preserve-permissions   apply recorded permissions instead of
                             applying user's umask when extracting files
    --same-group             create extracted files with the same group id
    --same-owner             create extracted files with the same
                             ownership
 -t,--list                   list files from an archive
 -v,--verbose                print verbose output
 -x,--extract                extract files from an archive
 -z,--compress               filter the archive through
                             compress/uncompress gzip
{noformat}


> hadoop fs commands should support tar/gzip or an equivalent
> -----------------------------------------------------------
>
>                 Key: HADOOP-7519
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7519
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.1
>            Reporter: Keith Wiley
>            Priority: Minor
>              Labels: hadoop
>         Attachments: hadoop-7519-0.20.2XX-1.patch
>
>
> The "hadoop fs" subcommand should offer options for batching, unbatching, 
> compressing, and uncompressing files on HDFS, the equivalent of "hadoop fs 
> -tar" or "hadoop fs -gzip".  These commands would greatly facilitate moving 
> large amounts of data (especially when spread across many files) to and from HDFS.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
