[ https://issues.apache.org/jira/browse/HADOOP-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated HADOOP-7519:
---------------------------------
Attachment: hadoop-7519-0.20.2XX-1.patch
Some of my users have needed this in the past, so I wrote a short wrapper
around org.apache.tools.tar. I got the idea from
http://stuartsierra.com/2008/04/24/a-million-little-files, where the author
converts a tar file into a sequence file.
This is not ready to commit at all, but I think it gives an idea of the
approach. It needs ant.jar on its classpath.
{noformat}
% export HADOOP_CLASSPATH=./contrib/tar/lib/ant.jar
% hadoop jar contrib/tar/hadoop-tar.jar --help
usage: hadoop jar hadoop-tar.jar [options]
 -c,--create                  create a new archive
 -C,--directory <DIR>         Set the working directory to DIR
 -f,--file <FILE>             Use archive file (default '-' for
                              stdin/stdout)
    --help                    show help message
    --overwrite               overwrite existing directory
 -P,--absolute-names          don't strip leading / from file name
 -p,--preserve-permissions    apply recorded permissions instead of
                              applying user's umask when extracting files
    --same-group              create extracted files with the same group id
    --same-owner              create extracted files with the same
                              ownership
 -t,--list                    list files from an archive
 -v,--verbose                 print verbose output
 -x,--extract                 extract files from an archive
 -z,--compress                filter the archive through
                              compress/uncompress gzip
{noformat}
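For anyone who wants to see the tar-to-key/value idea from the comment above in concrete form, here is a rough sketch. It is not the attached patch (which wraps org.apache.tools.tar in Java); it is a Python illustration using only the standard tarfile module, and the names tar_to_records and build_demo_tar are made up for this example. It only yields (file name, file bytes) pairs from a tar stream; actually writing those pairs into a Hadoop sequence file is left out.

```python
import io
import tarfile


def tar_to_records(fileobj):
    """Yield (member name, file bytes) for each regular file in a tar stream.

    Hypothetical helper: each pair is the kind of key/value record one
    might later append to a sequence file.
    """
    # "r|*" reads the archive as a forward-only stream and detects
    # compression transparently, so the input can be a pipe or stdin.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, and other non-files
            # In stream mode the data must be read before moving on
            # to the next member.
            data = archive.extractfile(member).read()
            yield member.name, data


def build_demo_tar():
    """Build a tiny in-memory tar archive so the sketch runs on its own."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, payload in [("a.txt", b"alpha"), ("dir/b.txt", b"bravo")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for key, value in tar_to_records(build_demo_tar()):
        print(key, len(value))
```

Reading the archive as a stream keeps only one member in memory at a time, which seems to be the point when packing a large number of small files.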
> hadoop fs commands should support tar/gzip or an equivalent
> -----------------------------------------------------------
>
> Key: HADOOP-7519
> URL: https://issues.apache.org/jira/browse/HADOOP-7519
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.20.1
> Reporter: Keith Wiley
> Priority: Minor
> Labels: hadoop
> Attachments: hadoop-7519-0.20.2XX-1.patch
>
>
> The "hadoop fs" subcommand should offer options for batching, unbatching,
> compressing, and uncompressing files on HDFS, the equivalent of "hadoop fs
> -tar" or "hadoop fs -gzip". These commands would greatly facilitate moving
> large data (especially a large number of files) to and from HDFS.