I can envision an M/R job for manipulating HDFS in that way, e.g., (de)compressing files and resaving them back to HDFS. I just didn't think it should be necessary to *write a program* to do something so seemingly minimal. Tarring/compressing seems like an obvious way to move data back and forth; I would expect the tools to support it directly.
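
Though now that I think about it, a Hadoop Streaming identity job might sidestep the programming entirely. A rough, untested sketch -- the streaming jar path and the old-style flag names are guesses for my install, the HDFS paths are made up, and since streaming is line-oriented this is only safe for plain-text files:

    # map-only identity job: the input format auto-decompresses gzipped
    # files, and the output is written back to HDFS uncompressed
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.output.compress=false \
        -D mapred.reduce.tasks=0 \
        -input /data/compressed \
        -output /data/decompressed \
        -mapper /bin/cat

The reverse direction would presumably just be mapred.output.compress=true plus a codec named via mapred.output.compression.codec.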
I'll read up on "-text". Maybe that really is what I wanted, although I'm dubious, since this task has nothing to do with textual data at all. Anyway, I'll see what I can find on that. Thanks.

On Aug 4, 2011, at 9:04 PM, Harsh J wrote:

> Keith,
>
> The 'hadoop fs -text' tool does decompress a file given to it if
> needed/able, but what you could also do is run a distributed mapreduce
> job that converts from compressed to decompressed, that'd be much
> faster.
>
> On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <[email protected]> wrote:
>> Instead of "hd fs -put" hundreds of files of X megs, I want to do it once on
>> a gzipped (or zipped) archive, one file, much smaller total megs. Then I
>> want to decompress the archive on HDFS. I can't figure out what "hd fs"
>> type command would do such a thing.
>>
>> Thanks.

________________________________________________________________________________
Keith Wiley   [email protected]   keithwiley.com   music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a
slippery rope between obsessive-compulsive and debilitatingly slow."
  -- Keith Wiley
________________________________________________________________________________
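
P.S. If -text pans out, the whole round trip might collapse to something like the following (untested; I'm assuming -text picks the gzip codec from the extension and that -put accepts "-" for stdin on my version):

    # upload one small compressed file instead of hundreds of large ones
    hadoop fs -put archive.gz /user/keith/archive.gz
    # -text decompresses (gzip and sequence files, at least) to stdout;
    # pipe it straight back into HDFS as an uncompressed file
    hadoop fs -text /user/keith/archive.gz | hadoop fs -put - /user/keith/archive

Of course that only strips the compression layer -- it won't untar an archive back into individual files -- and it pulls every byte through the client twice, which is presumably why Harsh suggests doing the conversion as a distributed job instead.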
