Also note that HDFS already does checksums, which I believe you can retrieve:

http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/fs/FileSystem.html#getFileChecksum(org.apache.hadoop.fs.Path)
http://hadoop.apache.org/common/docs/r1.0.3/hdfs_design.html#Data+Integrity

Brock

On Sun, Jul 29, 2012 at 12:35 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> Thanks!
> I'll dig into those classes to figure out my next step.
>
> Anyway, I just realized that block-level compression has nothing to do with
> HDFS blocks. An HDFS block can contain an unknown number of compressed
> blocks, which makes my efforts kind of worthless.
>
> Thanks again!
>
> On Sun, Jul 29, 2012 at 6:40 PM, Tim Broberg <tim.brob...@exar.com> wrote:
>> What if you wrote a CompressionOutputStream class that wraps around the
>> existing ones and outputs a hash per <n> bytes, and a CompressionInputStream
>> that checks them? ...and a Codec that wraps your compressors around
>> arbitrary existing codecs.
>>
>> Sounds like a bunch of work, and I'm not sure where you would store the
>> hashes, but it would get the data into your clutches the instant it's
>> available.
>>
>> - Tim.
>>
>> On Jul 29, 2012, at 7:41 AM, "Yaron Gonen" <yaron.go...@gmail.com> wrote:
>>
>> Hi,
>> I've created a SequenceFile.Writer with block-level compression.
>> I'd like to create a SHA-1 hash for each block written. How do I do that?
>> I didn't see any way to take control of the compression, so I can't
>> tell when a block is over.
>>
>> Thanks,
>> Yaron

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
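Tim's wrapper idea can be sketched outside Hadoop with a plain java.io stream wrapper. This is a hypothetical sketch, not a working codec: the class name `HashingOutputStream` and the side list of hashes are made up for illustration, and a real version would extend Hadoop's CompressionOutputStream, wrap the stream returned by the underlying codec, and pick a real home for the digests. The per-block SHA-1 bookkeeping, though, would look roughly like this:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: wraps any OutputStream and records a SHA-1 digest for
// every `blockSize` bytes written. FilterOutputStream funnels the array-based
// write() overloads through write(int), so every byte passes through here.
class HashingOutputStream extends FilterOutputStream {
    private final MessageDigest digest;
    private final int blockSize;
    private int bytesInBlock = 0;
    final List<byte[]> blockHashes = new ArrayList<>();

    HashingOutputStream(OutputStream out, int blockSize) {
        super(out);
        try {
            this.digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 ships with every JRE
        }
        this.blockSize = blockSize;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        digest.update((byte) b);
        if (++bytesInBlock == blockSize) {
            blockHashes.add(digest.digest()); // digest() also resets the digest
            bytesInBlock = 0;
        }
    }

    @Override
    public void close() throws IOException {
        if (bytesInBlock > 0) {
            blockHashes.add(digest.digest()); // hash the final partial block
        }
        super.close();
    }
}
```

The matching CompressionInputStream wrapper would recompute the digest over each `blockSize` span on read and compare it to the stored value. Where to store the hashes is still the open question from the thread; SequenceFile metadata or a side file are the obvious candidates, with the caveat Yaron raised that these compression blocks do not line up with HDFS blocks.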