What I want to ask is:
- How do I read the values from sequence files that are block-compressed, record-compressed, or uncompressed?
- How do I know whether a sequence file is block-compressed, record-compressed, or uncompressed?
- How do I know whether a file is a sequence file or a text file?

On 3 April 2012 16:01, Pedro Costa <psdc1...@gmail.com> wrote:
> If I want to compare two sequence files to see whether they are the same,
> how do I compare them?
>
> On 19 December 2011 14:43, Robert Evans <ev...@yahoo-inc.com> wrote:
>> Oh, I forgot to say that some of the "random characters" really are random
>> characters. Sequence files store a set of random characters as sync
>> points within the file. This allows the file to be split easily without
>> a high risk that the random sequence appears inside the data itself just by
>> chance.
>>
>> --Bobby Evans
>>
>> On 12/19/11 7:51 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>>
>> Hi,
>>
>> In Hadoop MapReduce, I've executed the webdatascan example, and the
>> reduce output is in a SequenceFile. The result is shown here (
>> http://paste.lisp.org/display/126572). What's the trash (random
>> characters), like "u 265 330 320 252 " \n # ; 374 5 211 V ' 340 376", in
>> the output? Is the output correct?
>>
>> 0000000 S E Q 006 031 o r g . a p a c h e .
>> 0000020 h a d o o p . i o . T e x t 031 o
>> 0000040 r g . a p a c h e . h a d o o p
>> 0000060 . i o . T e x t \0 \0 \0 \0 \0 \0 u 265
>> 0000100 330 320 252 " \n # ; 374 5 211 V ' 340 376 \0 \0
>> 0000120 \0 X \0 \0 \0 037 a p p l e a p p
>> 0000140 l e b a n a n a a p p l e
>> 0000160 a p p l e 7 c a r r o t c a
>> 0000200 r r o t c a r r o t c a r r
>> 0000220 o t a p p l e b a n a n a
>> 0000240 c a r r o t b a n a n a
>> 0000256
>>
>> --
>> Thanks,
>
> --
> Best regards,

--
Best regards,
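The header in the dump above already answers the "which kind of file is this?" questions, assuming the version-6 header layout it shows: the magic SEQ and version byte 006, the key and value class names (each preceded by its length, 031 = 25), one flag byte for "compressed" and one for "block-compressed" (both \0 here, so this file is uncompressed), a four-byte metadata count (0), and then the 16-byte sync marker Bobby describes (u 265 330 ... 340 376). A file that does not start with SEQ is not a SequenceFile; it may be plain text, although gzip or other binary files fail the check too. The following is a minimal sketch of that check in plain Java; the class name SeqFileSniffer is hypothetical, and header versions other than 6 are recognized but not parsed.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Sketch (hypothetical class): reports whether a stream starts with a version-6
 * SequenceFile header and, if so, how its records are compressed.
 */
class SeqFileSniffer {

    /** Decodes a Hadoop variable-length integer (same rules as WritableUtils.readVLong). */
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                        // small values fit in one byte
        }
        boolean negative = first < -120;
        int size = negative ? -119 - first : -111 - first; // bytes including 'first'
        long value = 0;
        for (int i = 0; i < size - 1; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    /** Reads a string stored as a VInt byte count followed by UTF-8 bytes. */
    static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[(int) readVLong(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String describe(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        try {
            in.readFully(magic);                 // 'S', 'E', 'Q', version byte
        } catch (EOFException e) {
            return "not a SequenceFile (too short)";
        }
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            return "not a SequenceFile (no SEQ magic; possibly a text file)";
        }
        int version = magic[3];
        if (version != 6) {
            return "SequenceFile version " + version + " (header not parsed by this sketch)";
        }
        String keyClass = readString(in);
        String valueClass = readString(in);
        boolean compressed = in.readBoolean();
        boolean blockCompressed = in.readBoolean();
        String codec = compressed ? readString(in) : null; // codec class name, only if compressed
        int metadataEntries = in.readInt();
        for (int i = 0; i < 2 * metadataEntries; i++) {
            readString(in);                      // skip metadata key/value pairs
        }
        in.readFully(new byte[16]);              // the random 16-byte sync marker
        String mode = !compressed ? "uncompressed"
                : blockCompressed ? "block-compressed" : "record-compressed";
        return "SequenceFile v" + version + ", key=" + keyClass + ", value=" + valueClass
                + ", " + mode + (codec == null ? "" : " (" + codec + ")");
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(describe(in));
        }
    }
}
```

Run against the file from the dump, this sketch should report an uncompressed file with Text keys and values. It reads a local file; for a file in HDFS, the stream returned by Hadoop's FileSystem.open(path) could be passed to describe instead, which is not shown here.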
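Because every sequence file gets its own random sync marker, which the writer also repeats periodically after a -1 escape, two files holding identical data generally differ byte for byte, so comparing them with a plain diff is not enough. One way to compare them is record by record. In an uncompressed or record-compressed version-6 file, each record is a four-byte record length, a four-byte key length, the key bytes, and the value bytes; in the dump above, the \0 \0 \0 X right after the sync marker appears to be the first record's length (X = 88 bytes) and the following \0 \0 \0 037 its key length (31 bytes). The minimal sketch below, under those assumptions, compares those raw record bytes and skips every sync marker; the class name SeqFileCompare is hypothetical. Block-compressed files pack many keys and values into compressed blocks, so this sketch rejects them; Hadoop's own SequenceFile.Reader decodes all three layouts and is the practical way to read values from block-compressed files.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch (hypothetical class): same records in two version-6 SequenceFiles, ignoring sync markers? */
class SeqFileCompare {

    /** Decodes a Hadoop variable-length integer (same rules as WritableUtils.readVLong). */
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;
        }
        boolean negative = first < -120;
        int size = negative ? -119 - first : -111 - first;
        long value = 0;
        for (int i = 0; i < size - 1; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    /** Reads a string stored as a VInt byte count followed by UTF-8 bytes. */
    static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[(int) readVLong(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Skips the header; returns "keyClass/valueClass/codec" so the two formats can be compared. */
    static String readHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q' || magic[3] != 6) {
            throw new IOException("not a version-6 SequenceFile");
        }
        String keyClass = readString(in);
        String valueClass = readString(in);
        boolean compressed = in.readBoolean();
        if (in.readBoolean()) {
            throw new IOException("block-compressed files are not handled by this sketch");
        }
        String codec = compressed ? readString(in) : "none";
        int metadataEntries = in.readInt();
        for (int i = 0; i < 2 * metadataEntries; i++) {
            readString(in); // metadata is skipped, not compared
        }
        in.readFully(new byte[16]); // this file's own random sync marker
        return keyClass + "/" + valueClass + "/" + codec;
    }

    /** Returns key length + key bytes + value bytes of the next record, or null at end of file. */
    static byte[] nextRecord(DataInputStream in) throws IOException {
        while (true) {
            int first = in.read();
            if (first == -1) {
                return null;
            }
            byte[] rest = new byte[3];
            in.readFully(rest);
            int recordLength = (first << 24) | ((rest[0] & 0xFF) << 16)
                    | ((rest[1] & 0xFF) << 8) | (rest[2] & 0xFF);
            if (recordLength == -1) { // sync escape: a 16-byte sync marker follows
                in.readFully(new byte[16]);
                continue;
            }
            byte[] record = new byte[4 + recordLength];
            in.readFully(record);
            return record;
        }
    }

    static boolean sameRecords(InputStream a, InputStream b) throws IOException {
        DataInputStream inA = new DataInputStream(a);
        DataInputStream inB = new DataInputStream(b);
        if (!readHeader(inA).equals(readHeader(inB))) {
            return false; // different key/value classes or codec
        }
        while (true) {
            byte[] ra = nextRecord(inA);
            byte[] rb = nextRecord(inB);
            if (ra == null || rb == null) {
                return ra == rb; // equal only if both files end at the same record
            }
            if (!Arrays.equals(ra, rb)) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream a = new BufferedInputStream(new FileInputStream(args[0]));
             InputStream b = new BufferedInputStream(new FileInputStream(args[1]))) {
            System.out.println(sameRecords(a, b) ? "same records" : "different records");
        }
    }
}
```

Comparing raw bytes is deliberately strict: the same records in a different order count as different, header metadata is ignored, and two record-compressed files holding the same data but written with different codec settings could be reported as different even though their decompressed values match.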