problems reading compressed sequencefiles in streaming (0.13.1)

Joydeep Sen Sarma Fri, 26 Oct 2007 00:30:30 -0700

I was hoping to use -inputformat SequenceFileAsTextInputFormat to process 
compressed SequenceFiles in streaming jobs.

However, with a Python mapper that just echoes each line it receives, and 
numreducetasks=0, here's what the streaming job output looks like:

 

SEQ^F org.apache.hadoop.io.IntWritable^Yorg.apache.hadoop.io.Text^A^A'[EMAIL 
PROTECTED]@[EMAIL PROTECTED]@Z+rï¿½ï¿½ï¿½ï¿½ï¿½ï¿½^Fï¿½
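
For context, an echo mapper of the kind described above can be as small as the sketch below. This is not the exact script used in the job; the function name `echo` is made up for illustration, and it assumes the mapper only copies standard input to standard output.

```python
#!/usr/bin/env python
"""Identity mapper for a streaming job: copy stdin to stdout, line by line."""
import sys


def echo(lines, out):
    """Write each input line to `out` unchanged and return the line count."""
    count = 0
    for line in lines:
        # No parsing at all: whatever the framework hands over is passed through.
        out.write(line)
        count += 1
    return count


if __name__ == "__main__":
    echo(sys.stdin, sys.stdout)
```

Because the mapper does no parsing, whatever appears in the job output is exactly what streaming fed it.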

 

So it seems the input file was not treated as a SequenceFile.
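
One way to confirm that a mapper is seeing raw file bytes rather than decoded records is to look for the header at the start of the output. The sketch below is Python 3 and rests on assumptions read off the bytes shown above, not on a format specification: the data starts with `SEQ`, then one version byte (^F is 6), then two class names each prefixed by a single length byte, then two one-byte flags (the ^A^A). The function name `describe_header` and its field names are hypothetical.

```python
def describe_header(data):
    """Return the leading header fields if `data` looks like a SequenceFile.

    Assumed layout (inferred from the job output, not from a spec):
    b"SEQ", a version byte, two length-prefixed class names, two flag bytes.
    Returns None when the magic bytes are missing or the data is too short.
    """
    if data[:3] != b"SEQ":
        return None
    try:
        version = data[3]
        pos = 4
        names = []
        for _ in range(2):
            length = data[pos]  # single-byte length prefix (assumption)
            names.append(data[pos + 1:pos + 1 + length].decode("utf-8"))
            pos += 1 + length
        compressed = bool(data[pos])
        block_compressed = bool(data[pos + 1])
    except IndexError:
        return None
    return {
        "version": version,
        "key_class": names[0],
        "value_class": names[1],
        "compressed": compressed,
        "block_compressed": block_compressed,
    }


# Bytes rebuilt from the readable part of the output above.
sample = (
    b"SEQ\x06"
    b"\x20org.apache.hadoop.io.IntWritable"
    b"\x19org.apache.hadoop.io.Text"
    b"\x01\x01"
)
print(describe_header(sample))
```

If a check like this succeeds on the mapper's input, the records were never decoded before reaching the script.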

 

I must be missing some args, but I don't understand which ones. Help appreciated.

 

Thx,

 

Joydeep
