Bennie Schut
Tue, 17 Nov 2009 02:06:53 -0800
Another Zebra-related question.
I couldn't find much documentation on Zebra, but I figured out that you can
change the compression codec with syntax like this:
store outfile into '/user/dwh/screenname2.zebra' using
org.apache.hadoop.zebra.pig.TableStorer('compress by lzo');
In theory, you should then be able to disable compression like this:
store outfile into '/user/dwh/screenname3.zebra' using
org.apache.hadoop.zebra.pig.TableStorer('compress by none');
But it doesn't seem to recognize "none" as a compressor:
java.io.IOException: ColumnGroup.Writer constructor failed : Partition constructor failed :Encountered " <IDENTIFIER> "none "" at line 1, column 13.
Was expecting:
    <COMPRESSOR> ...
    at org.apache.hadoop.zebra.io.BasicTable$Writer.<init>(BasicTable.java:1116)
    at org.apache.hadoop.zebra.pig.TableOutputFormat.checkOutputSpecs(TableStorer.java:154)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
I actually tried this because when I use the Zebra output for further
processing, the job only uses 2 mappers instead of the 230 mappers used on
the original file. I remember that Hadoop cannot split gzip files, so I
figured compression might be the reason so few mappers are used. Does
anyone know a different approach?
Thanks,
Bennie.