
After resolving the errors about LZO libraries not being found and missing
jar files for elephant-bird, I've run into a new problem when using the
elephant-bird branch for Pig 0.7.

The following simple Pig script works as expected:

     REGISTER elephant-bird-1.0.jar
     REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
     A = load '/usr/foo/input/test_input_chars.txt';
     DUMP A;

This just dumps out the contents of the test_input_chars.txt file, which is
tab delimited. The output looks like:


I then ran lzop on the test file to get test_input_chars.txt.lzo (I
decompressed this with lzop -d to make sure the compression worked, and
everything looks good). If I run the exact same script provided above on the
.lzo file, it works fine. However, this file is really small and doesn't need
an index, and I wanted LZO support that also works with indexed files. Based
on this I decided to try out the elephant-bird branch for Pig 0.7 located
here (http://github.com/hirohanin/elephant-bird/), as recommended by
Dimitriy.
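
As a side check on the compressed file mentioned above, here is a minimal
sketch of how one could confirm that a file really starts with the lzop
container header, independent of Pig. The helper name looks_like_lzop is made
up for illustration, and the 9-byte signature is the lzop magic as I
understand it, so treat this as an assumption rather than a definitive test.

```python
# Sanity check: does a file begin with the lzop magic header?
# Assumption: lzop files start with this 9-byte signature.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def looks_like_lzop(path):
    """Return True if the file at `path` begins with the lzop magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC


if __name__ == "__main__":
    import sys

    # Print one line per file given on the command line.
    for p in sys.argv[1:]:
        print(p, looks_like_lzop(p))
```

This only inspects the header; it does not verify that the compressed blocks
are intact, so it complements the lzop -d round trip rather than replacing it.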

I created the following Pig script, which mirrors the one above but should
hopefully work on LZO files (including indexed ones):

     REGISTER elephant-bird-1.0.jar
     REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
     A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
     DUMP A;

When I run this script, which uses the LzoTokenizedLoader, there is no
output. The script appears to run without errors, but there are zero records
written and 0 bytes written.

Here is the exact output:

grunt > DUMP A;
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimited [     ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimited [     ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimited [     ]
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
- 1-4 Operator Key: 1-4
[main] INFO
- MR plan size before optimization: 1
[main] INFO
- MR plan size after optimization: 1
[main] INFO
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO
- Setting up single store job
[main] INFO
- 1 map-reduce job(s) waiting for submission.
[Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use
GenericOptionsParser for parsing the arguments.  Applications should
implement Tool for the same.
[Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimiter [     ]
[Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
Total input paths to process : 1
[main] INFO
- 0% complete
[main] INFO
- HadoopJobId: job_201009101108_0151
[main] INFO
- More information at
[main] INFO
- 50% complete
[main] INFO
- 100% complete
[main] INFO
- Succesfully stored result in
[main] INFO
- Records written: 0
[main] INFO
- Bytes written: 0
[main] INFO
- Spillable Memory Manager spill count : 0
[main] INFO
- Proactive spill count : 0
[main] INFO
- Success!
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total
input paths to process: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
Total input paths to process: 1
grunt >

I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or
if there is a problem with the class itself (most likely the problem is with
my code, heh). Thank you for any help!


