Oh, sorry, I am completely out of sync...

Can you tell me how you LZO-compressed and indexed the file?


Rohan Rai wrote:
Oh, sorry, I did not see this mail ...

It's not an official patch/release, but here is a fork of elephant-bird
which works with Pig 0.7 for normal LZO text loading etc.
(not the HBaseLoader).


Dmitriy Ryaboy wrote:

The 0.7 branch is not tested; it's quite likely it doesn't actually work.
Rohan Rai was working on it. Rohan, think you can take a look and help Ed?

Ed, you may want to check if the same input works when you use Pig 0.6 (and
the official elephant-bird, on Kevin Weil's github).


On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:


After getting all the errors to go away with LZO libraries not being found
and missing jar files for elephant-bird, I've run into a new problem when
using the elephant-bird branch for Pig 0.7.

The following simple Pig script works as expected:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt';
    DUMP A;

This just dumps out the contents of the test_input_chars.txt file which is
tab delimited. The output looks like:


I then ran lzop on the test file to get test_input_chars.txt.lzo (I
decompressed this with lzop -d to make sure the compression worked fine,
and everything looks good).
If I run the exact same script provided above on the lzo file, it works
fine. However, this file is really small and doesn't need to use indexes.
What I actually want is LZO support that works with indexed files. Based
on this, I decided to try out the elephant-bird branch for Pig 0.7 located
here (http://github.com/hirohanin/elephant-bird/), as recommended by
Dmitriy.

I created the following Pig script, which mirrors the script above but
should hopefully work on LZO files (including indexed ones):

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
    DUMP A;

When I run this script, which uses the LzoTokenizedLoader, there is no
output. The script appears to run without errors, but it reports 0 records
written and 0 bytes written.

Here is the exact output:

grunt > DUMP A;
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimited [     ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimited [     ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimited [     ]
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
- 1-4 Operator Key: 1-4
[main] INFO - MR plan size before optimization: 1
[main] INFO - MR plan size after optimization: 1
[main] INFO - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO - Setting up single store job
[main] INFO - 1 map-reduce job(s) waiting for submission.
[Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use
GenericOptionsParser for parsing the arguments.  Applications should
implement Tool for the same.
[Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
LzoTokenizedLoader with given delimiter [     ]
[Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
Total input paths to process : 1
[main] INFO - 0% complete
[main] INFO - HadoopJobId: job_201009101108_0151
[main] INFO - More information at
[main] INFO - 50% complete
[main] INFO - 100% complete
[main] INFO - Succesfully stored result in
[main] INFO - Records written: 0
[main] INFO - Bytes written: 0
[main] INFO - Spillable Memory Manager spill count : 0
[main] INFO - Proactive spill count : 0
[main] INFO - Success!
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total
input paths to process: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
Total input paths to process: 1
grunt >

I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader
or if there is a problem with the class itself (most likely the problem is
my code, heh). Thank you for any help!



