The 0.7 branch is not tested... it's quite likely it doesn't actually work :). Rohan Rai was working on it. Rohan, think you can take a look and help Ed out?
Ed, you may want to check if the same input works when you use Pig 0.6 (and
the official elephant-bird, on Kevin Weil's github).

-D

On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
> Hello,
>
> After getting all the errors to go away with LZO libraries not being found
> and missing jar files for elephant-bird, I've run into a new problem when
> using the elephant-bird branch for Pig 0.7.
>
> The following simple Pig script works as expected:
>
> REGISTER elephant-bird-1.0.jar
> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
> A = load '/usr/foo/input/test_input_chars.txt';
> DUMP A;
>
> This just dumps out the contents of the test_input_chars.txt file, which is
> tab delimited. The output looks like:
>
> (1,a,a,a,a,a,a)
> (2,b,b,b,b,b,b)
> (3,c,c,c,c,c,c)
> (4,d,d,d,d,d,d)
> (5,e,e,e,e,e,e)
>
> I then lzop the test file to get test_input_chars.txt.lzo (I decompressed
> this with lzop -d to make sure the compression worked fine, and everything
> looks good). If I run the exact same script provided above on the lzo file,
> it works fine. However, this file is really small and doesn't need to use
> indexes. As a result, I wanted to have LZO support that worked with
> indexes. Based on this I decided to try out the elephant-bird branch for
> Pig 0.7 located here (http://github.com/hirohanin/elephant-bird/), as
> recommended by Dmitriy.
>
> I created the following Pig script that mirrors the above script but should
> hopefully work on LZO files (including indexed ones):
>
> REGISTER elephant-bird-1.0.jar
> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>     com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> DUMP A;
>
> When I run this script, which uses the LzoTokenizedLoader, there is no
> output. The script appears to run without errors, but there are zero
> Records Written and 0 Bytes Written.
>
> Here is the exact output:
>
> grunt> DUMP A;
> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage)
> - 1-4 Operator Key: 1-4
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
> Total input paths to process : 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201009101108_0151
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at
> http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Succesfully stored result in
> "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Records written: 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Bytes written: 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Spillable Memory Manager spill count : 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Proactive spill count : 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Success!
> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
> Total input paths to process: 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
> Total input paths to process: 1
> grunt>
>
> I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or
> if there is a problem with the class itself (most likely the problem is with
> my code, heh). Thank you for any help!
>
> ~Ed
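P.S. For the Pig 0.6 test, a sketch of what I mean, reusing the jar names and
paths from your own script (those are assumptions about your setup):

```pig
-- Register the official elephant-bird build (Kevin Weil's github)
-- and its google-collect dependency, as in your original script
REGISTER elephant-bird-1.0.jar;
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar;

-- Same loader, same .lzo file, but run under Pig 0.6
A = LOAD '/usr/foo/input/test_input_chars.txt.lzo'
    USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;
```

If that dumps your five tuples, the input and your LZO setup are fine, and the
problem is in the untested 0.7 branch rather than in your code.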