Re: Problem with LzoTokenizedLoader with elephant-bird branch for Pig 0.7

Rohan Rai Sun, 26 Sep 2010 22:26:34 -0700

Oh, sorry, I am completely out of sync...

Can you tell me how you LZO-compressed and indexed the file?


Regards
Rohan

Rohan Rai wrote:

Oh, sorry, I did not see this mail...

It's not an official patch/release, but here is a fork of elephant-bird
which works with Pig 0.7 for normal LZO text loading etc. (not HBaseLoader).

Regards
Rohan

Dmitriy Ryaboy wrote:

The 0.7 branch is not tested... it's quite likely it doesn't actually work :).
Rohan Rai was working on it. Rohan, think you can take a look and help Ed out?

Ed, you may want to check if the same input works when you use Pig 0.6 (and
the official elephant-bird, on Kevin Weil's github).

-D

On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:

Hello,

After getting all the errors about missing LZO libraries and missing
elephant-bird jar files to go away, I've run into a new problem when
using the elephant-bird branch for Pig 0.7.

The following simple Pig script works as expected:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt';
    DUMP A;

This just dumps out the contents of the test_input_chars.txt file which is
tab delimited. The output looks like:

    (1,a,a,a,a,a,a)
    (2,b,b,b,b,b,b)
    (3,c,c,c,c,c,c)
    (4,d,d,d,d,d,d)
    (5,e,e,e,e,e,e)

I then lzop the test file to get test_input_chars.txt.lzo (I decompressed
this with lzop -d to make sure the compression worked fine, and everything
looks good). If I run the exact same script provided above on the lzo file,
it works fine. However, this file is really small and doesn't need to use
indexes. As a result, I wanted to have LZO support that works with indexes.
Based on this, I decided to try out the elephant-bird branch for Pig 0.7
located here (http://github.com/hirohanin/elephant-bird/), as recommended
by Dmitriy.
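
One way to sanity-check the compressed file and its index before involving
Pig is sketched below in Python. It assumes the file was produced by lzop
(whose files start with a fixed nine-byte magic header) and that the index is
a hadoop-lzo style .index file, taken here to be a flat sequence of 8-byte
big-endian block offsets. The function names are made up for illustration and
are not part of elephant-bird or hadoop-lzo.

```python
import struct

# Magic header that lzop writes at the start of every .lzo file.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def looks_like_lzop(data: bytes) -> bool:
    """Return True if the bytes begin with the lzop magic header."""
    return data.startswith(LZOP_MAGIC)


def read_block_offsets(index_bytes: bytes) -> list:
    """Decode an index (assumed layout: consecutive 8-byte big-endian
    signed integers, one per compressed block) into a list of offsets."""
    if len(index_bytes) % 8 != 0:
        raise ValueError("index length is not a multiple of 8 bytes")
    count = len(index_bytes) // 8
    return list(struct.unpack(">%dq" % count, index_bytes))
```

In practice the bytes would be read from local copies of the .lzo file and its
.index file (for example after fetching them from HDFS). An empty offset list
or a missing magic header would point at the compression or indexing step
rather than at the loader.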

I created the following Pig script that mirrors the above script but should
hopefully work on LZO files (including indexed ones):

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
        com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
    DUMP A;

When I run this script, which uses the LzoTokenizedLoader, there is no
output. The script appears to run without errors, but there are zero
Records Written and 0 Bytes Written.

Here is the exact output:

    grunt > DUMP A;
    [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [     ]
    [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [     ]
    [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [     ]
    [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments.  Applications should implement Tool for the same.
    [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [     ]
    [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
    grunt >

I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or
if there is a problem with the class itself (most likely the problem is with
my code, heh). Thank you for any help!

~Ed



The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you have received 
this communication in error, please notify us immediately by responding to this 
email and then delete it from your system. The firm is neither liable for the 
proper and complete transmission of the information contained in this 
communication nor for any delay in its receipt.
