Hello,

I tested the newest push to the hirohanin elephant-bird branch (for Pig 0.7) and hit an error when trying to use LzoTokenizedLoader with the following Pig script:

REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt.lzo' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;

The error shows up in the mapper logs and is as follows:

INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library
INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
    at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC(Unknown Source)
    at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Do you think I'm forgetting some required library? Thank you!

~Ed

On Tue, Sep 28, 2010 at 2:10 PM, ed <hadoopn...@gmail.com> wrote:
> Thank you Rohan, I really appreciate your help! I'll give it a shot and
> post back if it works.
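A NoSuchMethodError like the one above usually means the elephant-bird jar was compiled against a different Pig than the one running on the cluster, not that a library is missing. A quick way to check is a reflection probe against the jar on your classpath. This is only a sketch: it probes java.lang.String#isEmpty so it runs standalone, but pointing it at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger (with the cluster's pig.jar on the classpath) would show whether the method elephant-bird expects is actually there.

```java
public class MethodCheck {
    // Return true if the named class is loadable and exposes a public
    // method with the given name (declared or inherited).
    static boolean hasMethod(String className, String methodName) {
        try {
            for (java.lang.reflect.Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException e) {
            // Class not on the classpath at all.
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in probe; swap in the Pig logger class and method name
        // to diagnose a version mismatch on your own classpath.
        System.out.println(hasMethod("java.lang.String", "isEmpty")); // prints true
    }
}
```

If the probe returns false for the Pig class and method in the error, that would point to a jar-version mismatch rather than a forgotten dependency.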
> ~Ed
>
> On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan....@inmobi.com> wrote:
>> I just corrected/tested LzoTokenizedLoader and pushed it to the personal fork.
>> Hopefully it works now.
>>
>> Regards
>> Rohan
>>
>> Dmitriy Ryaboy wrote:
>>> lzop should work.
>>>
>>> On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>> Well,
>>>> I haven't tried (or rather, I don't remember trying) compressing via lzop and
>>>> then putting the file on the cluster, so I can't tell you about that. Here is
>>>> what works for me: I first put the file on the cluster and then do stream
>>>> compression.
>>>>
>>>> And yes, it need not be indexed. (I guess it doesn't matter for a small test
>>>> file; otherwise it is unwise, since one loses the benefit of parallelism.)
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>> pig wrote:
>>>>> Hi Rohan,
>>>>>
>>>>> The test file (test_input_chars.txt.lzo) is not indexed. I created it using
>>>>> the command
>>>>>
>>>>> 'lzop test_input_chars.txt'
>>>>>
>>>>> It's a really small file (only 6 lines) so I didn't think it needed to be
>>>>> indexed. Do all files, regardless of size, need to be indexed for
>>>>> LzoTokenizedLoader to work?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> ~Ed
>>>>>
>>>>> On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>>>> Oh, sorry, I am completely out of sync...
>>>>>> Can you tell me how you lzo'ed and indexed the file?
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>> Rohan Rai wrote:
>>>>>>> Oh, sorry, I did not see this mail...
>>>>>>> It's not an official patch/release, but here is a fork of elephant-bird
>>>>>>> which works with Pig 0.7 for normal LZO text loading etc.
>>>>>>> (not HBaseLoader)
>>>>>>>
>>>>>>> Regards
>>>>>>> Rohan
>>>>>>>
>>>>>>> Dmitriy Ryaboy wrote:
>>>>>>>> The 0.7 branch is not tested; it's quite likely it doesn't actually
>>>>>>>> work :).
>>>>>>>>
>>>>>>>> Rohan Rai was working on it. Rohan, think you can take a look and help
>>>>>>>> Ed out?
>>>>>>>>
>>>>>>>> Ed, you may want to check if the same input works when you use Pig 0.6
>>>>>>>> (and the official elephant-bird, on Kevin Weil's github).
>>>>>>>>
>>>>>>>> -D
>>>>>>>>
>>>>>>>> On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> After getting all the errors to go away with LZO libraries not being
>>>>>>>>> found and missing jar files for elephant-bird, I've run into a new
>>>>>>>>> problem when using the elephant-bird branch for Pig 0.7.
>>>>>>>>>
>>>>>>>>> The following simple Pig script works as expected:
>>>>>>>>>
>>>>>>>>> REGISTER elephant-bird-1.0.jar
>>>>>>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>> A = load '/usr/foo/input/test_input_chars.txt';
>>>>>>>>> DUMP A;
>>>>>>>>>
>>>>>>>>> This just dumps out the contents of the test_input_chars.txt file,
>>>>>>>>> which is tab delimited. The output looks like:
>>>>>>>>>
>>>>>>>>> (1,a,a,a,a,a,a)
>>>>>>>>> (2,b,b,b,b,b,b)
>>>>>>>>> (3,c,c,c,c,c,c)
>>>>>>>>> (4,d,d,d,d,d,d)
>>>>>>>>> (5,e,e,e,e,e,e)
>>>>>>>>>
>>>>>>>>> I then lzop the test file to get test_input_chars.txt.lzo (I
>>>>>>>>> decompressed this with lzop -d to make sure the compression worked,
>>>>>>>>> and everything looks good). If I run the exact same script provided
>>>>>>>>> above on the lzo file, it works fine. However, this file is really
>>>>>>>>> small and doesn't need to use indexes. As a result, I wanted LZO
>>>>>>>>> support that works with indexes.
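For context on why the index matters: an LZO index is essentially a sorted list of compressed-block byte offsets, and with one present each mapper can seek to the first block at or after its input-split start (the "Seeking to split start" line in the logs), while an unindexed file has to be read as a single split. A minimal sketch of that lookup, with invented offsets:

```java
import java.util.Arrays;

public class LzoSplitSketch {
    // Given the sorted compressed-block offsets an index provides (the
    // values below are made up for illustration), return where a mapper
    // whose input split starts at splitStart should begin reading: the
    // first block boundary at or after the split start, or -1 if the
    // split starts past the last block.
    static long seekToSplitStart(long[] blockOffsets, long splitStart) {
        int i = Arrays.binarySearch(blockOffsets, splitStart);
        if (i >= 0) {
            return blockOffsets[i];      // split starts exactly on a block
        }
        int insertion = -i - 1;          // index of first offset > splitStart
        return insertion < blockOffsets.length ? blockOffsets[insertion] : -1L;
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 4096L, 8192L, 12288L};
        System.out.println(seekToSplitStart(offsets, 5000L)); // prints 8192
    }
}
```

Without an index, the only block boundary known in advance is offset 0, which is why the whole file ends up in one mapper and parallelism is lost.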
>>>>>>>>> Based on this, I decided to try out the elephant-bird branch for
>>>>>>>>> Pig 0.7 located here (http://github.com/hirohanin/elephant-bird/),
>>>>>>>>> as recommended by Dmitriy.
>>>>>>>>>
>>>>>>>>> I created the following Pig script, which mirrors the above script
>>>>>>>>> but should hopefully work on LZO files (including indexed ones):
>>>>>>>>>
>>>>>>>>> REGISTER elephant-bird-1.0.jar
>>>>>>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>>>>>>>>> com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>>>>>>>>> DUMP A;
>>>>>>>>>
>>>>>>>>> When I run this script, which uses LzoTokenizedLoader, there is no
>>>>>>>>> output. The script appears to run without errors, but there are zero
>>>>>>>>> records written and 0 bytes written.
>>>>>>>>>
>>>>>>>>> Here is the exact output:
>>>>>>>>>
>>>>>>>>> grunt > DUMP A;
>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>>>>>> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>>>>>>>>> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
>>>>>>>>> grunt >
>>>>>>>>>
>>>>>>>>> I'm not sure if I'm doing something wrong in my use of
>>>>>>>>> LzoTokenizedLoader or if there is a problem with the class itself
>>>>>>>>> (most likely the problem is with my code, heh). Thank you for any
>>>>>>>>> help!
>>>>>>>>>
>>>>>>>>> ~Ed
>>>>>>>
>>>>>>> The information contained in this communication is intended solely for
>>>>>>> the use of the individual or entity to whom it is addressed and others
>>>>>>> authorized to receive it. It may contain confidential or legally
>>>>>>> privileged information. If you are not the intended recipient you are
>>>>>>> hereby notified that any disclosure, copying, distribution or taking
>>>>>>> any action in reliance on the contents of this information is strictly
>>>>>>> prohibited and may be unlawful. If you have received this communication
>>>>>>> in error, please notify us immediately by responding to this email and
>>>>>>> then delete it from your system. The firm is neither liable for the
>>>>>>> proper and complete transmission of the information contained in this
>>>>>>> communication nor for any delay in its receipt.