Hello,

I tested the newest push to the hirohanin elephant-bird branch (for Pig 0.7) and hit an error when trying to use LzoTokenizedLoader with the following Pig script:

REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt.lzo' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;

The error shows up in the mapper logs and is as follows:

INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library
INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
    at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC(Unknown Source)
    at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Do you think I'm forgetting some required library? Thank you!

~Ed

On Tue, Sep 28, 2010 at 2:10 PM, ed <hadoopn...@gmail.com> wrote:
> Thank you Rohan, I really appreciate your help! I'll give it a shot and
> post back if it works.
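A NoSuchMethodError like the one above usually means the elephant-bird jar was compiled against a different Pig than the one running on the cluster, not that a library is missing. A quick way to check is a reflection probe against the jar on your classpath. This is only a sketch: it probes java.lang.String#isEmpty so it runs standalone, but pointing it at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger (with the cluster's pig.jar on the classpath) would show whether the method elephant-bird expects is actually there.

```java
public class MethodCheck {
    // Return true if the named class is loadable and exposes a public
    // method with the given name (declared or inherited).
    static boolean hasMethod(String className, String methodName) {
        try {
            for (java.lang.reflect.Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException e) {
            // Class not on the classpath at all.
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in probe; swap in the Pig logger class and method name
        // to diagnose a version mismatch on your own classpath.
        System.out.println(hasMethod("java.lang.String", "isEmpty")); // prints true
    }
}
```

If the probe returns false for the Pig class and method in the error, that would point to a jar-version mismatch rather than a forgotten dependency.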
> ~Ed
>
> On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan....@inmobi.com> wrote:
>> I just corrected/tested LzoTokenizedLoader and pushed it to the personal fork.
>> Hopefully it works now.
>>
>> Regards
>> Rohan
>>
>> Dmitriy Ryaboy wrote:
>>> lzop should work.
>>>
>>> On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>> Well,
>>>> I haven't tried (or rather, I don't remember trying) compressing via lzop and
>>>> then putting the file on the cluster, so I can't tell you about that. Here is
>>>> what works for me: I first put the file on the cluster and then do stream
>>>> compression.
>>>>
>>>> And yes, it need not be indexed. (I guess it doesn't matter for a small test
>>>> file; otherwise it is unwise, since one loses the benefit of parallelism.)
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>> pig wrote:
>>>>> Hi Rohan,
>>>>>
>>>>> The test file (test_input_chars.txt.lzo) is not indexed. I created it using
>>>>> the command
>>>>>
>>>>> 'lzop test_input_chars.txt'
>>>>>
>>>>> It's a really small file (only 6 lines) so I didn't think it needed to be
>>>>> indexed. Do all files, regardless of size, need to be indexed for
>>>>> LzoTokenizedLoader to work?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> ~Ed
>>>>>
>>>>> On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>>>> Oh, sorry, I am completely out of sync...
>>>>>> Can you tell me how you lzo'ed and indexed the file?
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>> Rohan Rai wrote:
>>>>>>> Oh, sorry, I did not see this mail...
>>>>>>> It's not an official patch/release, but here is a fork of elephant-bird
>>>>>>> which works with Pig 0.7 for normal LZO text loading etc.
>>>>>>> (not HBaseLoader)
>>>>>>>
>>>>>>> Regards
>>>>>>> Rohan
>>>>>>>
>>>>>>> Dmitriy Ryaboy wrote:
>>>>>>>> The 0.7 branch is not tested; it's quite likely it doesn't actually
>>>>>>>> work :).
>>>>>>>>
>>>>>>>> Rohan Rai was working on it. Rohan, think you can take a look and help
>>>>>>>> Ed out?
>>>>>>>>
>>>>>>>> Ed, you may want to check if the same input works when you use Pig 0.6
>>>>>>>> (and the official elephant-bird, on Kevin Weil's github).
>>>>>>>>
>>>>>>>> -D
>>>>>>>>
>>>>>>>> On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> After getting all the errors to go away with LZO libraries not being
>>>>>>>>> found and missing jar files for elephant-bird, I've run into a new
>>>>>>>>> problem when using the elephant-bird branch for Pig 0.7.
>>>>>>>>>
>>>>>>>>> The following simple Pig script works as expected:
>>>>>>>>>
>>>>>>>>> REGISTER elephant-bird-1.0.jar
>>>>>>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>> A = load '/usr/foo/input/test_input_chars.txt';
>>>>>>>>> DUMP A;
>>>>>>>>>
>>>>>>>>> This just dumps out the contents of the test_input_chars.txt file,
>>>>>>>>> which is tab delimited. The output looks like:
>>>>>>>>>
>>>>>>>>> (1,a,a,a,a,a,a)
>>>>>>>>> (2,b,b,b,b,b,b)
>>>>>>>>> (3,c,c,c,c,c,c)
>>>>>>>>> (4,d,d,d,d,d,d)
>>>>>>>>> (5,e,e,e,e,e,e)
>>>>>>>>>
>>>>>>>>> I then lzop the test file to get test_input_chars.txt.lzo (I
>>>>>>>>> decompressed this with lzop -d to make sure the compression worked,
>>>>>>>>> and everything looks good). If I run the exact same script provided
>>>>>>>>> above on the lzo file, it works fine. However, this file is really
>>>>>>>>> small and doesn't need to use indexes. As a result, I wanted LZO
>>>>>>>>> support that works with indexes.
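For context on why the index matters: an LZO index is essentially a sorted list of compressed-block byte offsets, and with one present each mapper can seek to the first block at or after its input-split start (the "Seeking to split start" line in the logs), while an unindexed file has to be read as a single split. A minimal sketch of that lookup, with invented offsets:

```java
import java.util.Arrays;

public class LzoSplitSketch {
    // Given the sorted compressed-block offsets an index provides (the
    // values below are made up for illustration), return where a mapper
    // whose input split starts at splitStart should begin reading: the
    // first block boundary at or after the split start, or -1 if the
    // split starts past the last block.
    static long seekToSplitStart(long[] blockOffsets, long splitStart) {
        int i = Arrays.binarySearch(blockOffsets, splitStart);
        if (i >= 0) {
            return blockOffsets[i];      // split starts exactly on a block
        }
        int insertion = -i - 1;          // index of first offset > splitStart
        return insertion < blockOffsets.length ? blockOffsets[insertion] : -1L;
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 4096L, 8192L, 12288L};
        System.out.println(seekToSplitStart(offsets, 5000L)); // prints 8192
    }
}
```

Without an index, the only block boundary known in advance is offset 0, which is why the whole file ends up in one mapper and parallelism is lost.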
>>>>>>>>> Based on this, I decided to try out the elephant-bird branch for
>>>>>>>>> Pig 0.7 located here (http://github.com/hirohanin/elephant-bird/),
>>>>>>>>> as recommended by Dmitriy.
>>>>>>>>>
>>>>>>>>> I created the following Pig script, which mirrors the above script
>>>>>>>>> but should hopefully work on LZO files (including indexed ones):
>>>>>>>>>
>>>>>>>>> REGISTER elephant-bird-1.0.jar
>>>>>>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>>>>>>>>> com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>>>>>>>>> DUMP A;
>>>>>>>>>
>>>>>>>>> When I run this script, which uses LzoTokenizedLoader, there is no
>>>>>>>>> output. The script appears to run without errors, but there are zero
>>>>>>>>> records written and 0 bytes written.
>>>>>>>>>
>>>>>>>>> Here is the exact output:
>>>>>>>>>
>>>>>>>>> grunt > DUMP A;
>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>>>>>> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>>>>>>>>> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
>>>>>>>>> grunt >
>>>>>>>>>
>>>>>>>>> I'm not sure if I'm doing something wrong in my use of
>>>>>>>>> LzoTokenizedLoader or if there is a problem with the class itself
>>>>>>>>> (most likely the problem is with my code, heh). Thank you for any
>>>>>>>>> help!
>>>>>>>>>
>>>>>>>>> ~Ed
>>>>>>>
>>>>>>> The information contained in this communication is intended solely for
>>>>>>> the use of the individual or entity to whom it is addressed and others
>>>>>>> authorized to receive it. It may contain confidential or legally
>>>>>>> privileged information. If you are not the intended recipient you are
>>>>>>> hereby notified that any disclosure, copying, distribution or taking
>>>>>>> any action in reliance on the contents of this information is strictly
>>>>>>> prohibited and may be unlawful. If you have received this communication
>>>>>>> in error, please notify us immediately by responding to this email and
>>>>>>> then delete it from your system. The firm is neither liable for the
>>>>>>> proper and complete transmission of the information contained in this
>>>>>>> communication nor for any delay in its receipt.