Hello, I'm using Cloudera's Hadoop CDH3b2 (hadoop-0.20.2+320, based on Apache Hadoop 0.20.2) with Pig 0.7 (from Cloudera's distro).
Thank you!

~Ed

On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai <rohan....@inmobi.com> wrote:
> Hi
>
> Which Hadoop/Pig version are you using?
>
> Regards
> Rohan
>
> ed wrote:
>> Hello,
>>
>> I tested the newest push to the hirohanin elephant-bird branch (for Pig
>> 0.7) and got an error when trying to use LzoTokenizedLoader with the
>> following Pig script:
>>
>> REGISTER elephant-bird-1.0.jar
>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>> com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>> DUMP A;
>>
>> The error I get is in the mapper logs and is as follows:
>>
>> INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
>> INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library
>> INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
>> INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
>> FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError:
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
>>     at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC(Unknown Source)
>>     at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(Unknown Source)
>>     at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(Unknown Source)
>>     at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext(Unknown Source)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> Do you think I'm forgetting some required library?
>>
>> Thank you!
>>
>> ~Ed
>>
>> On Tue, Sep 28, 2010 at 2:10 PM, ed <hadoopn...@gmail.com> wrote:
>>> Thank you Rohan, I really appreciate your help! I'll give it a shot and
>>> post back if it works.
>>>
>>> ~Ed
>>>
>>> On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>> Just corrected/tested and pushed LzoTokenizedLoader to the personal fork.
>>>>
>>>> Hopefully it works now.
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>> Dmitriy Ryaboy wrote:
>>>>> lzop should work.
>>>>>
>>>>> On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>>>> Well,
>>>>>>
>>>>>> I haven't tried (rather, I don't remember) compressing via lzop and
>>>>>> then putting the file on the cluster, so I can't tell you about that.
>>>>>> Here is what works for me:
>>>>>>
>>>>>> I do it by first putting the file on the cluster and then doing
>>>>>> stream compression.
>>>>>>
>>>>>> And yes, it need not be indexed. (I guess it doesn't matter for a
>>>>>> small test file; otherwise it is unwise, for one loses the benefit of
>>>>>> parallelism.)
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>> pig wrote:
>>>>>>> Hi Rohan,
>>>>>>>
>>>>>>> The test file (test_input_chars.txt.lzo) is not indexed. I created
>>>>>>> it using the command
>>>>>>>
>>>>>>> lzop test_input_chars.txt
>>>>>>>
>>>>>>> It's a really small file (only 6 lines) so I didn't think it needed
>>>>>>> to be indexed. Do all files, regardless of size, need to be indexed
>>>>>>> for LzoTokenizedLoader to work?
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> ~Ed
>>>>>>>
>>>>>>> On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>>>>>> Oh sorry, I am completely out of sync...
>>>>>>>>
>>>>>>>> Can you tell me how you lzo'ed and indexed the file?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Rohan
>>>>>>>>
>>>>>>>> Rohan Rai wrote:
>>>>>>>>> Oh sorry, I did not see this mail...
>>>>>>>>>
>>>>>>>>> It's not an official patch/release, but here is a fork of
>>>>>>>>> elephant-bird which works with Pig 0.7 for normal LzoText loading
>>>>>>>>> etc. (not HBaseLoader).
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Rohan
>>>>>>>>>
>>>>>>>>> Dmitriy Ryaboy wrote:
>>>>>>>>>> The 0.7 branch is not tested... it's quite likely it doesn't
>>>>>>>>>> actually work :).
>>>>>>>>>>
>>>>>>>>>> Rohan Rai was working on it... Rohan, think you can take a look
>>>>>>>>>> and help Ed out?
>>>>>>>>>>
>>>>>>>>>> Ed, you may want to check if the same input works when you use
>>>>>>>>>> Pig 0.6 (and the official elephant-bird, on Kevin Weil's github).
>>>>>>>>>>
>>>>>>>>>> -D
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> After getting all the errors to go away with LZO libraries not
>>>>>>>>>>> being found and missing jar files for elephant-bird, I've run
>>>>>>>>>>> into a new problem when using the elephant-bird branch for Pig 0.7.
>>>>>>>>>>>
>>>>>>>>>>> The following simple Pig script works as expected:
>>>>>>>>>>>
>>>>>>>>>>> REGISTER elephant-bird-1.0.jar
>>>>>>>>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>>>> A = load '/usr/foo/input/test_input_chars.txt';
>>>>>>>>>>> DUMP A;
>>>>>>>>>>>
>>>>>>>>>>> This just dumps out the contents of the test_input_chars.txt
>>>>>>>>>>> file, which is tab delimited. The output looks like:
>>>>>>>>>>>
>>>>>>>>>>> (1,a,a,a,a,a,a)
>>>>>>>>>>> (2,b,b,b,b,b,b)
>>>>>>>>>>> (3,c,c,c,c,c,c)
>>>>>>>>>>> (4,d,d,d,d,d,d)
>>>>>>>>>>> (5,e,e,e,e,e,e)
>>>>>>>>>>>
>>>>>>>>>>> I then lzop the test file to get test_input_chars.txt.lzo (I
>>>>>>>>>>> decompressed this with lzop -d to make sure the compression
>>>>>>>>>>> worked fine, and everything looks good). If I run the exact same
>>>>>>>>>>> script provided above on the lzo file, it works fine. However,
>>>>>>>>>>> this file is really small and doesn't need to use indexes. As a
>>>>>>>>>>> result, I wanted to have LZO support that works with indexes.
>>>>>>>>>>> Based on this, I decided to try out the elephant-bird branch for
>>>>>>>>>>> Pig 0.7 located here (http://github.com/hirohanin/elephant-bird/),
>>>>>>>>>>> as recommended by Dmitriy.
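[A note on the indexing discussed above: an LZO index is essentially a list of byte offsets at which compressed blocks start, so a splittable input format can hand each mapper its own byte range instead of one mapper reading the whole stream. The toy Python sketch below illustrates only that idea with fixed-size blocks; the real LzoIndexer in hadoop-lzo parses lzop block headers, and all names here are illustrative, not from elephant-bird.]

```python
# Toy sketch of what an .lzo.index buys you: block start offsets -> splits.
# The real indexer reads lzop block headers; fixed-size blocks stand in here.

def block_offsets(total_size, block_size):
    """Offsets at which each (toy, fixed-size) block begins."""
    return list(range(0, total_size, block_size))

def splits(total_size, block_size, n_mappers):
    """Assign block start offsets round-robin to mappers."""
    offs = block_offsets(total_size, block_size)
    return [offs[i::n_mappers] for i in range(n_mappers)]

# A 6-line test file fits inside a single block, so there is only one
# possible split -- indexing it buys nothing, which matches Rohan's point
# that skipping the index only matters for files big enough to parallelize.
print(block_offsets(300, 256 * 1024))  # -> [0]
```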
>>>>>>>>>>> I created the following Pig script, which mirrors the above
>>>>>>>>>>> script but should hopefully work on LZO files (including indexed
>>>>>>>>>>> ones):
>>>>>>>>>>>
>>>>>>>>>>> REGISTER elephant-bird-1.0.jar
>>>>>>>>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>>>> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>>>>>>>>>>> com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>>>>>>>>>>> DUMP A;
>>>>>>>>>>>
>>>>>>>>>>> When I run this script, which uses LzoTokenizedLoader, there is
>>>>>>>>>>> no output. The script appears to run without errors, but there
>>>>>>>>>>> are zero Records Written and 0 Bytes Written.
>>>>>>>>>>>
>>>>>>>>>>> Here is the exact output:
>>>>>>>>>>>
>>>>>>>>>>> grunt> DUMP A;
>>>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>>>>>>>> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>>>>>>>>> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117"
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>>>>>>>>>>> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
>>>>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
>>>>>>>>>>> grunt>
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure if I'm doing something wrong in my use of
>>>>>>>>>>> LzoTokenizedLoader or if there is a problem with the class itself
>>>>>>>>>>> (most likely the problem is with my code, heh). Thank you for
>>>>>>>>>>> any help!
>>>>>>>>>>>
>>>>>>>>>>> ~Ed

The information contained in this communication is intended solely for the
use of the individual or entity to whom it is addressed and others
authorized to receive it. It may contain confidential or legally privileged
information. If you are not the intended recipient you are hereby notified
that any disclosure, copying, distribution or taking any action in reliance
on the contents of this information is strictly prohibited and may be
unlawful. If you have received this communication in error, please notify us
immediately by responding to this email and then delete it from your system.
The firm is neither liable for the proper and complete transmission of the
information contained in this communication nor for any delay in its
receipt.
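[A note on the NoSuchMethodError reported earlier in the thread: that error usually means elephant-bird was compiled against a Pig build whose PigHadoopLogger exposes a method the cluster's Pig jar lacks. Since a jar is just a zip archive, one quick sanity check is to see which jar on the classpath actually supplies the class before rebuilding. The Python sketch below is a hedged illustration of that check; the function name and any paths you pass it are mine, not part of Pig or elephant-bird.]

```python
# Find which jar(s) on a classpath actually contain a given class.
# A jar is a zip, so membership is just an entry-name lookup.
import zipfile

def jars_containing(class_name, jar_paths):
    """Return the jars whose entries include the given class file."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar)
    return hits
```

For example, one might pass the full class name from the stack trace together with a glob of the Pig and elephant-bird jars; if the class turns up in a jar from a different Pig build than expected, the method mismatch follows directly.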