Thank you Rohan, I really appreciate your help! I'll give it a shot and post
back if it works.

~Ed

On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan....@inmobi.com> wrote:

> I just corrected, tested, and pushed LzoTokenizedLoader to my personal fork.
>
> Hopefully it works now.
>
>
> Regards
> Rohan
>
> Dmitriy Ryaboy wrote:
>
>> lzop should work.
>>
>> On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>
>>
>>> Well,
>>>
>>> I haven't tried (or rather, I don't remember trying) compressing via lzop
>>> and then putting the file on the cluster, so I can't tell you about that.
>>> Here is what works for me: I first put the file on the cluster and then
>>> do stream compression.
>>>
>>> And yes, it need not be indexed. (I guess it doesn't matter for a small
>>> test file; otherwise it is unwise, because you lose the benefit of
>>> parallelism.)
>>>
>>> Regards
>>> Rohan
>>>
>>>
>>> pig wrote:
>>>
>>>
>>>> Hi Rohan,
>>>>
>>>> The test file (test_input_chars.txt.lzo) is not indexed. I created it
>>>> using the command
>>>>
>>>> 'lzop test_input_chars.txt'
>>>>
>>>> It's a really small file (only 6 lines), so I didn't think it needed to
>>>> be indexed. Do all files, regardless of size, need to be indexed for
>>>> LzoTokenizedLoader to work?
>>>>
>>>> Thank you!
>>>>
>>>> ~Ed
>>>>
>>>> On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com> wrote:
>>>>
>>>>
>>>>> Oh, sorry, I am completely out of sync...
>>>>>
>>>>> Can you tell me how you LZO-compressed and indexed the file?
>>>>>
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>> Rohan Rai wrote:
>>>>>
>>>>>
>>>>>> Oh, sorry, I did not see this mail...
>>>>>>
>>>>>> It's not an official patch/release, but here is a fork of elephant-bird
>>>>>> that works with Pig 0.7 for normal LZO text loading etc. (not
>>>>>> HBaseLoader).
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>> Dmitriy Ryaboy wrote:
>>>>>>
>>>>>>> The 0.7 branch is not tested... it's quite likely it doesn't actually
>>>>>>> work :).
>>>>>>>
>>>>>>> Rohan Rai was working on it. Rohan, think you can take a look and help
>>>>>>> Ed out?
>>>>>>>
>>>>>>> Ed, you may want to check whether the same input works when you use
>>>>>>> Pig 0.6 (and the official elephant-bird, on Kevin Weil's GitHub).
>>>>>>>
>>>>>>> -D
>>>>>>>
>>>>>>> On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> After getting all the errors about LZO libraries not being found and
>>>>>>>> missing elephant-bird jar files to go away, I've run into a new problem
>>>>>>>> when using the elephant-bird branch for Pig 0.7.
>>>>>>>>
>>>>>>>> The following simple Pig script works as expected:
>>>>>>>>
>>>>>>>>  REGISTER elephant-bird-1.0.jar
>>>>>>>>  REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>  A = load '/usr/foo/input/test_input_chars.txt';
>>>>>>>>  DUMP A;
>>>>>>>>
>>>>>>>> This just dumps out the contents of test_input_chars.txt, which is
>>>>>>>> tab-delimited. The output looks like:
>>>>>>>>
>>>>>>>>  (1,a,a,a,a,a,a)
>>>>>>>>  (2,b,b,b,b,b,b)
>>>>>>>>  (3,c,c,c,c,c,c)
>>>>>>>>  (4,d,d,d,d,d,d)
>>>>>>>>  (5,e,e,e,e,e,e)
>>>>>>>>
>>>>>>>> I then lzop the test file to get test_input_chars.txt.lzo. (I
>>>>>>>> decompressed it with lzop -d to make sure the compression worked, and
>>>>>>>> everything looks good.) If I run the exact same script above on the
>>>>>>>> LZO file, it works fine. However, this file is really small and
>>>>>>>> doesn't need an index, and what I want is LZO support that works with
>>>>>>>> indexes. So I decided to try out the elephant-bird branch for Pig 0.7
>>>>>>>> located at http://github.com/hirohanin/elephant-bird/, as recommended
>>>>>>>> by Dmitriy.
>>>>>>>>
>>>>>>>> I created the following Pig script, which mirrors the one above but
>>>>>>>> should hopefully work on LZO files (including indexed ones):
>>>>>>>>
>>>>>>>>  REGISTER elephant-bird-1.0.jar
>>>>>>>>  REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>>>>>>  A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>>>>>>>>      com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>>>>>>>>  DUMP A;
>>>>>>>>
>>>>>>>> When I run this script, which uses LzoTokenizedLoader, there is no
>>>>>>>> output. The script appears to run without errors, but it reports 0
>>>>>>>> records written and 0 bytes written.
>>>>>>>>
>>>>>>>> Here is the exact output:
>>>>>>>>
>>>>>>>> grunt > DUMP A;
>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [     ]
>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [     ]
>>>>>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [     ]
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>>>>> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments.  Applications should implement Tool for the same.
>>>>>>>> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [     ]
>>>>>>>> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>>>>>>>> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
>>>>>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
>>>>>>>> grunt >
>>>>>>>>
>>>>>>>> I'm not sure whether I'm doing something wrong in my use of
>>>>>>>> LzoTokenizedLoader or whether there is a problem with the class itself
>>>>>>>> (most likely the problem is with my code, heh). Thank you for any help!
>>>>>>>>
>>>>>>>> ~Ed
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  .
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> The information contained in this communication is intended solely for the
>>>>>> use of the individual or entity to whom it is addressed and others
>>>>>> authorized to receive it. It may contain confidential or legally privileged
>>>>>> information. If you are not the intended recipient you are hereby notified
>>>>>> that any disclosure, copying, distribution or taking any action in reliance
>>>>>> on the contents of this information is strictly prohibited and may be
>>>>>> unlawful. If you have received this communication in error, please notify us
>>>>>> immediately by responding to this email and then delete it from your system.
>>>>>> The firm is neither liable for the proper and complete transmission of the
>>>>>> information contained in this communication nor for any delay in its
>>>>>> receipt.
>>>>>
>>>>>
>>>
>>>
>
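
Rohan's remark above about indexing and parallelism can be illustrated with a small sketch. It is hypothetical and is not the split logic of elephant-bird or hadoop-lzo. It assumes the index is a sorted list of byte offsets at which compressed blocks begin (roughly what the companion .lzo.index file records), and that a split may only start at one of those offsets. Without an index, the only safe starting point is the beginning of the file, so the whole file becomes a single split handled by one mapper. The function name plan_splits and its parameters are made up for illustration.

```python
from typing import List, Optional, Tuple


def plan_splits(file_size: int,
                block_offsets: Optional[List[int]],
                target_split_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges that separate mappers could read.

    Hypothetical illustration only: block_offsets is assumed to be a sorted
    list of byte positions where compressed blocks start, all < file_size.
    """
    # No index: decompression can only begin at the start of the file,
    # so the entire file is one split (no parallelism).
    if not block_offsets:
        return [(0, file_size)]

    splits: List[Tuple[int, int]] = []
    start = 0
    for offset in block_offsets:
        # Cut a split at the first block boundary that lies at least
        # target_split_size bytes past the current split's start.
        if offset - start >= target_split_size:
            splits.append((start, offset))
            start = offset
    # The last split runs to the end of the file.
    splits.append((start, file_size))
    return splits


if __name__ == "__main__":
    print(plan_splits(1000, None, 256))
    print(plan_splits(1000, [50, 300, 600, 900], 256))
```

For example, plan_splits(1000, None, 256) returns [(0, 1000)]. plan_splits(1000, [50, 300, 600, 900], 256) returns [(0, 300), (300, 600), (600, 900), (900, 1000)].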
