The 0.7 branch is not tested... it's quite likely it doesn't actually work :). Rohan Rai was working on it. Rohan, think you can take a look and help Ed out?
Ed, you may want to check if the same input works when you use Pig 0.6 (and
the official elephant-bird, on Kevin Weil's github).

-D

On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
> Hello,
>
> After getting all the errors to go away with LZO libraries not being found
> and missing jar files for elephant-bird, I've run into a new problem when
> using the elephant-bird branch for Pig 0.7.
>
> The following simple Pig script works as expected:
>
> REGISTER elephant-bird-1.0.jar
> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
> A = load '/usr/foo/input/test_input_chars.txt';
> DUMP A;
>
> This just dumps out the contents of the test_input_chars.txt file, which is
> tab delimited. The output looks like:
>
> (1,a,a,a,a,a,a)
> (2,b,b,b,b,b,b)
> (3,c,c,c,c,c,c)
> (4,d,d,d,d,d,d)
> (5,e,e,e,e,e,e)
>
> I then lzop the test file to get test_input_chars.txt.lzo (I decompressed
> this with lzop -d to make sure the compression worked fine, and everything
> looks good). If I run the exact same script provided above on the lzo file,
> it works fine. However, this file is really small and doesn't need to use
> indexes. As a result, I wanted to have LZO support that worked with
> indexes. Based on this I decided to try out the elephant-bird branch for
> Pig 0.7 located here (http://github.com/hirohanin/elephant-bird/), as
> recommended by Dmitriy.
>
> I created the following Pig script that mirrors the above script but should
> hopefully work on LZO files (including indexed ones):
>
> REGISTER elephant-bird-1.0.jar
> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>     com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> DUMP A;
>
> When I run this script, which uses the LzoTokenizedLoader, there is no
> output. The script appears to run without errors, but there are zero
> Records Written and 0 Bytes Written.
>
> Here is the exact output:
>
> grunt> DUMP A;
> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage)
> - 1-4 Operator Key: 1-4
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader -
> LzoTokenizedLoader with given delimiter [ ]
> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
> Total input paths to process : 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201009101108_0151
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at
> http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Succesfully stored result in
> "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Records written: 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Bytes written: 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Spillable Memory Manager spill count : 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Proactive spill count : 0
> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Success!
> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
> Total input paths to process: 1
> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
> Total input paths to process: 1
> grunt>
>
> I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or
> if there is a problem with the class itself (most likely the problem is with
> my code, heh). Thank you for any help!
>
> ~Ed
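P.S. For the Pig 0.6 test, a sketch of what I mean, reusing the jar names and
paths from your own script (those are assumptions about your setup):

```pig
-- Register the official elephant-bird build (Kevin Weil's github)
-- and its google-collect dependency, as in your original script
REGISTER elephant-bird-1.0.jar;
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar;

-- Same loader, same .lzo file, but run under Pig 0.6
A = LOAD '/usr/foo/input/test_input_chars.txt.lzo'
    USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;
```

If that dumps your five tuples, the input and your LZO setup are fine, and the
problem is in the untested 0.7 branch rather than in your code.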