Hello,

After getting rid of all the errors about LZO libraries not being found and jar files missing for elephant-bird, I've run into a new problem when using the elephant-bird branch for Pig 0.7.

The following simple Pig script works as expected:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt';
    DUMP A;

This just dumps the contents of test_input_chars.txt, which is tab delimited. The output looks like:

    (1,a,a,a,a,a,a)
    (2,b,b,b,b,b,b)
    (3,c,c,c,c,c,c)
    (4,d,d,d,d,d,d)
    (5,e,e,e,e,e,e)

I then ran lzop on the test file to get test_input_chars.txt.lzo. (I decompressed it again with lzop -d to make sure the compression worked, and everything looks good.) If I run the exact same script on the .lzo file, it works fine. However, this file is really small and doesn't need an index, and what I want is LZO support that also works with indexed files. So I decided to try the elephant-bird branch for Pig 0.7 located here (http://github.com/hirohanin/elephant-bird/), as recommended by Dimitriy.

I created the following Pig script, which mirrors the one above but should hopefully work on LZO files (including indexed ones):

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
    DUMP A;

When I run this script, which uses LzoTokenizedLoader, there is no output. The script appears to run without errors, but the job reports 0 records written and 0 bytes written.

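To make the expectation concrete: what I would expect from a tab-tokenizing loader is simply each line split on tabs, giving the same tuples as the plain load. Below is a rough Python sketch of that expectation. It is not elephant-bird or Pig code, and the helper name to_pig_tuple is made up purely for illustration.

```python
# Illustration only: mimics the tuples expected from a tab-tokenizing loader.
# `to_pig_tuple` is a hypothetical helper, not part of Pig or elephant-bird.

def to_pig_tuple(line, delimiter="\t"):
    """Split one input line on the delimiter and format it like a DUMPed tuple."""
    fields = line.rstrip("\n").split(delimiter)
    return "(" + ",".join(fields) + ")"


# Two sample rows in the same shape as test_input_chars.txt (tab delimited).
sample = "1\ta\ta\ta\ta\ta\ta\n2\tb\tb\tb\tb\tb\tb\n"

for line in sample.splitlines():
    print(to_pig_tuple(line))
```

The run with LzoTokenizedLoader produced nothing of the sort.
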
Here is the exact output:

    grunt> DUMP A;
    [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
    [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
    [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
    [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
    [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
    [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
    [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
    grunt>

I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or if there is a problem with the class itself (most likely the problem is with my code, heh).

Thank you for any help!

~Ed