Hi,


I am using pig 0.7.0 in hadoop mapreduce mode.



The problem I have is that I simply can't use



STORE alias INTO 'directory' USING PigStorage();



I can load a dataset in and write UDFs to manipulate it, but I can't
store it. The output is a directory in HDFS showing 0 bytes.



As an example, I've been testing with a simple script:



W = load 'wordbag' using PigStorage(' ') as (f1:int, f2:int, name:chararray,
type:chararray);

store W into 'wordtesting' using PigStorage(' ');
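For reference, the stored relation should land in part files inside the output directory (the directory entry itself always lists as 0 bytes in hadoop fs -ls). A sketch of how I check it from grunt, assuming the paths above:

    grunt> fs -ls wordtesting          -- list the part files inside the output directory
    grunt> fs -cat wordtesting/part-*  -- print the stored tuples, if any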



I run the code in grunt, and the output of hadoop fs -ls is:



drwxr-xr-x   - awang supergroup          0 2010-09-21 13:45
/user/awang/wordtesting



The grunt messages are:



grunt> store filteredW into 'wordtesting' using PigStorage(' ');

2010-09-21 13:45:35,210 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns
- No column pruned for W

2010-09-21 13:45:35,210 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns
- No map keys pruned for W

2010-09-21 13:45:35,440 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
- (Name: Store(hdfs://pineal:9000/user/awang/wordtesting:PigStorage(' ')) -
1-46 Operator Key: 1-46)

2010-09-21 13:45:35,498 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1

2010-09-21 13:45:35,498 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1

2010-09-21 13:45:35,549 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2010-09-21 13:45:38,100 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job

2010-09-21 13:45:38,166 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.

2010-09-21 13:45:38,173 [Thread-15] WARN  org.apache.hadoop.mapred.JobClient
- Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.

2010-09-21 13:45:38,307 [Thread-15] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat
- Total input paths to process : 1

2010-09-21 13:45:38,307 [Thread-15] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil
- Total input paths to process : 1

2010-09-21 13:45:38,670 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201009211320_0002

2010-09-21 13:45:38,670 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://pineal:50030/jobdetails.jsp?jobid=job_201009211320_0002

2010-09-21 13:45:38,673 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete

2010-09-21 13:45:48,755 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete

2010-09-21 13:45:53,835 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete

2010-09-21 13:45:53,835 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Successfully stored result in: "hdfs://pineal:9000/user/awang/wordtesting"

2010-09-21 13:45:53,846 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Records written : 1

2010-09-21 13:45:53,846 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Bytes written : 20

2010-09-21 13:45:53,846 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Spillable Memory Manager spill count : 0

2010-09-21 13:45:53,847 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Proactive spill count : 0

2010-09-21 13:45:53,847 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!

I've been struggling with this for a long time… It works if I have a
single bytearray in my tuple, but once I define a schema, it no longer works.
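To illustrate the two cases (file name assumed, and my understanding of the cast behavior, so take it as a sketch):

    -- works: no schema, so every field defaults to bytearray
    W = load 'wordbag' using PigStorage(' ');
    store W into 'out_nocast' using PigStorage(' ');

    -- fails for me: with a declared schema, fields that can't be cast
    -- (e.g. a non-numeric token where an int is declared) come out as null
    W = load 'wordbag' using PigStorage(' ')
        as (f1:int, f2:int, name:chararray, type:chararray);
    store W into 'out_cast' using PigStorage(' ');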



Does anyone have any idea? Please help! Thanks!



Best regards,

Alex