I think I found a bug in versions 0.12.0 through 0.14.0. I've been trying to get passed it all day. Can someone please confirm?
The below is the bare minimum I was able to extract from my original problem to in order to demonstrate the bug. So, don't expect the following code to serve any practical purpose. :) My input file (test_in) is two columns with a tab delimiter: 1 F 2 F My streaming function (sf.py) ignores the actual input and simply generates 2 records: #!/usr/bin/python if __name__ == '__main__': print 'x' print 'y' (But I should mention that in my original problem the input to output was one-to-one. I just ignored the input here to get to the bare minimum effect.) My pig script: MY_INPUT = load 'test_in' as ( f1, f2); split MY_INPUT into T if (f2 == 'T'), F otherwise; T2 = group T by f1; store T2 into 'test_out/T2'; F2 = group F by f1; store F2 into 'test_out/F2'; -- (this line is actually optional to demo the bug) F3 = stream F2 through `sf.py`; store F3 into 'test_out/F3'; My expected output for test/out/F3 is two records that come directly from sf.py: x y However, I only get: x I've tried all of the following to get the expected behavior: - upgraded Pig from 0.12.0 to 0.14.0 - local vs. distributed mode - flush sys.stdout in the streaming function - replace sf.py with sf.sh which is a bash script that used "echo x; echo y" to do the same thing. In this case, the final contents of test_out/F# would vary - sometimes I would get both x and y, and sometimes I would just get x. Aside from removing the one Pig line that I've marked optional, any other attempts to simplify the Pig script or input file causes the bug to not manifest. I've attached the logs from running "pig -x local test.pig" Thanks, Steve
$ pig -x local test.pig 15/04/30 21:47:43 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL 15/04/30 21:47:43 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType 2015-04-30 21:47:44,006 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05 2015-04-30 21:47:44,013 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/steve/pig_1430430464004.log 2015-04-30 21:47:44,051 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated. Instead, use mapreduce.job.user.name 2015-04-30 21:47:45,141 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found 2015-04-30 21:47:45,411 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:45,417 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 2015-04-30 21:47:45,420 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// 2015-04-30 21:47:46,869 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:47,477 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:47,702 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:48,131 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:48,193 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s). 2015-04-30 21:47:48,228 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator 2015-04-30 21:47:48,301 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,STREAMING 2015-04-30 21:47:48,431 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:48,561 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]} 2015-04-30 21:47:48,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2015-04-30 21:47:48,928 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:49,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 5 2015-04-30 21:47:49,045 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 map-only splittees. 2015-04-30 21:47:49,049 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 3 MR operators. 2015-04-30 21:47:49,052 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 33 into MR job 30 2015-04-30 21:47:49,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 35 into MR job 30 2015-04-30 21:47:49,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1 2015-04-30 21:47:49,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 map-reduce splittees. 2015-04-30 21:47:49,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 3 MR operators. 2015-04-30 21:47:49,061 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2015-04-30 21:47:49,156 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:49,206 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id 2015-04-30 21:47:49,212 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId= 2015-04-30 21:47:49,716 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2015-04-30 21:47:49,730 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent 2015-04-30 21:47:49,730 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2015-04-30 21:47:49,731 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress 2015-04-30 21:47:49,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 2015-04-30 21:47:49,735 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2015-04-30 21:47:49,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=8 2015-04-30 21:47:49,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2015-04-30 21:47:49,754 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 2015-04-30 21:47:49,876 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job 2015-04-30 21:47:50,164 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2015-04-30 21:47:50,171 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address 2015-04-30 21:47:50,208 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:50,762 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String). 2015-04-30 21:47:50,906 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2015-04-30 21:47:50,906 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 2015-04-30 21:47:50,927 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1 2015-04-30 21:47:50,989 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1 2015-04-30 21:47:51,548 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local2069324582_0001 2015-04-30 21:47:51,664 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-hadoop/mapred/staging/hadoop2069324582/.staging/job_local2069324582_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 2015-04-30 21:47:51,687 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-hadoop/mapred/staging/hadoop2069324582/.staging/job_local2069324582_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 2015-04-30 21:47:52,331 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local2069324582_0001/job_local2069324582_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 2015-04-30 21:47:52,350 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local2069324582_0001/job_local2069324582_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 2015-04-30 21:47:52,376 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/ 2015-04-30 21:47:52,378 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local2069324582_0001 2015-04-30 21:47:52,378 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases F,F2,F3,MY_INPUT,T,T2 2015-04-30 21:47:52,378 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: MY_INPUT[1,11],T[2,20],T2[3,5],F[2,38],F2[5,5] C: R: F3[7,5] 2015-04-30 21:47:52,403 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null 2015-04-30 21:47:52,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2015-04-30 21:47:52,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local2069324582_0001] 2015-04-30 21:47:52,505 [Thread-7] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator 2015-04-30 21:47:52,513 [Thread-7] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-30 21:47:52,515 [Thread-7] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent 2015-04-30 21:47:52,516 [Thread-7] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 2015-04-30 21:47:52,516 [Thread-7] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 2015-04-30 21:47:52,546 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter 2015-04-30 21:47:52,844 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks 2015-04-30 21:47:52,845 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local2069324582_0001_m_000000_0 2015-04-30 21:47:52,998 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ] 2015-04-30 21:47:53,007 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1 Total Length = 8 Input split[0]: Length = 8 ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit Locations: ----------------------- 2015-04-30 21:47:53,035 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/hadoop/steve/test_in:0+8 2015-04-30 21:47:53,047 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2015-04-30 21:47:53,258 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584) 2015-04-30 21:47:53,258 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100 2015-04-30 21:47:53,259 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - soft limit at 83886080 2015-04-30 21:47:53,259 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600 2015-04-30 21:47:53,259 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600 2015-04-30 21:47:53,280 [LocalJobRunner Map Task Executor #0] INFO com.hadoop.compression.lzo.GPLNativeCodeLoader - Loaded native gpl library from the embedded binaries 2015-04-30 21:47:53,283 [LocalJobRunner Map Task Executor #0] INFO com.hadoop.compression.lzo.LzoCodec - Successfully loaded & initialized native-lzo library [hadoop-lzo rev 77cfa96225d62546008ca339b7c2076a3da91578] 2015-04-30 21:47:53,324 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2015-04-30 21:47:53,362 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: MY_INPUT[1,11],T[2,20],T2[3,5],F[2,38],F2[5,5] C: R: F3[7,5] 2015-04-30 21:47:54,075 [LocalJobRunner Map Task Executor #0] INFO amazon.emr.metrics.MetricsSaver - MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false 2015-04-30 21:47:54,081 [LocalJobRunner Map Task Executor #0] INFO amazon.emr.metrics.MetricsSaver - Created MetricsSaver j-2027WAZUSMYG8:i-3891d1c5:RunJar:05647 period:60 /mnt/var/em/raw/i-3891d1c5_20150430_null_00000_raw.bin 2015-04-30 21:47:54,103 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - 2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output 2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Spilling map output 2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufend = 24; bufvoid = 104857600 2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600 2015-04-30 21:47:54,122 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0 2015-04-30 21:47:54,125 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local2069324582_0001_m_000000_0 is done. And is in the process of committing 2015-04-30 21:47:54,175 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map 2015-04-30 21:47:54,175 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local2069324582_0001_m_000000_0' done. 2015-04-30 21:47:54,175 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local2069324582_0001_m_000000_0 2015-04-30 21:47:54,176 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete. 2015-04-30 21:47:54,179 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks 2015-04-30 21:47:54,180 [pool-3-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local2069324582_0001_r_000000_0 2015-04-30 21:47:54,247 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ] 2015-04-30 21:47:54,271 [pool-3-thread-1] INFO org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@1f3176b9 2015-04-30 21:47:54,337 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=709551680, maxSingleShuffleLimit=177387920, mergeThreshold=468304128, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2015-04-30 21:47:54,351 [EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local2069324582_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events 2015-04-30 21:47:54,424 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete 2015-04-30 21:47:54,424 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local2069324582_0001] 2015-04-30 21:47:54,512 [localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#1 about to shuffle output of map attempt_local2069324582_0001_m_000000_0 decomp: 30 len: 34 to MEMORY 2015-04-30 21:47:54,533 [localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 30 bytes from map-output for attempt_local2069324582_0001_m_000000_0 2015-04-30 21:47:54,534 [localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile -> map-output of size: 30, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->30 2015-04-30 21:47:54,538 [Readahead Thread #0] WARN org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile EBADF: Bad file descriptor at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:263) at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:142) at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-04-30 21:47:54,539 [EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - EventFetcher is interrupted.. Returning 2015-04-30 21:47:54,558 [pool-3-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied. 2015-04-30 21:47:54,559 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs 2015-04-30 21:47:54,591 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments 2015-04-30 21:47:54,595 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 22 bytes 2015-04-30 21:47:54,598 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merged 1 segments, 30 bytes to disk to satisfy reduce memory limit 2015-04-30 21:47:54,598 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 1 files, 34 bytes from disk 2015-04-30 21:47:54,600 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 0 segments, 0 bytes from memory into reduce 2015-04-30 21:47:54,600 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments 2015-04-30 21:47:54,601 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 22 bytes 2015-04-30 21:47:54,602 [pool-3-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied. 2015-04-30 21:47:54,610 [pool-3-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords 2015-04-30 21:47:54,635 [pool-3-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2015-04-30 21:47:54,725 [pool-3-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: MY_INPUT[1,11],T[2,20],T2[3,5],F[2,38],F2[5,5] C: R: F3[7,5] 2015-04-30 21:47:54,847 [pool-3-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id ===== Task Information Header ===== Command: python -u sf.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming) Start time: Thu Apr 30 21:47:54 UTC 20152015-04-30 21:47:54,858 [pool-3-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap ===== * * * ===== 2015-04-30 21:47:54,899 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task - Task:attempt_local2069324582_0001_r_000000_0 is done. And is in the process of committing 2015-04-30 21:47:54,914 [pool-3-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied. 2015-04-30 21:47:54,914 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task - Task attempt_local2069324582_0001_r_000000_0 is allowed to commit now 2015-04-30 21:47:54,917 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local2069324582_0001_r_000000_0' to file:/home/hadoop/steve/test_out/T2/_temporary/0/task_local2069324582_0001_r_000000 2015-04-30 21:47:54,920 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local2069324582_0001_r_000000_0' to file:/home/hadoop/steve/test_out/F2/_temporary/0/task_local2069324582_0001_r_000000 2015-04-30 21:47:54,923 [pool-3-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local2069324582_0001_r_000000_0' to file:/home/hadoop/steve/test_out/F3/_temporary/0/task_local2069324582_0001_r_000000 2015-04-30 21:47:54,926 [pool-3-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce 2015-04-30 21:47:54,926 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local2069324582_0001_r_000000_0' done. 2015-04-30 21:47:54,926 [pool-3-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local2069324582_0001_r_000000_0 2015-04-30 21:47:54,926 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete. 2015-04-30 21:47:55,152 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:55,162 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:55,164 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 2015-04-30 21:47:55,165 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:55,263 [main] INFO org.apache.pig.tools.pigstats.JobStats - using output size reader: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader 2015-04-30 21:47:55,267 [main] INFO org.apache.pig.tools.pigstats.JobStats - using output size reader: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader 2015-04-30 21:47:55,269 [main] INFO org.apache.pig.tools.pigstats.JobStats - using output size reader: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader 2015-04-30 21:47:55,273 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2015-04-30 21:47:55,276 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 2.4.0-amzn-4 0.14.0 hadoop 2015-04-30 21:47:49 2015-04-30 21:47:55 GROUP_BY,STREAMING Success! Job Stats (time in seconds): JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs job_local2069324582_0001 1 1 n/a n/a n/a n/a n/a n/a n/a n/a F,F2,F3,MY_INPUT,T,T2 MULTI_QUERY file:///home/hadoop/steve/test_out/T2,file:///home/hadoop/steve/test_out/F2,file:///home/hadoop/steve/test_out/F3, Input(s): Successfully read 2 records from: "file:///home/hadoop/steve/test_in" Output(s): Successfully stored 0 records in: "file:///home/hadoop/steve/test_out/T2" Successfully stored 2 records (20 bytes) in: "file:///home/hadoop/steve/test_out/F2" Successfully stored 1 records (2 bytes) in: "file:///home/hadoop/steve/test_out/F3" Counters: Total records written : 3 Total bytes written : 22 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_local2069324582_0001 2015-04-30 21:47:55,278 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:55,280 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:55,282 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2015-04-30 21:47:55,298 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! 2015-04-30 21:47:55,342 [main] INFO org.apache.pig.Main - Pig script completed in 12 seconds and 27 milliseconds (12027 ms) 2015-04-30 21:47:55,345 [Thread-12] INFO amazon.emr.metrics.MetricsSaver - MetricsSaver j-2027WAZUSMYG8:i-3891d1c5:RunJar:05647 metricFile /mnt/var/em/raw/i-3891d1c5_20150430_null_00000_raw.bin 2015-04-30 21:47:55,354 [Thread-12] INFO amazon.emr.metrics.MetricsSaver - Saved 2:4 records to /mnt/var/em/raw/i-3891d1c5_20150430_RunJar_05647_raw.bin