[ https://issues.apache.org/jira/browse/TEZ-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201770#comment-15201770 ]
Jonathan Eagles commented on TEZ-3165:
--------------------------------------
[~sseth], we found that both Pig's
[HBaseStorage|https://github.com/apache/pig/blob/branch-0.14/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java]
and elephant-bird's
[SequenceFileStorage|https://github.com/twitter/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/util/SequenceFileConfig.java]
use commons-cli's OptionBuilder, which is not thread-safe
([Thread-safety notice|https://commons.apache.org/proper/commons-cli/javadocs/api-release/org/apache/commons/cli/OptionBuilder.html]).
Both are used in a script like this one:
{noformat}
myinput = load 'hbase://mydb:mytable'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:m', '-loadKey true $OPTIONS');
...
store output into 'myoutput'
    using com.twitter.elephantbird.pig.store.SequenceFileStorage(
        '-c com.twitter.elephantbird.pig.util.TextConverter',
        '-c com.twitter.elephantbird.pig.util.TextConverter');
{noformat}
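To see why shared static builder state breaks when two storage functions are constructed at the same time, here is a minimal Java sketch. {{StaticOptBuilder}}, {{InstanceOptBuilder}} and {{Opt}} are hypothetical stand-ins, not the commons-cli classes; {{StaticOptBuilder}} only mimics OptionBuilder's design of keeping pending settings in static fields. Instead of real threads, the demo replays one possible interleaving of two callers on a single thread, so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical option definition (not commons-cli's Option).
final class Opt {
    final String name;
    final boolean hasArg;
    final String argName;

    Opt(String name, boolean hasArg, String argName) {
        this.name = name;
        this.hasArg = hasArg;
        this.argName = argName;
    }

    @Override
    public String toString() {
        return name + (hasArg ? "<" + argName + ">" : "");
    }
}

// Hypothetical builder modelled on OptionBuilder's design: pending settings
// live in static fields, so every caller in the JVM shares one builder.
final class StaticOptBuilder {
    private static boolean hasArg;
    private static String argName;

    static void hasArg(String name) {
        hasArg = true;
        argName = name;
    }

    static Opt create(String name) {
        Opt opt = new Opt(name, hasArg, argName);
        hasArg = false; // reset the shared state after every create
        argName = null;
        return opt;
    }
}

// Hypothetical instance-based builder: each caller owns its pending state.
final class InstanceOptBuilder {
    private boolean hasArg;
    private String argName;

    InstanceOptBuilder hasArg(String name) {
        this.hasArg = true;
        this.argName = name;
        return this;
    }

    Opt create(String name) {
        return new Opt(name, hasArg, argName);
    }
}

class BuilderInterleavingDemo {
    // Caller A wants "-c <converter>", caller B wants a plain "loadKey" flag.
    // B's create() runs between A's hasArg() and A's create().
    static List<Opt> replayStatic() {
        List<Opt> out = new ArrayList<>();
        StaticOptBuilder.hasArg("converter");        // caller A
        out.add(StaticOptBuilder.create("loadKey")); // caller B steps in
        out.add(StaticOptBuilder.create("c"));       // caller A resumes
        return out;
    }

    static List<Opt> replayInstance() {
        List<Opt> out = new ArrayList<>();
        InstanceOptBuilder a = new InstanceOptBuilder();
        InstanceOptBuilder b = new InstanceOptBuilder();
        a.hasArg("converter");
        out.add(b.create("loadKey"));
        out.add(a.create("c"));
        return out;
    }

    public static void main(String[] args) {
        System.out.println("static:   " + replayStatic());
        System.out.println("instance: " + replayInstance());
    }
}
```

The static builder prints {{[loadKey<converter>, c]}}: caller A's argument is attached to caller B's flag and {{c}} loses its value. The instance builder prints {{[loadKey, c<converter>]}}. commons-cli 1.3 and later offer the instance-based {{Option.builder(...)}} for this reason.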
When a single vertex uses functions like these, we need to be able to fully
serialize the initialization of Pig's processor, inputs, and outputs to avoid
this race. Fixes to Pig and elephant-bird are in progress, but this change
provides a compatibility mode for the most widely used versions of both, as
well as for user-defined functions that may have the same issue.
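The difference between parallel and serialized initialization can be modelled with plain {{java.util.concurrent}}. This is a hypothetical sketch, not Tez's {{LogicalIOProcessorRuntimeTask}}: in parallel mode the input/output initializers go to a thread pool while the processor initializes on the calling thread (the pattern visible in the log below, where {{I/O Setup}} threads run alongside {{TezChild}}); in serial mode everything runs one at a time on the calling thread. A probe records the maximum number of initializers active at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a task runtime; not Tez's actual implementation.
class InitOrderSketch {
    // Records how many initializers are active at the same moment.
    static final class Probe {
        private final AtomicInteger active = new AtomicInteger();
        private final AtomicInteger maxActive = new AtomicInteger();

        Callable<Void> initializer(final long workMillis) {
            return () -> {
                int now = active.incrementAndGet();
                maxActive.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(workMillis); // stand-in for real initialize() work
                } finally {
                    active.decrementAndGet();
                }
                return null;
            };
        }

        int maxConcurrent() {
            return maxActive.get();
        }
    }

    // serial == true: processor, then each input/output, all on the calling thread.
    // serial == false: inputs/outputs run on a pool while the processor
    // initializes on the calling thread; the caller then waits for the pool.
    static int initialize(boolean serial, int numIO) throws Exception {
        Probe probe = new Probe();
        Callable<Void> processor = probe.initializer(50);
        List<Callable<Void>> io = new ArrayList<>();
        for (int i = 0; i < numIO; i++) {
            io.add(probe.initializer(50));
        }

        if (serial) {
            processor.call();
            for (Callable<Void> c : io) {
                c.call();
            }
        } else {
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, numIO));
            try {
                List<Future<Void>> pending = new ArrayList<>();
                for (Callable<Void> c : io) {
                    pending.add(pool.submit(c));
                }
                processor.call(); // overlaps with the pooled initializers
                for (Future<Void> f : pending) {
                    f.get(); // cf. "Waiting for N initializers to finish"
                }
            } finally {
                pool.shutdown();
            }
        }
        return probe.maxConcurrent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("serial:   max concurrent = " + initialize(true, 2));
        System.out.println("parallel: max concurrent = " + initialize(false, 2));
    }
}
```

Serial mode always reports 1, so no two initializers can touch a shared static builder at the same moment; the parallel figure depends on thread scheduling. The price of serializing is that slow initializers no longer overlap.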
> Parallel initialization of inputs, outputs, and processor can cause NoSuchMethodException
> -----------------------------------------------------------------------------------------
>
> Key: TEZ-3165
> URL: https://issues.apache.org/jira/browse/TEZ-3165
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3165.1.patch
>
>
> 2016-03-13 23:55:17,162 [INFO] [main] |runtime.LogicalIOProcessorRuntimeTask|: Initializing LogicalIOProcessorRuntimeTask with TaskSpec: DAGName : PigLatin:Script.pig-0_scope-0, VertexName: scope-203, VertexParallelism: 2707, TaskAttemptID:attempt_1, processorName=org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor, inputSpecListSize=1, outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=scope-0, physicalEdgeCount=1, inputClassName=org.apache.tez.mapreduce.input.MRInput }}, ], outputSpecList=[{{ destinationVertexName=scope-28, physicalEdgeCount=0, outputClassName=org.apache.tez.mapreduce.output.MROutput }}, ]
> 2016-03-13 23:55:17,164 [INFO] [main] |resources.MemoryDistributor|: InitialMemoryDistributor (isEnabled=true) invoked with: numInputs=1, numOutputs=1, JVM.maxFree=1059061760, allocatorClassName=org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
> 2016-03-13 23:55:17,175 [INFO] [TezChild] |task.TezTaskRunner|: Initializing task, taskAttemptId=attempt_1
> 2016-03-13 23:55:17,182 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Routing events from heartbeat response to task, currentTaskAttemptId=attempt_1, eventCount=1 fromEventId=0 nextFromEventId=0
> 2016-03-13 23:55:17,212 [INFO] [I/O Setup 1 Initialize: {scope-28}] |Configuration.deprecation|: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
> 2016-03-13 23:55:17,214 [INFO] [I/O Setup 1 Initialize: {scope-28}] |Configuration.deprecation|: fs.default.name is deprecated. Instead, use fs.defaultFS
> 2016-03-13 23:55:17,223 [INFO] [I/O Setup 1 Initialize: {scope-28}] |counters.Limits|: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=1000, COUNTER_NAME_MAX=128, MAX_COUNTERS=5000
> 2016-03-13 23:55:17,228 [INFO] [I/O Setup 0 Initialize: {scope-0}] |input.MRInput|: scope-0 using newmapreduce API=true, split via event=true, numPhysicalInputs=1
> 2016-03-13 23:55:17,233 [INFO] [I/O Setup 0 Initialize: {scope-0}] |input.MRInput|: Initialized MRInput: scope-0
> 2016-03-13 23:55:17,345 [INFO] [TezChild] |data.SchemaTupleBackend|: Key [pig.schematuple] was not set... will not generate code.
> 2016-03-13 23:55:17,400 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-03-13 23:55:17,400 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-03-13 23:55:17,400 [INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-03-13 23:55:17,400 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an error while executing task: attempt_1
> java.lang.RuntimeException: could not instantiate 'com.twitter.elephantbird.pig.store.SequenceFileStorage' with arguments '[-c com.twitter.elephantbird.pig.util.TextConverter, -c com.twitter.elephantbird.pig.util.TextConverter]'
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:766)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:250)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:76)
>     at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigOutputFormatTez.getRecordWriter(PigOutputFormatTez.java:43)
>     at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:399)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:506)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:489)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:474)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:734)
>     ... 14 more
> Caused by: java.lang.RuntimeException: Failed to create WritableConverter instance
>     at com.twitter.elephantbird.pig.util.SequenceFileConfig.getWritableConverter(SequenceFileConfig.java:225)
>     at com.twitter.elephantbird.pig.util.SequenceFileConfig.<init>(SequenceFileConfig.java:101)
>     at com.twitter.elephantbird.pig.store.SequenceFileStorage$Config.<init>(SequenceFileStorage.java:89)
>     at com.twitter.elephantbird.pig.store.SequenceFileStorage.<init>(SequenceFileStorage.java:190)
>     ... 19 more
> Caused by: java.lang.NoSuchMethodException: com.twitter.elephantbird.pig.util.TextConverter.<init>(java.lang.String)
>     at java.lang.Class.getConstructor0(Class.java:3074)
>     at java.lang.Class.getConstructor(Class.java:1817)
>     at com.twitter.elephantbird.pig.util.SequenceFileConfig.getWritableConverter(SequenceFileConfig.java:213)
>     ... 22 more
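The innermost cause in the trace is a reflective constructor lookup: {{SequenceFileConfig.getWritableConverter}} calls {{Class.getConstructor}} and gets {{NoSuchMethodException}} for {{TextConverter.<init>(java.lang.String)}}. The sketch below reproduces only that last step, using hypothetical names ({{NoArgConverter}}, {{newConverter}}); it is not elephant-bird's code. It assumes, without confirming it from this trace, that the racing option parsing handed the converter a String argument it was never meant to receive, so the lookup asked for a one-argument constructor that does not exist.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical converter that, like TextConverter in the trace, has no
// constructor taking a String.
class NoArgConverter {
    public NoArgConverter() {
    }
}

class ConverterFactorySketch {
    // Hypothetical helper: looks up a public constructor with one String
    // parameter per parsed converter argument, then invokes it.
    static Object newConverter(Class<?> cls, String... ctorArgs)
            throws ReflectiveOperationException {
        Class<?>[] types = new Class<?>[ctorArgs.length];
        Arrays.fill(types, String.class);
        // Throws NoSuchMethodException when no public constructor matches.
        Constructor<?> ctor = cls.getConstructor(types);
        return ctor.newInstance((Object[]) ctorArgs);
    }

    public static void main(String[] args) throws Exception {
        // No parsed arguments: the no-arg constructor is found.
        System.out.println(newConverter(NoArgConverter.class).getClass().getSimpleName());
        // One stray argument: the lookup asks for <init>(java.lang.String).
        try {
            newConverter(NoArgConverter.class, "stray-arg");
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException: " + e.getMessage());
        }
    }
}
```

With no arguments the lookup succeeds; with one stray String it fails with the same exception type and {{<init>(java.lang.String)}} signature seen in the trace.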
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)