[
https://issues.apache.org/jira/browse/SPARK-29088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
JP Bordenave updated SPARK-29088:
---------------------------------
Description:
Hello,
I installed Hadoop 2.7.7; it works fine.
I installed Hive 2.3.6; it works fine with Hadoop. The lz4-1.3.0.jar was replaced by the lz4-java-1.4.0.jar from Spark because of a class-loader conflict.
I installed Spark 2.4.4.
I configured hive-site.xml in hive/conf for the Spark engine and copied it to spark/conf:
<configuration>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
<description>Use Spark as default execution engine</description>
</property>
<property>
<name>spark.master</name>
<value>spark://192.168.0.30:7077</value>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>/tmp</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.yarn.jars</name>
<value>hdfs://192.168.0.30:54310/spark-jars/*</value>
</property>
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
</configuration>
When I start Hive with the Spark engine (Hive itself works fine in the Hadoop context), "show tables" and "select * from employee;" both work fine. But when I try an insert, the job fails with:

Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
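The `(Ljava/io/InputStream;Z)V` part of that error is the JVM descriptor for a constructor taking a `java.io.InputStream` plus a `boolean` (`Z`) and returning `void` (`V`), i.e. the `LZ4BlockInputStream(InputStream, boolean)` constructor that the old lz4 1.3.0 jar does not have. A minimal sketch of how such a descriptor is formed (the `DescriptorDemo` class and its helpers are illustrative, not part of Hive or Spark):

```java
import java.lang.reflect.Constructor;

public class DescriptorDemo {
    // Map a few Java types to their JVM descriptor form (subset, for illustration).
    static String desc(Class<?> t) {
        if (t == boolean.class) return "Z";
        if (t == int.class) return "I";
        if (!t.isPrimitive()) return "L" + t.getName().replace('.', '/') + ";";
        throw new IllegalArgumentException("unhandled type: " + t);
    }

    // Build the JVM descriptor of a public constructor of `cls` taking `params`.
    static String ctorDescriptor(Class<?> cls, Class<?>... params) {
        try {
            Constructor<?> c = cls.getConstructor(params);
            StringBuilder sb = new StringBuilder("(");
            for (Class<?> p : c.getParameterTypes()) sb.append(desc(p));
            return sb.append(")V").toString();  // constructors always "return" void
        } catch (NoSuchMethodException e) {
            return "no such constructor";       // the situation the old jar produces
        }
    }

    public static void main(String[] args) {
        // JarInputStream happens to have an (InputStream, boolean) constructor,
        // so it yields exactly the descriptor shown in the NoSuchMethodError.
        System.out.println(ctorDescriptor(java.util.jar.JarInputStream.class,
                java.io.InputStream.class, boolean.class));
    }
}
```

So the error means the `LZ4BlockInputStream` class that got loaded simply lacks that two-argument constructor, which is consistent with an lz4 1.3.0 class still shadowing lz4-java 1.4.0.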
I have lz4-java-1.4.0.jar in spark/jars, and I replaced the lz4-1.3.0.jar in hive/lib with it. There is no lz4-1.3.0.jar left, yet it still cannot find the new lz4-java method.
I removed all the 1.2.1 jars and replaced them with all the 2.3.6 jars from Hive in spark/jars.
I uploaded all the jars from spark-2.4.4/jars/* to HDFS at /spark-jars/ on Hadoop 2.7.7.
The worker driver log shows it is using hive-exec-2.3.6.jar.
Did I forget something? It does not see lz4-java-1.4.0.jar, even though the called method exists in lz4-java-1.4.0. I have no lz4-1.3.0.jar left, and there is no conflict in the Hadoop+Hive configuration when using the lz4-java-1.4.0 dependency.
Thanks for your remarks; I have run out of ideas for where to find a solution. The failure happens in the map worker of the Spark engine. Must I add the lz4-java jars somewhere?
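One way to see which lz4 jar actually wins is to run a small reflection probe on the same classpath as the failing executors. This `ConstructorProbe` class is a hypothetical diagnostic sketched here for illustration, not a Hive or Spark tool, and the classpath in the usage line below assumes a typical install layout:

```java
import java.io.InputStream;
import java.security.CodeSource;

public class ConstructorProbe {
    // True if `className` is loadable and has a public (InputStream, boolean)
    // constructor, i.e. the signature that triggered the NoSuchMethodError.
    static boolean hasCtor(String className) {
        try {
            Class.forName(className).getConstructor(InputStream.class, boolean.class);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;  // class missing, or only the old 1.3.0-style constructors exist
        }
    }

    // Where the class was loaded from: the jar that "wins" when several are present.
    static String where(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap classpath" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "net.jpountz.lz4.LZ4BlockInputStream";
        System.out.println(name + " loaded from: " + where(name));
        System.out.println("(InputStream, boolean) constructor present: " + hasCtor(name));
    }
}
```

Running it as, say, `java -cp "$SPARK_HOME/jars/*:$HIVE_HOME/lib/*:." ConstructorProbe` should print which jar supplies the class; if the constructor is reported missing, an old lz4 1.3.0 copy is still shadowing lz4-java 1.4.0 somewhere on that classpath.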
{noformat}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.6-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
hive> select * from employee
    > ;
OK
1	Allen	IT
2	Mag	Sales
3	Rob	Sales
4	Dana	IT
6	Jean-Pierre	Bordenave
7	Pierre	xXx
11	Pierre	xXx
Time taken: 2.99 seconds, Fetched: 7 row(s)
hive> insert into employee values("10","Pierre","xXx");
Query ID = spark_20190915110359_e62a4e1a-fd69-4f17-a0f1-20513f291ddc
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 6b9db937-53d2-4d45-84b2-8e5c6427d9d3
Query Hive on Spark job[0] stages: [0]
Status: Running (Hive on Spark job[0])
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS   TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
Stage-0                  0   RUNNING       1          0         0         1        1
--------------------------------------------------------------------------------------
STAGES: 00/01    [>>--------------------------] 0%  ELAPSED TIME: 3,02 s
--------------------------------------------------------------------------------------
Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
java.util.concurrent.ExecutionException: Exception thrown by job
	at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
	at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.0.30, executor 2): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
> Hive 2.3.6 lz4-1.3.0.jar and Spark 2.4.4 lz4-java.jar: insert fails with Spark engine mode, works fine with Hadoop mode
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-29088
> URL: https://issues.apache.org/jira/browse/SPARK-29088
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 2.4.4
> Environment: linux ubuntu 18.04 standalone
> Reporter: JP Bordenave
> Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)