[
https://issues.apache.org/jira/browse/SPARK-29088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
JP Bordenave updated SPARK-29088:
---------------------------------
Description:
Hello,
I installed Hadoop 2.7.7; it works fine.
I installed Hive 2.3.6; it works fine with Hadoop. The lz4-1.3.0.jar was replaced by the lz4-java-1.4.0.jar from Spark because of a class-loader conflict.
I installed Spark 2.4.4.
I configured hive-site.xml in hive/conf for the Spark engine and copied it to spark/conf:
<configuration>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
<description>Use Spark as default execution engine</description>
</property>
<property>
<name>spark.master</name>
<value>spark://192.168.0.30:7077</value>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>/tmp</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.yarn.jars</name>
<value>hdfs://192.168.0.30:54310/spark-jars/*</value>
</property>
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
</configuration>
When I start Hive with the Spark engine (Hive itself works fine in the Hadoop context), "show tables" and "select * from employee;" both work fine. But when I try an insert, the job fails with:

Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
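The `(Ljava/io/InputStream;Z)V` part of that error is the JVM descriptor for a constructor taking a `java.io.InputStream` plus a `boolean` (`Z`) and returning `void` (`V`), i.e. the `LZ4BlockInputStream(InputStream, boolean)` constructor that the old lz4 1.3.0 jar does not have. A minimal sketch of how such a descriptor is formed (the `DescriptorDemo` class and its helpers are illustrative, not part of Hive or Spark):

```java
import java.lang.reflect.Constructor;

public class DescriptorDemo {
    // Map a few Java types to their JVM descriptor form (subset, for illustration).
    static String desc(Class<?> t) {
        if (t == boolean.class) return "Z";
        if (t == int.class) return "I";
        if (!t.isPrimitive()) return "L" + t.getName().replace('.', '/') + ";";
        throw new IllegalArgumentException("unhandled type: " + t);
    }

    // Build the JVM descriptor of a public constructor of `cls` taking `params`.
    static String ctorDescriptor(Class<?> cls, Class<?>... params) {
        try {
            Constructor<?> c = cls.getConstructor(params);
            StringBuilder sb = new StringBuilder("(");
            for (Class<?> p : c.getParameterTypes()) sb.append(desc(p));
            return sb.append(")V").toString();  // constructors always "return" void
        } catch (NoSuchMethodException e) {
            return "no such constructor";       // the situation the old jar produces
        }
    }

    public static void main(String[] args) {
        // JarInputStream happens to have an (InputStream, boolean) constructor,
        // so it yields exactly the descriptor shown in the NoSuchMethodError.
        System.out.println(ctorDescriptor(java.util.jar.JarInputStream.class,
                java.io.InputStream.class, boolean.class));
    }
}
```

So the error means the `LZ4BlockInputStream` class that got loaded simply lacks that two-argument constructor, which is consistent with an lz4 1.3.0 class still shadowing lz4-java 1.4.0.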
I have lz4-java-1.4.0.jar in spark/jars, and I replaced the lz4-1.3.0.jar in hive/lib with it. There is no lz4-1.3.0.jar left, yet it still cannot find the new lz4-java method.
I removed all the 1.2.1 jars and replaced them with all the 2.3.6 jars from Hive in spark/jars.
I uploaded all the jars from spark-2.4.4/jars/* to HDFS at /spark-jars/ on Hadoop 2.7.7.
The worker driver log shows it is using hive-exec-2.3.6.jar.
Did I forget something? It does not see lz4-java-1.4.0.jar, even though the called method exists in lz4-java-1.4.0. I have no lz4-1.3.0.jar left, and there is no conflict in the Hadoop+Hive configuration when using the lz4-java-1.4.0 dependency.
Thanks for your remarks; I have run out of ideas for where to find a solution. The failure happens in the map worker of the Spark engine. Must I add the lz4-java jars somewhere?
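One way to see which lz4 jar actually wins is to run a small reflection probe on the same classpath as the failing executors. This `ConstructorProbe` class is a hypothetical diagnostic sketched here for illustration, not a Hive or Spark tool, and the classpath in the usage line below assumes a typical install layout:

```java
import java.io.InputStream;
import java.security.CodeSource;

public class ConstructorProbe {
    // True if `className` is loadable and has a public (InputStream, boolean)
    // constructor, i.e. the signature that triggered the NoSuchMethodError.
    static boolean hasCtor(String className) {
        try {
            Class.forName(className).getConstructor(InputStream.class, boolean.class);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;  // class missing, or only the old 1.3.0-style constructors exist
        }
    }

    // Where the class was loaded from: the jar that "wins" when several are present.
    static String where(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap classpath" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "net.jpountz.lz4.LZ4BlockInputStream";
        System.out.println(name + " loaded from: " + where(name));
        System.out.println("(InputStream, boolean) constructor present: " + hasCtor(name));
    }
}
```

Running it as, say, `java -cp "$SPARK_HOME/jars/*:$HIVE_HOME/lib/*:." ConstructorProbe` should print which jar supplies the class; if the constructor is reported missing, an old lz4 1.3.0 copy is still shadowing lz4-java 1.4.0 somewhere on that classpath.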
{noformat}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.6-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
hive> select * from employee
    > ;
OK
1	Allen	IT
2	Mag	Sales
3	Rob	Sales
4	Dana	IT
6	Jean-Pierre	Bordenave
7	Pierre	xXx
11	Pierre	xXx
Time taken: 2.99 seconds, Fetched: 7 row(s)
hive> insert into employee values("10","Pierre","xXx");
Query ID = spark_20190915110359_e62a4e1a-fd69-4f17-a0f1-20513f291ddc
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 6b9db937-53d2-4d45-84b2-8e5c6427d9d3
Query Hive on Spark job[0] stages: [0]
Status: Running (Hive on Spark job[0])
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT   STATUS   TOTAL   COMPLETED   RUNNING   PENDING   FAILED
--------------------------------------------------------------------------------------
Stage-0                  0   RUNNING       1          0         0         1        1
--------------------------------------------------------------------------------------
STAGES: 00/01    [>>--------------------------] 0%  ELAPSED TIME: 3,02 s
--------------------------------------------------------------------------------------
Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
java.util.concurrent.ExecutionException: Exception thrown by job
	at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
	at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.0.30, executor 2): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
> Hive 2.3.6 lz4-1.3.0.jar and Spark 2.4.4 lz4-java.jar: insert fails with Spark engine mode, works fine with Hadoop mode
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-29088
> URL: https://issues.apache.org/jira/browse/SPARK-29088
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 2.4.4
> Environment: linux ubuntu 18.04 standalone
> Reporter: JP Bordenave
> Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)