[
https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580730#comment-14580730
]
Dev Lakhani commented on SPARK-8142:
------------------------------------
Hi [~vanzin]
bq. if you want to use the glassfish jersey version, you shouldn't need to do
this, right? Spark depends on the old one that is under com.sun.*, IIRC.
Yes, I need to use Glassfish Jersey 2.x in my application rather than the
com.sun.* one provided, but the same could apply to any other dependency that
needs to supersede one of Spark's provided libraries.
bq. marking all dependencies (including hbase) as provided and using {{spark.{driver,executor}.extraClassPath}} might be the easiest way out if you really need to use userClassPathFirst.
This is an option, but it might be a challenge to scale if the HBase and
Hadoop installs use different folder layouts across clusters/nodes. This can
be (and usually is) the case when new servers are added alongside existing
ones, for example. If one node has /disk4/path/to/hbase/libs and another has
/disk3/another/path/to/hbase/libs and so on, then extraClassPath needs to
include all of them and grows significantly, and the spark-submit args grow
along with it. Also, whenever we upgrade HBase we then have to change this
classpath again.
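To illustrate the scaling concern (the per-node HBase lib locations below are hypothetical examples, not real cluster paths), every distinct layout has to be appended to the same classpath string that spark-submit then carries:

```shell
# Hedged sketch: paths are illustrative only; each new node layout
# forces another entry, so the extraClassPath value keeps growing.
CP="/disk4/path/to/hbase/libs/*"
CP="$CP:/disk3/another/path/to/hbase/libs/*"
echo "$CP"
# The combined string would then be passed on every submit, e.g.:
#   spark-submit --conf spark.driver.extraClassPath="$CP" \
#                --conf spark.executor.extraClassPath="$CP" ...
```

Any HBase upgrade that moves the libs means rebuilding this string and updating every submit script that uses it.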
Maybe the ideal approach is, as you suggest, a blacklist containing the Spark
and Hadoop libs. Then we could put whatever we wanted into one uber/fat jar,
and it wouldn't matter where HBase and Hadoop are installed or what's
provided versus compiled; we'd let Spark work it out.
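For the uber/fat jar route, the build would shade the application's own Jersey dependency while leaving the cluster-provided libs out of the jar. A hypothetical Maven fragment (artifact versions and choice of dependencies are illustrative assumptions, not taken from the actual project):

```xml
<!-- Illustrative sketch only: cluster-provided libs get provided scope
     so the shaded jar carries just the application's own dependencies. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <!-- the Glassfish Jersey 2.x client the application needs -->
    <groupId>org.glassfish.jersey.core</groupId>
    <artifactId>jersey-client</artifactId>
    <version>2.x</version> <!-- placeholder version -->
  </dependency>
</dependencies>
```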
These are just my thoughts; I'm sure others will have different preferences
and/or better approaches. Thanks anyway for your input on this JIRA.
> Spark Job Fails with ResultTask ClassCastException
> --------------------------------------------------
>
> Key: SPARK-8142
> URL: https://issues.apache.org/jira/browse/SPARK-8142
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.1
> Reporter: Dev Lakhani
>
> When running a Spark job, I get no failures in the application code
> whatsoever, but a weird ResultTask ClassCastException. In my job, I create
> an RDD from HBase and, for each partition, make a REST call to an API using
> a REST client. This works in IntelliJ, but when I deploy to a cluster using
> spark-submit.sh I get:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 3, host): java.lang.ClassCastException:
> org.apache.spark.scheduler.ResultTask cannot be cast to
> org.apache.spark.scheduler.Task
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> These are the configs I set to override the Spark classpath because I want
> to use my own Glassfish Jersey version:
>
> sparkConf.set("spark.driver.userClassPathFirst","true");
> sparkConf.set("spark.executor.userClassPathFirst","true");
> I see no other warnings or errors in any of the logs.
> Unfortunately I cannot post my code, but please ask me questions that will
> help debug the issue. Using Spark 1.3.1 with Hadoop 2.6.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]