[ https://issues.apache.org/jira/browse/SPARK-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277955#comment-15277955 ]

Kevin McHale commented on SPARK-14162:
--------------------------------------

[~sunrui] you are incorrect.

You should take a look at https://issues.apache.org/jira/browse/SPARK-14204 and 
the GitHub issue, because:

1. The temporary workaround that I list there could not have solved the problem 
as you describe it (see the sketch below).

2. There is a blatant error in the code.
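
For anyone hitting this from search: the common mitigation, and possibly the 
temporary workaround referred to above, is to name the driver class explicitly 
through the JDBC {{driver}} option, so that the executors register it before 
the lookup rather than relying on classpath discovery. A rough PySpark sketch, 
reusing the placeholders from the report below (whether it actually applies 
here is exactly what is in question):

{code}
# Hedged sketch: pin the JDBC driver class explicitly via the `driver` option
# instead of relying on it being discovered from the executor classpath.
# `connection_script` stands in for the Oracle connect descriptor used below.
df = (sqlContext.read
        .format('jdbc')
        .options(url='jdbc:oracle:thin:' + connection_script,
                 dbtable='bi.contact',
                 driver='oracle.jdbc.OracleDriver')
        .load())
print(df.count())
{code}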

> java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14162
>                 URL: https://issues.apache.org/jira/browse/SPARK-14162
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>            Reporter: Zoltan Fedor
>
> This is an interesting one.
> We are using JupyterHub with Python to connect to a Hadoop cluster and run 
> Spark jobs, and as new Spark versions come out I compile them and add them 
> as new kernels to JupyterHub.
> There are also some libraries we use, like ojdbc to connect to an Oracle 
> database.
> The interesting thing is that ojdbc worked fine in Spark 1.6.0, but suddenly 
> "it cannot be found" in 1.6.1.
> All settings are the same when starting pyspark 1.6.1 and 1.6.0, so there is 
> no reason for it to fail in 1.6.1 when it works in 1.6.0.
> This is the pyspark code I am running in both 1.6.1 and 1.6.0:
> {quote}
> df = sqlContext.read.format('jdbc').options(url='jdbc:oracle:thin:' + connection_script,
>                                             dbtable='bi.contact').load()
> print(df.count())
> {quote}
> And it throws this error in 1.6.1 only:
> {quote}
> java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
>     at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {quote}
> I know that this error usually means that the ojdbc driver is not available 
> on the executors, but it is. Spark is started the exact same way in 1.6.1 as 
> in 1.6.0, and on 1.6.0 it does find the driver.
> I can reproduce this consistently, so the only conclusion is that something 
> changed between 1.6.0 and 1.6.1 to cause this, but I have seen no deprecation 
> notice of anything that could explain it.
> Environment variables set when starting pyspark 1.6.1:
> {quote}
>   "SPARK_HOME": "/usr/lib/spark-1.6.1-hive",
>   "SCALA_HOME": "/usr/lib/scala",
>   "HADOOP_CONF_DIR": "/etc/hadoop/venus-hadoop-conf",
>   "HADOOP_HOME": "/usr/bin/hadoop",
>   "HIVE_HOME": "/usr/bin/hive",
>   "LD_LIBRARY_PATH": "/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH",
>   "YARN_HOME": "",
>   "SPARK_DIST_CLASSPATH": "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
>   "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
>   "PATH": "/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/apps/home/zoltan.fedor/.local/bin:/apps/home/zoltan.fedor/bin:/usr/bin/hadoop/bin",
>   "PYTHONPATH": "/usr/lib/spark-1.6.1-hive/python/:/usr/lib/spark-1.6.1-hive/python/lib/py4j-0.9-src.zip",
>   "PYTHONSTARTUP": "/usr/lib/spark-1.6.1-hive/python/pyspark/shell.py",
>   "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --name JupyterHub --executor-memory 2G --driver-memory 2G --queue root.frbdusers --num-executors 10 --executor-cores 2 --conf spark.executor.extraClassPath=/usr/lib/hadoop/lib,/apps/bin/oracle_ojdbc/ojdbc6.jar --driver-class-path /apps/bin/oracle_ojdbc/ojdbc6.jar --files /usr/lib/spark-1.6.1-hive/conf/hive-site.xml --jars /usr/lib/avro/avro-mapred.jar,/usr/lib/spark-1.6.1-hive/lib/spark-examples-1.6.1-hadoop2.5.0-cdh5.3.3.jar,/apps/bin/oracle_ojdbc/ojdbc6.jar,/usr/lib/spark-1.6.1-hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark-1.6.1-hive/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark-1.6.1-hive/lib/datanucleus-rdbms-3.2.9.jar pyspark-shell",
>   "PYSPARK_PYTHON": "/hadoop/cloudera/parcels/Anaconda/bin/python",
>   "PYTHON_DRIVER_PYTHON": "/apps/bin/anaconda/anaconda2/bin/ipython",
>   "HIVE_CP": "/hadoop/coudera/parcels/CDH/lib/hive/lib/",
>   "SPARK_YARN_USER_ENV": "PYTHONPATH=/usr/lib/spark-1.6.1-hive/python/:/usr/lib/spark-1.6.1-hive/python/lib/py4j-0.9-src.zip"
> {quote}
> And in 1.6.0:
> {quote}
>   "SPARK_HOME": "/usr/lib/spark-1.6.0-hive",
>   "SCALA_HOME": "/usr/lib/scala",
>   "HADOOP_CONF_DIR": "/etc/hadoop/venus-hadoop-conf",
>   "HADOOP_HOME": "/usr/bin/hadoop",
>   "HIVE_HOME": "/usr/bin/hive",
>   "LD_LIBRARY_PATH": "/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH",
>   "YARN_HOME": "",
>   "SPARK_DIST_CLASSPATH": "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
>   "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
>   "PATH": "/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/apps/home/zoltan.fedor/.local/bin:/apps/home/zoltan.fedor/bin:/usr/bin/hadoop/bin",
>   "PYTHONPATH": "/usr/lib/spark-1.6.0-hive/python/:/usr/lib/spark-1.6.0-hive/python/lib/py4j-0.9-src.zip",
>   "PYTHONSTARTUP": "/usr/lib/spark-1.6.0-hive/python/pyspark/shell.py",
>   "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --name JupyterHub --executor-memory 2G --driver-memory 2G --queue root.frbdusers --num-executors 10 --executor-cores 2 --conf spark.executor.extraClassPath=/usr/lib/hadoop/lib,/apps/bin/oracle_ojdbc/ojdbc6.jar --driver-class-path /apps/bin/oracle_ojdbc/ojdbc6.jar --files /usr/lib/spark-1.6.0-hive/conf/hive-site.xml --jars /usr/lib/avro/avro-mapred.jar,/usr/lib/spark-1.6.0-hive/lib/spark-examples-1.6.0-hadoop2.5.0-cdh5.3.3.jar,/apps/bin/oracle_ojdbc/ojdbc6.jar,/usr/lib/spark-1.6.0-hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark-1.6.0-hive/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark-1.6.0-hive/lib/datanucleus-rdbms-3.2.9.jar pyspark-shell",
>   "PYSPARK_PYTHON": "/hadoop/cloudera/parcels/Anaconda/bin/python",
>   "PYTHON_DRIVER_PYTHON": "/apps/bin/anaconda/anaconda2/bin/ipython",
>   "HIVE_CP": "/hadoop/coudera/parcels/CDH/lib/hive/lib/",
>   "SPARK_YARN_USER_ENV": "PYTHONPATH=/usr/lib/spark-1.6.0-hive/python/:/usr/lib/spark-1.6.0-hive/python/lib/py4j-0.8.2.1-src.zip"
> {quote}


