I am using Spark in Jupyter as follows:

import findspark
findspark.init()

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()  # created after findspark.init()
sqlCtx = SQLContext(sc)
df = sqlCtx.read.parquet("oci://mybucket@mytenant/myfile.parquet")

The error is:

Py4JJavaError: An error occurred while calling o198.parquet.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme 
"oci"

I have placed oci-hdfs-full-2.7.2.0.jar, which defines the oci filesystem,
on all NameNodes and DataNodes in the Hadoop cluster.
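
Do I also need to map the oci scheme to the connector's FileSystem class in
core-site.xml? Something like the fragment below — the class name
com.oracle.bmc.hdfs.BmcFilesystem is my guess from the connector's package
naming, so please correct me if it is wrong:

```xml
<!-- Guess: map the oci:// scheme to the connector's FileSystem class -->
<property>
  <name>fs.oci.impl</name>
  <value>com.oracle.bmc.hdfs.BmcFilesystem</value>
</property>
```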

export PYSPARK_SUBMIT_ARGS="--master yarn --deploy-mode client pyspark-shell --driver-cores 8 --driver-memory 20g --num-executors 2 --executor-cores 6 --executor-memory 30g --jars /mnt/data/hdfs/oci-hdfs-full-2.7.2.0.jar --conf spark.executor.extraClassPath=/mnt/data/hdfs/oci-hdfs-full-2.7.2.0.jar --conf spark.driver.extraClassPath=/mnt/data/hdfs/oci-hdfs-full-2.7.2.0.jar"
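
I also tried setting the variable from inside the notebook before creating
the SparkContext (sketch below, same paths as above). One thing I am unsure
about: my reading of spark-submit's argument handling is that options placed
after the primary resource are treated as application arguments, so perhaps
pyspark-shell needs to be the *last* token rather than in the middle?

```python
import os

# Sketch: set PYSPARK_SUBMIT_ARGS before any SparkContext is created,
# otherwise the --jars/--conf flags have no effect.
# Note: pyspark-shell is placed last here; my understanding is that
# spark-submit stops parsing its own options at the primary resource.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn --deploy-mode client "
    "--driver-cores 8 --driver-memory 20g "
    "--num-executors 2 --executor-cores 6 --executor-memory 30g "
    "--jars /mnt/data/hdfs/oci-hdfs-full-2.7.2.0.jar "
    "--conf spark.executor.extraClassPath=/mnt/data/hdfs/oci-hdfs-full-2.7.2.0.jar "
    "--conf spark.driver.extraClassPath=/mnt/data/hdfs/oci-hdfs-full-2.7.2.0.jar "
    "pyspark-shell"
)
```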

Any idea why this still happens? Thanks for any clue.



-- 
You received this message because you are subscribed to the Google Groups 
"Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jupyter/5bca57ce-46d4-4a52-84a4-57d9781ce468%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
