angelosnm opened a new issue, #1723: URL: https://github.com/apache/sedona/issues/1723
I have set up a standalone Spark cluster to which PySpark jobs are submitted. The jobs use the configuration below, with S3/MinIO used in place of HDFS (via the S3A connector) to read raster files:

```python
config = (
    SedonaContext.builder()
    .master(spark_endpoint)
    .appName("RasterProcessingWithSedona")
    .config("spark.driver.host", socket.gethostbyname(socket.gethostname()))
    .config("spark.driver.port", "2222")
    .config("spark.blockManager.port", "36859")
    .config("spark.executor.memory", "16g")
    .config("spark.executor.cores", "4")
    .config("spark.driver.memory", "10g")
    .config("spark.hadoop.fs.s3a.endpoint", s3_endpoint)
    .config("spark.hadoop.fs.s3a.access.key", s3_access_key_id)
    .config("spark.hadoop.fs.s3a.secret.key", s3_secret_access_key)
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config(
        'spark.jars.packages',
        'org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.6.1,'
        'org.datasyslab:geotools-wrapper:1.6.1-28.2'
    )
    .getOrCreate()
)
```

The raster/TIFF files are then read as follows:

```python
raster_path = "s3a://data/BFA"

rawDf = (
    sedona.read.format("binaryFile")
    .option("recursiveFileLookup", "true")
    .option("pathGlobFilter", "*.tif*")
    .load(raster_path)
)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()
```

This code returns the error shown under "Actual behavior" below. The same code runs normally in local mode.
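For reference, the local-mode run that succeeds looks roughly like the sketch below. This is a simplification of my local setup: it assumes `local[*]` as the master and includes the `SedonaContext.create` call that is omitted from the snippets above; the S3A variables are the same ones used in the cluster config.

```python
from sedona.spark import SedonaContext

# Sketch of the working local-mode variant: same packages and S3A/MinIO
# settings as the cluster job, but the master is local[*] and no
# driver/block-manager port pinning is used.
# s3_endpoint, s3_access_key_id, s3_secret_access_key are defined as in the
# cluster job above.
config = (
    SedonaContext.builder()
    .master("local[*]")
    .appName("RasterProcessingWithSedonaLocal")
    .config("spark.hadoop.fs.s3a.endpoint", s3_endpoint)
    .config("spark.hadoop.fs.s3a.access.key", s3_access_key_id)
    .config("spark.hadoop.fs.s3a.secret.key", s3_secret_access_key)
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config(
        'spark.jars.packages',
        'org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.6.1,'
        'org.datasyslab:geotools-wrapper:1.6.1-28.2'
    )
    .getOrCreate()
)
sedona = SedonaContext.create(config)

# Identical binaryFile read against MinIO; here rawDf.show() prints the rows
# instead of failing with TaskResultLost.
rawDf = (
    sedona.read.format("binaryFile")
    .option("recursiveFileLookup", "true")
    .option("pathGlobFilter", "*.tif*")
    .load("s3a://data/BFA")
)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()
```

The only differences from the failing job are the master URL and the absence of the driver host/port and block-manager port settings.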
## Expected behavior

## Actual behavior

```bash
Py4JJavaError: An error occurred while calling o66.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 5) (192.168.18.112 executor 1): TaskResultLost (result lost from block manager)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:530)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:483)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4334)
    at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3316)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4324)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4322)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4322)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:3316)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:3539)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:280)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:315)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
```

## Steps to reproduce the problem

## Settings

Sedona version = 1.6.1

Apache Spark version = 3.5.2

Apache Flink version = N/A

API type = Python

Scala version = 2.12

JRE version = 1.8.0_432

Python version = 3.11.10

Environment = Standalone