pkubaczynski opened a new issue, #2250:
URL: https://github.com/apache/sedona/issues/2250
Since maintenance release 14.3.33 of Databricks Runtime 14.3 LTS, writing to GeoParquet has stopped working. I tested it with Sedona 1.7.2.
```
import pyspark.sql.functions as f
from sedona.register.geo_registrator import SedonaRegistrator
from sedona.sql.st_constructors import ST_GeomFromWKT
SedonaRegistrator.registerAll(spark)
df_test = spark.createDataFrame([("POINT (11.7999569 45.0773231)")], "string").toDF("geometry")
df_test = df_test.withColumn("geometry", ST_GeomFromWKT(f.col("geometry")))
df_test.write.mode("overwrite").format("geoparquet").save("/FileStore/geoparquet_test/")
```
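For reference, here is how the same repro would look with the non-deprecated `SedonaContext` entry point from recent Sedona releases (a sketch only, not separately verified; the write call is identical, so I would not expect a different outcome):
```
# Sketch of the same repro via SedonaContext (Sedona 1.5+ registration path).
# Assumes the sedona-spark-shaded and geotools-wrapper jars are installed on the cluster.
import pyspark.sql.functions as f
from sedona.spark import SedonaContext
from sedona.sql.st_constructors import ST_GeomFromWKT

sedona = SedonaContext.create(spark)  # registers Sedona SQL functions and data sources

df_test = (
    sedona.createDataFrame([("POINT (11.7999569 45.0773231)",)], ["geometry"])
    .withColumn("geometry", ST_GeomFromWKT(f.col("geometry")))
)
df_test.write.mode("overwrite").format("geoparquet").save("/FileStore/geoparquet_test/")
```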
The same operation worked on runtime release 14.3.32, but on 14.3.33 the code fails with the following error.
```
Py4JJavaError: An error occurred while calling o458.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 65.0 failed 4 times, most recent failure: Lost task 0.3 in stage 65.0 (TID 84) (10.200.28.173 executor 0): java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(Ljava/lang/String;Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)V
    at org.apache.spark.sql.execution.datasources.parquet.GeoParquetFileFormat$$anon$1.newInstance(GeoParquetFileFormat.scala:162)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:186)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:168)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:544)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:117)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:933)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:933)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186)
    at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:960)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:107)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:855)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3874)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3796)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3783)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3783)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1661)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1646)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1646)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:4120)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4032)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4020)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:54)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1323)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1311)
    at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:3065)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$6(FileFormatWriter.scala:434)
    at org.apache.spark.sql.catalyst.MetricKeyUtils$.measureMs(MetricKey.scala:1044)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$5(FileFormatWriter.scala:432)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:395)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:430)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$1(FileFormatWriter.scala:300)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:195)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$5(commands.scala:137)
    at org.apache.spark.sql.execution.SparkPlan.runCommandWithAetherOff(SparkPlan.scala:178)
    at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.$anonfun$sideEffectResult$4(commands.scala:137)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:133)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:132)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:149)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$4(QueryExecution.scala:391)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:168)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:391)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$9(SQLExecution.scala:406)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:725)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:278)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:167)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:662)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:390)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1203)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:386)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:326)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:383)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:367)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:505)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:505)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:39)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:343)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:339)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:481)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:367)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:400)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:367)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:285)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:282)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:462)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:1052)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:444)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:406)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:264)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(Ljava/lang/String;Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)V
    at org.apache.spark.sql.execution.datasources.parquet.GeoParquetFileFormat$$anon$1.newInstance(GeoParquetFileFormat.scala:162)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:186)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:168)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:544)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:117)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:933)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:933)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186)
    at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:960)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:107)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:963)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:855)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more

File <command-4596863659123812>, line 8
      6 df_test = spark.createDataFrame([("POINT (11.7999569 45.0773231)")], "string").toDF("geometry")
      7 df_test = df_test.withColumn("geometry", ST_GeomFromWKT(f.col("geometry")))
----> 8 df_test.write.mode("overwrite").format("geoparquet").save("/FileStore/geoparquet_test/")
```
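If I read the error correctly, Sedona's `GeoParquetFileFormat` (GeoParquetFileFormat.scala:162) links against a `ParquetOutputWriter(String, TaskAttemptContext)` constructor, and that exact signature apparently no longer exists in the Spark classes shipped with 14.3.33, so this looks like a binary incompatibility introduced by the maintenance release rather than a configuration problem on my side. A quick way to see which constructors are actually present on a given runtime (a diagnostic sketch using the Py4J gateway that PySpark already exposes; everything beyond the class name taken from the stack trace is an assumption):
```
# Diagnostic sketch: print the constructors of ParquetOutputWriter on this cluster,
# to compare 14.3.32 against 14.3.33.
jvm = spark.sparkContext._jvm
cls = jvm.java.lang.Class.forName(
    "org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter"
)
for ctor in cls.getDeclaredConstructors():
    print(ctor.toString())
# If the (java.lang.String, org.apache.hadoop.mapreduce.TaskAttemptContext)
# constructor is missing here, that matches the NoSuchMethodError above.
```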
Below is the cluster configuration (Photon is disabled).
<img width="963" height="766" alt="Image"
src="https://github.com/user-attachments/assets/ef375204-d8a4-4a97-a4f1-2ac93fc07d65"
/>
<img width="961" height="759" alt="Image"
src="https://github.com/user-attachments/assets/141c9cec-830e-4b6c-91c5-a63cd9431848"
/>
<img width="961" height="624" alt="Image"
src="https://github.com/user-attachments/assets/ef2734b2-7184-48ad-972e-2b508151620d"
/>
Note that I can still read previously saved GeoParquet files without any problems; I just cannot write new ones.
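A possible interim workaround I am considering (a sketch only; the output path is a placeholder and GeoParquet metadata such as the CRS would not be preserved this way) is to store the geometry as WKB in a plain Parquet file, which goes through the runtime's own ParquetOutputWriter, and convert back to geometry on read:
```
# Workaround sketch: write geometry as WKB via the plain Parquet writer,
# then rebuild the geometry column when reading.
import pyspark.sql.functions as f
from sedona.sql.st_functions import ST_AsBinary
from sedona.sql.st_constructors import ST_GeomFromWKB

(df_test
 .withColumn("geometry", ST_AsBinary(f.col("geometry")))
 .write.mode("overwrite").parquet("/FileStore/parquet_wkb_test/"))

df_back = (spark.read.parquet("/FileStore/parquet_wkb_test/")
           .withColumn("geometry", ST_GeomFromWKB(f.col("geometry"))))
```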