jwass opened a new issue, #1268: URL: https://github.com/apache/sedona/issues/1268
Is there a way to spatially partition a DataFrame, presumably by first converting it to an RDD and back, and then write it out using that partitioning scheme? This is my guess at how to accomplish it, but I'm not sure whether I'm misunderstanding things; I'm also relatively new to working with Spark and Sedona.

## Expected behavior

Load a DataFrame, convert it to an RDD, spatially partition it, convert it back to a DataFrame, and save the result. I'd expect the final DataFrame's partitioning to be preserved from the RDD.

## Actual behavior

`Adapter.toDf()` does not preserve the partitioning, or I'm doing something else wrong.

## Steps to reproduce the problem

```python
df = sedona.read.format("geoparquet").load(path)
rdd = Adapter.toSpatialRdd(df, "geometry")
rdd.analyze()
rdd.spatialPartitioning(GridType.KDBTREE, num_partitions=6)

df2 = Adapter.toDf(rdd, spark)
df2.write.format("geoparquet").save(output_path)
```

This doesn't appear to work: the number of partitions written from `df2` was far greater than 6.

## Settings

Sedona version = 1.5.1
Apache Spark version = ?
API type = Python
Python version = ?
Environment = Databricks