Dear Apache Sedona Development Team,

I hope this email finds you well. I am writing to seek your expertise and guidance regarding a performance issue I am encountering while using Apache Sedona. I have thoroughly reviewed the documentation and tried various approaches, but I believe there might be room for improvement in optimizing the geospatial query performance.

Here is a brief overview of my setup:

* Apache Sedona version: 3.0
* Python version: 3.8
* PySpark integration

While working with Apache Sedona, I have noticed that the geospatial query function is not delivering the level of performance improvement I had anticipated. I understand that there can be various factors affecting performance, including data size, cluster configuration, and query complexity. However, I would greatly appreciate any general best practices or recommendations you can offer to enhance the speed and efficiency of geospatial queries in my environment.

Specifically, I am interested in understanding:

* Recommended configuration settings or optimizations for Apache Sedona with PySpark.
* Strategies for indexing and partitioning geospatial data effectively.
* How Apache Sedona handles geospatial indexing and how it affects query performance.
* How Apache Sedona behaves in cases where the query is running over partitions, for instance when using functions like ST_Union_Aggr.

If you could provide insights or direct me to relevant resources, it would be immensely helpful in addressing my performance challenges and making the most of Apache Sedona for my geospatial data processing needs.

I understand that you have a busy schedule, and I truly appreciate any time and assistance you can provide. Please feel free to provide any documentation, examples, or best practices that you believe would be beneficial.

Thank you in advance for your support and contributions to the Apache Sedona project. I look forward to hearing from you.

Best regards,
Petr Poskocil