Dear Apache Sedona Development Team,

I hope this email finds you well. I am writing to seek your expertise and 
guidance regarding a performance issue I am encountering while using Apache 
Sedona. I have thoroughly reviewed the documentation and tried various 
approaches, but I believe there is still room to improve the performance of 
my geospatial queries.

Here is a brief overview of my setup:

  *   Apache Sedona version: 3.0
  *   Python version: 3.8
  *   PySpark integration

While working with Apache Sedona, I have noticed that my geospatial queries 
are not delivering the performance improvement I had anticipated. I understand 
that many factors can affect performance, including data size, cluster 
configuration, and query complexity. However, I would greatly appreciate any 
general best practices or recommendations you can offer to improve the speed 
and efficiency of geospatial queries in my environment.

Specifically, I am interested in understanding:

  *   Recommended configuration settings or optimizations for Apache Sedona 
      with PySpark.
  *   Strategies for indexing and partitioning geospatial data effectively.
  *   How Apache Sedona handles geospatial indexing and how it affects query 
      performance.
  *   How Apache Sedona behaves when a query runs across partitions, for 
      instance when using aggregate functions such as ST_Union_Aggr.

If you could provide insights or direct me to relevant resources, it would be 
immensely helpful in addressing my performance challenges and making the most 
of Apache Sedona for my geospatial data processing needs.

I understand that you have a busy schedule, and I truly appreciate any time and 
assistance you can provide. Please feel free to share any documentation, 
examples, or best practices that you believe would be beneficial.

Thank you in advance for your support and contributions to the Apache Sedona 
project. I look forward to hearing from you.

Best regards,
Petr Poskocil