petern48 commented on PR #2038:
URL: https://github.com/apache/sedona/pull/2038#issuecomment-3028966983

   @zhangfengcdt I think we're mostly on the same page, actually. The Spark index column (`__index_level_{}__`) I'm using does represent the index, just as in geopandas. See the comment below from the PySpark codebase ([here](https://github.com/apache/spark/blob/master/python/pyspark/pandas/internal.py)):
   
   ```python
   # A function to turn given numbers to Spark columns that represent pandas-on-Spark index.
   SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
   SPARK_DEFAULT_INDEX_NAME = SPARK_INDEX_NAME_FORMAT(0)
   ```
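To make the naming convention concrete, here is a small pure-Python sketch that reproduces the format function quoted above and recognizes index columns by name. The helper `is_spark_index_column` and its regex are illustrative assumptions for this comment, not part of the PySpark or Sedona API.

```python
import re

# Same format function as quoted from pyspark/pandas/internal.py.
SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
SPARK_DEFAULT_INDEX_NAME = SPARK_INDEX_NAME_FORMAT(0)

# Hypothetical helper: a Spark column counts as an index level if its name
# follows the __index_level_<n>__ convention.
_INDEX_NAME_RE = re.compile(r"__index_level_[0-9]+__")


def is_spark_index_column(name: str) -> bool:
    """Return True if `name` matches the pandas-on-Spark index naming scheme."""
    return _INDEX_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    print(SPARK_DEFAULT_INDEX_NAME)                            # __index_level_0__
    print(is_spark_index_column(SPARK_INDEX_NAME_FORMAT(2)))   # True
    print(is_spark_index_column("geometry"))                   # False
```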
   
   > However, if no index is used in the GeoSeries creation, then we don't need to support alignment
   
   If no index is given, pandas-on-Spark creates a default index, which we can use for `align=True`. That is what the current tests rely on, since we don't have index support yet.
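As a rough illustration of what `align=True` means here, the sketch below pairs two "series" (plain mappings from index label to value, with strings standing in for geometries) by label over the union of both indexes, padding gaps with `None`. This approximates an outer alignment like the one geopandas performs before a binary operation, but it is plain Python rather than the pandas-on-Spark API; `default_index` and `align_on_index` are hypothetical helpers, and pandas may sort the union of labels, whereas this sketch keeps first-seen order.

```python
from typing import Hashable, Iterable, List, Mapping, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
T = TypeVar("T")


def default_index(values: Iterable[T]) -> Mapping[int, T]:
    """Attach labels 0..n-1, mimicking the default index used when none is given."""
    return dict(enumerate(values))


def align_on_index(
    left: Mapping[K, T], right: Mapping[K, T]
) -> List[Tuple[K, Optional[T], Optional[T]]]:
    """Outer-align two label -> value mappings (hypothetical helper).

    Labels from `left` come first, then labels that only appear in `right`.
    A label missing on one side gets None for that side.
    """
    labels = list(left) + [label for label in right if label not in left]
    return [(label, left.get(label), right.get(label)) for label in labels]


if __name__ == "__main__":
    # Default indexes on both sides: alignment pairs rows at the same position.
    print(align_on_index(default_index(["A", "B"]), default_index(["C", "D"])))
    # [(0, 'A', 'C'), (1, 'B', 'D')]

    # Differing labels: unmatched rows are padded with None.
    print(align_on_index({0: "A", 1: "B"}, {1: "C", 2: "D"}))
    # [(0, 'A', None), (1, 'B', 'C'), (2, None, 'D')]
```

The first call shows why the default index is enough for the current tests: when both sides get labels 0..n-1, label alignment and positional pairing coincide.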
   
   Originally, I was proposing not to support `align=False`, where geopandas uses the "natural ordering" of the series instead of the given index. However, it looks like pandas-on-Spark already has a [hidden natural ordering column](https://github.com/apache/spark/blob/a1e628574b7d9cdf89472fa550ecc41f8a871b98/python/pyspark/pandas/internal.py#L77-L79), so we can try using that.
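By contrast, `align=False` ignores labels and pairs rows by position. The sketch below models each row as a tuple `(natural_order, label, value)`, where `natural_order` stands in for a hidden ordering column like the one linked above; the tuple layout and `align_by_natural_order` are illustrative assumptions, not PySpark internals. Assuming such a column only guarantees increasing values (for example, if it were generated with something like Spark's `monotonically_increasing_id()`), the sketch sorts on it rather than treating it as a row position, then zips both sides positionally. It simply rejects inputs of different lengths.

```python
from typing import Hashable, List, Sequence, Tuple

# A row is (natural_order, index_label, value). natural_order is a hypothetical
# stand-in for a hidden ordering column: increasing, but not necessarily consecutive.
Row = Tuple[int, Hashable, object]


def align_by_natural_order(
    left: Sequence[Row], right: Sequence[Row]
) -> List[Tuple[object, object]]:
    """Pair values by position after sorting each side on its natural order."""
    if len(left) != len(right):
        raise ValueError("positional pairing needs inputs of equal length")
    left_sorted = sorted(left, key=lambda row: row[0])
    right_sorted = sorted(right, key=lambda row: row[0])
    # Index labels (row[1]) are ignored on purpose: only the ordering matters.
    return [(lrow[2], rrow[2]) for lrow, rrow in zip(left_sorted, right_sorted)]


if __name__ == "__main__":
    left = [(0, "a", "P1"), (8589934592, "b", "P2")]  # gaps in the ordering IDs are fine
    right = [(5, "x", "Q2"), (1, "y", "Q1")]          # labels differ from `left`
    print(align_by_natural_order(left, right))
    # [('P1', 'Q1'), ('P2', 'Q2')]
```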
   
   Regardless, if the current default `align=True` logic sounds good to you, I'd rather merge this now and revisit additional functionality (`align=False`) later, when we add index support (with a separate issue for it, of course). Does that make sense?


-- 
This is an automated message from the Apache Git Service.