petern48 commented on PR #2038: URL: https://github.com/apache/sedona/pull/2038#issuecomment-3028966983
@zhangfengcdt I think we're mostly on the same page actually. The Spark index column (`__index_level_{}__`) I'm using does represent the index in geopandas. See the comment from the pyspark codebase [here](https://github.com/apache/spark/blob/master/python/pyspark/pandas/internal.py), quoted below:

```python
# A function to turn given numbers to Spark columns that represent pandas-on-Spark index.
SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
SPARK_DEFAULT_INDEX_NAME = SPARK_INDEX_NAME_FORMAT(0)
```

> However, if no index is used in the GeoSeries creation, then we don't need to support alignment

If no index is given, pandas on PySpark creates a default index, which we can use for `align=True`. This is what the current tests use, since we don't yet have index support.

Originally, I was proposing not to support `align=False`, where geopandas uses the "natural ordering" of the series instead of the given index. However, it looks like pandas on PySpark already has a [hidden natural ordering column](https://github.com/apache/spark/blob/a1e628574b7d9cdf89472fa550ecc41f8a871b98/python/pyspark/pandas/internal.py#L77-L79), so we can try using that.

Regardless, if the current default `align=True` logic sounds good to you, I'd rather merge this in now and revisit the additional functionality (`align=False`) later when we add indexes (creating a separate issue, of course). Does that make sense?
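As a rough illustration of the two pairing modes discussed here, the sketch below uses plain Python dicts as stand-ins for series. It is not Sedona, geopandas, or pandas-on-Spark code: `pair_by_index` and `pair_by_position` are hypothetical helpers, and requiring equal lengths for positional pairing is a simplifying assumption of this sketch.

```python
# Illustrative only: plain-Python stand-ins for the two pairing modes.
# pair_by_index / pair_by_position are hypothetical helpers, not Sedona,
# geopandas, or pandas-on-Spark APIs.

# Same format string as in the pyspark/pandas/internal.py snippet.
SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
SPARK_DEFAULT_INDEX_NAME = SPARK_INDEX_NAME_FORMAT(0)


def pair_by_index(left, right):
    """align=True analogue: match rows on index label (union of labels)."""
    labels = sorted(set(left) | set(right))
    return {label: (left.get(label), right.get(label)) for label in labels}


def pair_by_position(left, right):
    """align=False analogue: match rows by their order, ignoring labels."""
    # Assumption of this sketch: both inputs must have the same length.
    if len(left) != len(right):
        raise ValueError("positional pairing needs equal lengths")
    return list(zip(left.values(), right.values()))


# Each "series" is a dict of index label -> value, in insertion order.
left = {0: "A", 1: "B", 2: "C"}
right = {1: "x", 2: "y", 3: "z"}

by_index = pair_by_index(left, right)
by_position = pair_by_position(left, right)
```

Here `by_index` pairs `"B"` with `"x"` because both carry label `1`, and labels present on only one side get `None`, while `by_position` pairs `"A"` with `"x"` because both come first.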