petern48 opened a new issue, #2138:
URL: https://github.com/apache/sedona/issues/2138

   A lot of text below, but I'll highlight the main difference first. Notice that our version has an extra level of nested `[ ]`.
   ```
   # Our dataframe_to_arrow returns the following column
   geometry: [[0101...F03F],[0101...0040]]
   
   # But geopandas returns this.
   geometry: [[0101...F03F,0101...0040]]
   ```
   
   This happens for the index column (`__index_level_0__`) too, which leads to it being misinterpreted as a regular column instead of being read in as an index when calling `gpd.GeoDataFrame.from_arrow()`.
   ```
   # Sedona returns
      __index_level_0__     geometry
   0                  1  POINT (1 1)
   1                  2  POINT (2 2) 
   
   # Geopandas returns this
         geometry
   1  POINT (1 1)
   2  POINT (2 2)
   ```
   
   Full script and output below.
   ```python
   import geopandas as gpd
   import pandas as pd
   import pyarrow as pa
   import sedona.geopandas as sgpd
   from sedona.spark.geoarrow.geoarrow import dataframe_to_arrow
   from shapely.geometry import Point
   
   sgpd_df = sgpd.GeoDataFrame(
       {"geometry": [Point(1, 1), Point(2, 2)]}, index=pd.Index([1, 2])
   )
   # Don't worry about this drop.
   spark_df = sgpd_df._internal.spark_frame.drop("__natural_order__")
   sgpd_arrow = dataframe_to_arrow(spark_df)
   
   gpd_df = gpd.GeoDataFrame(
       {"geometry": [Point(1, 1), Point(2, 2)]}, index=pd.Index([1, 2])
   )
   gpd_arrow = pa.table(gpd_df.to_arrow())
   assert type(sgpd_arrow) == type(gpd_arrow) == pa.Table
   print("SEDONA\n", sgpd_arrow, "\n")
   gpd_df_from_sgpd_arrow = gpd.GeoDataFrame.from_arrow(sgpd_arrow)
   print(gpd_df_from_sgpd_arrow, "\n")
   print("GEOPANDAS\n", gpd_arrow, "\n")
   gpd_df_from_gpd_arrow = gpd.GeoDataFrame.from_arrow(gpd_arrow)
   print(gpd_df_from_gpd_arrow)
   ```
   
   ```
   SEDONA
    pyarrow.Table
   __index_level_0__: int64
   geometry: extension<geoarrow.wkb<WkbType>>
   ----
   __index_level_0__: [[1],[2]]
   geometry: [[0101000000000000000000F03F000000000000F03F],[010100000000000000000000400000000000000040]]
 
   
      __index_level_0__     geometry
   0                  1  POINT (1 1)
   1                  2  POINT (2 2) 
   
   GEOPANDAS
    pyarrow.Table
   geometry: extension<geoarrow.wkb<WkbType>>
   __index_level_0__: int64
   ----
   geometry: [[0101000000000000000000F03F000000000000F03F,010100000000000000000000400000000000000040]]
   __index_level_0__: [[1,2]] 
   
         geometry
   1  POINT (1 1)
   2  POINT (2 2)
   ```

