FengJiang2018 opened a new issue, #1059:
URL: https://github.com/apache/sedona/issues/1059

   ## Expected behavior
   The GeoParquet file should be written with geo metadata, and reading it back 
should not raise an error:
   ``` python
    df = sedona.read.format("geoparquet").load(path)
   ``` 
   ## Actual behavior
   The GeoParquet file was written without geo metadata, and reading it back 
raises an error:
   ``` python
    df = sedona.read.format("geoparquet").load(path)
   ``` 
   
   ## Steps to reproduce the problem
   The problem seems to be in the write: when I use df.write to produce a 
GeoParquet file, the geo metadata is not created for the Sedona geometry column. 
I am not sure whether I missed anything.
   
   #1, I use the Overture public dataset as input for the DataFrame and add a 
Sedona geometry column:
   ``` python
   df_building = sedona.read.option("inferschema",True).parquet(inputpath) \
           .withColumn("geometry2",expr("ST_GeomFromWKB(geometry)"))
   df_building.createOrReplaceTempView("rawdf")
   ```
   
   #2, I write the DataFrame, which has a Sedona geometry type column, to a 
GeoParquet file on Databricks.
   ``` python
   newdf = spark.sql(
       "select *, ST_GeoHash(geometry2, 5) as geohash from rawdf order by geohash"
   ).drop("geometry").withColumnRenamed("geometry2", "geometry")
   newdf.write.mode("overwrite").format("geoparquet") \
           .save(path+"/final1.parquet")
   ```
   
   Here is the printSchema output. The column shows as geometry type; nullable = 
true seems to be expected. Correct me if this is wrong.
   ``` cmd
   root
    |-- geometry: geometry (nullable = true)
    |-- geohash: string (nullable = true)
   ```
   
   #3, I get the error when I read the GeoParquet file from #2 this way:
   ``` python
   df = sedona.read.format("geoparquet").load(newpath)
   ```
   
   There is no read error if I use the following code, but **no geo metadata** 
can be found in the df schema
   ``` python
   df = sedona.read.format("geoparquet").parquet(newpath)
   ```
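
The GeoParquet specification stores its metadata as a JSON string under the `geo` key of the Parquet file-level key-value metadata, not in the Spark schema, so the file footer is the place to check. The following is a minimal sketch of such a check, not part of Sedona. The helper name `parse_geo_metadata` is hypothetical, and the list of required keys assumes GeoParquet 1.0.

```python
import json
from typing import Optional

# Top-level keys that GeoParquet 1.0 requires inside the "geo" JSON (assumption:
# other spec versions may differ).
REQUIRED_GEO_KEYS = ("version", "primary_column", "columns")


def parse_geo_metadata(key_value_metadata: Optional[dict]) -> Optional[dict]:
    """Hypothetical helper: decode the "geo" entry of a Parquet footer.

    Returns the parsed JSON, or None if the footer has no "geo" key.
    Raises ValueError if the key exists but lacks a required top-level field.
    """
    if not key_value_metadata:
        return None
    # Footer keys are usually bytes; fall back to str for readers that decode them.
    raw = key_value_metadata.get(b"geo")
    if raw is None:
        raw = key_value_metadata.get("geo")
    if raw is None:
        return None
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    geo = json.loads(raw)
    missing = [key for key in REQUIRED_GEO_KEYS if key not in geo]
    if missing:
        raise ValueError(f"geo metadata is missing required keys: {missing}")
    return geo
```

With pyarrow installed, the dictionary could come from `pyarrow.parquet.read_metadata(part_file).metadata`, where `part_file` is one of the part files inside the output directory. A result of `None` would confirm that the writer did not emit the `geo` key.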
   
   
   ## Settings
   
   Sedona version = 1.5.0
   
   Apache Spark version = 3.4.0
   
   Apache Flink version = N/A
   
   API type = Python
   
   Scala version = 2.12
   
   JRE version = 1.8
   
   Python version = 3.10
   
   Environment = Azure Databricks, notebook

