Kristin Cowalcijk created SEDONA-646:
----------------------------------------

             Summary: Shapefile data source for DataFrame API
                 Key: SEDONA-646
                 URL: https://issues.apache.org/jira/browse/SEDONA-646
             Project: Apache Sedona
          Issue Type: New Feature
            Reporter: Kristin Cowalcijk
             Fix For: 1.7.0


The current shapefile reader returns a SpatialRDD; if users want a DataFrame, 
they must call {{Adapter.toDf}} to convert the SpatialRDD to a DataFrame. A 
better approach is to support loading shapefiles as DataFrames directly through 
the DataFrame API:

{code:python}
df = sedona.read.format("shapefile").load("/path/to/shapefile")
{code}

This is more intuitive than

{code:python}
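from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.utils.adapter import Adapter

rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, "/path/to/shapefile")
df = Adapter.toDf(rdd, spark)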
{code}

We'll also make several more improvements:

1. Give non-spatial attributes appropriate data types. {{Adapter.toDf}} 
converts all non-spatial fields to strings, which loses the original data types 
of the non-spatial attributes.
2. Better handling of input paths: support both directory paths and paths to 
individual .shp files.
3. Infer the code page from the .cpg file, so that users don't have to set the 
Java system property {{sedona.global.charset}} to work around encoding problems.
4. Infer the SRID of geometries from the .prj file.
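For item 1, the reader has to derive a Spark type from each dBase field descriptor (type code, width, decimal count) in the .dbf header. The sketch below shows one possible mapping; the {{DbfField}} class, the {{spark_type_for}} helper and the specific choices (for example {{bigint}} for integer {{N}} fields up to 18 characters wide) are assumptions for illustration, not the mapping Sedona actually uses.

{code:python}
from dataclasses import dataclass


@dataclass(frozen=True)
class DbfField:
    """Hypothetical view of one field descriptor from a .dbf header."""
    name: str
    field_type: str     # one-character dBase type code, e.g. "C", "N", "F", "L", "D"
    length: int         # field width in characters
    decimal_count: int  # digits after the decimal point


def spark_type_for(field: DbfField) -> str:
    """Map a dBase field to a Spark SQL type name (illustrative mapping only)."""
    t = field.field_type.upper()
    if t == "C":
        return "string"
    if t == "L":
        return "boolean"
    if t == "D":
        return "date"  # stored in the .dbf as YYYYMMDD text
    if t == "F":
        return "double"
    if t == "N":
        if field.decimal_count == 0 and field.length <= 18:
            # At most 18 characters always fits in a signed 64-bit integer.
            return "bigint"
        if field.length <= 38:
            # Spark decimals allow precision up to 38; using the field width as
            # precision over-counts the sign and decimal point, which is safe.
            return f"decimal({field.length},{field.decimal_count})"
        return "double"
    # Unknown or unsupported codes (e.g. memo fields) fall back to string,
    # which is what Adapter.toDf does for every attribute today.
    return "string"
{code}

For example, a field of type {{N}} with width 10 and no decimals maps to {{bigint}} instead of a string column.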
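Item 2 could be handled by expanding the input path before reading: a .shp path names one shapefile, a directory contributes every .shp file it contains, and the companion files (.shx, .dbf, .cpg, .prj) are found by swapping the extension. The {{resolve_shapefiles}} helper below is hypothetical and uses the local filesystem via {{pathlib}}; a Spark data source would more likely go through Hadoop's {{FileSystem}} API instead.

{code:python}
from pathlib import Path
from typing import Dict, List

# Companion files that share the .shp file's base name.
SIDECAR_EXTENSIONS = (".shx", ".dbf", ".cpg", ".prj")


def resolve_shapefiles(path: str) -> List[Dict[str, Path]]:
    """Hypothetical helper: expand an input path into one entry per shapefile."""
    p = Path(path)
    if p.is_dir():
        # A directory contributes every .shp file directly inside it.
        shp_files = sorted(
            f for f in p.iterdir() if f.is_file() and f.suffix.lower() == ".shp"
        )
    elif p.suffix.lower() == ".shp":
        shp_files = [p]
    else:
        raise ValueError(f"expected a directory or a .shp file: {path}")

    entries = []
    for shp in shp_files:
        entry = {".shp": shp}
        for ext in SIDECAR_EXTENSIONS:
            # Only exact lowercase extensions are matched in this sketch.
            sidecar = shp.with_suffix(ext)
            if sidecar.exists():
                entry[ext] = sidecar
        entries.append(entry)
    return entries
{code}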
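For item 3, a .cpg file is a small text file naming the code page of the .dbf attributes, with values such as {{UTF-8}}, {{1252}} or {{ANSI 1252}}. A minimal sketch of turning that text into a charset name, using Python's {{codecs}} registry as a stand-in for a JVM charset lookup; the {{charset_from_cpg}} name and the small alias table are assumptions, and real .cpg files may contain other spellings.

{code:python}
import codecs
from typing import Optional

# Hypothetical alias table for spellings that codecs.lookup may not accept as-is.
_CPG_ALIASES = {"88591": "iso-8859-1", "8859_1": "iso-8859-1"}


def charset_from_cpg(cpg_text: str) -> Optional[str]:
    """Return a codec name for the contents of a .cpg file, or None if unknown."""
    value = cpg_text.strip().upper()
    if value.startswith("ANSI "):
        value = value[len("ANSI "):].strip()  # "ANSI 1252" -> "1252"
    if not value:
        return None
    if value in _CPG_ALIASES:
        name = _CPG_ALIASES[value]
    elif value.isdigit():
        name = "cp" + value  # bare Windows code page number: "1252" -> "cp1252"
    else:
        name = value.lower()  # e.g. "UTF-8", "GBK"
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None  # caller falls back to its default charset
{code}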
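For item 4, a .prj file holds the CRS as WKT. When it is OGC-style WKT1 with an EPSG {{AUTHORITY}} clause on the root element, the SRID can be read directly. ESRI-style .prj files usually omit {{AUTHORITY}} and would need a lookup against a CRS database; the hypothetical {{srid_from_prj}} sketch below does not attempt that and simply returns {{None}}.

{code:python}
import re
from typing import Optional

# An EPSG AUTHORITY clause immediately followed by the root element's closing
# bracket: in WKT1 the AUTHORITY of the top-level CRS is its last child, so
# this skips authorities attached to nested DATUM, SPHEROID or GEOGCS nodes.
_ROOT_EPSG = re.compile(
    r'AUTHORITY\s*\[\s*"EPSG"\s*,\s*"(\d+)"\s*\]\s*\]\s*$', re.IGNORECASE
)


def srid_from_prj(wkt: str) -> Optional[int]:
    """Hypothetical helper: EPSG code of the root CRS in a WKT1 .prj, if declared."""
    match = _ROOT_EPSG.search(wkt.strip())
    # None means no root AUTHORITY (typical of ESRI WKT); a CRS-database
    # lookup would be needed to recover an SRID in that case.
    return int(match.group(1)) if match else None
{code}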





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
