Kristin Cowalcijk created SEDONA-646:
----------------------------------------
Summary: Shapefile data source for DataFrame API
Key: SEDONA-646
URL: https://issues.apache.org/jira/browse/SEDONA-646
Project: Apache Sedona
Issue Type: New Feature
Reporter: Kristin Cowalcijk
Fix For: 1.7.0

The current shapefile reader returns a SpatialRDD. Users who want a DataFrame must call {{Adapter.toDf}} to convert the SpatialRDD into one. A better approach is to support loading shapefiles as DataFrames directly through the DataFrame API:

{code:python}
df = sedona.read.format("shapefile").load("/path/to/shapefile")
{code}

This is more intuitive than:

{code:python}
rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, "/path/to/shapefile")
df = Adapter.toDf(rdd, spark)
{code}

We'll also make several other improvements:

1. Give non-spatial attributes appropriate data types. {{Adapter.toDf}} converts all non-spatial fields to string fields, which loses the original data types of the non-spatial attributes.
2. Better handling of input paths. We should support both paths to directories and paths to .shp files.
3. Infer the code page from the .cpg file, so that users don't have to set the Java system property {{sedona.global.charset}} to work around encoding problems.
4. Infer the SRID of geometries from the .prj file.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
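Improvements 2 and 3 (input path handling and code page inference) can be illustrated with a small, framework-free Python sketch. This is not Sedona's implementation: the helper names {{resolve_shapefile_inputs}} and {{infer_charset}}, and the ISO-8859-1 fallback used when no .cpg file is present, are hypothetical choices made for illustration only. The sketch expands a directory path into the .shp files it contains, accepts a single .shp path as-is, and maps the code page named in a sibling .cpg file to a Python codec name.

{code:python}
import codecs
from pathlib import Path


def resolve_shapefile_inputs(path):
    """Expand a directory or a single .shp path into a sorted list of .shp files.

    Hypothetical helper for illustration; not part of Sedona's API.
    """
    p = Path(path)
    if p.is_dir():
        # A directory may hold several shapefiles; pick up every .shp file in it.
        return sorted(str(f) for f in p.iterdir()
                      if f.is_file() and f.suffix.lower() == ".shp")
    if p.suffix.lower() == ".shp":
        return [str(p)]
    raise ValueError(f"expected a directory or a .shp file, got: {path}")


def infer_charset(shp_path, default="ISO-8859-1"):
    """Return the charset named in the .cpg file next to `shp_path`.

    Hypothetical helper; the ISO-8859-1 fallback is an assumption made for
    this sketch, not Sedona's documented default.
    """
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.is_file():
        return default
    # A .cpg file usually contains just the code page name, e.g. "UTF-8".
    name = cpg.read_text(encoding="ascii", errors="ignore").strip()
    try:
        # Normalize aliases such as "UTF8" or "windows-1252" to Python codec names.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown or empty code page name: fall back to the default.
        return default
{code}

In a real data source, the charset found this way would be applied when decoding string fields from the matching .dbf file, so each shapefile in a directory could carry its own encoding instead of relying on one global setting.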