PrathameshDhapodkar opened a new issue, #975: URL: https://github.com/apache/sedona/issues/975
## Expected behavior

Read the shapefile, convert it to a DataFrame, and join that DataFrame with another DataFrame on an `ST_Contains` condition.

## Actual behavior

```
23/08/16 02:24:37 INFO SparkContext: Created broadcast 2 from newAPIHadoopFile at ShapefileReader.java:170
23/08/16 02:24:37 INFO FileInputFormat: Total input files to process : 1
23/08/16 02:24:37 INFO SparkContext: Starting job: collect at ShapefileReader.java:188
23/08/16 02:24:37 INFO DAGScheduler: Job 1 finished: collect at ShapefileReader.java:188, took 0.018726 s
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
	at scala.collection.mutable.WrappedArray$ofRef.apply(WrappedArray.scala:193)
	at scala.collection.convert.Wrappers$SeqWrapper.get(Wrappers.scala:74)
	at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readFieldNames(ShapefileReader.java:188)
	at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:82)
	at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
```

The error message at line 188 in `readFieldNames` is different from what is actually being produced, and I can't figure out why the implementation does not work.

## Steps to reproduce the problem
1. Create the schema:

   ```scala
   val Schema = StructType(Array(
     StructField("shapeGeom", GeometryUDT, nullable = true),
     StructField("aoiId", DataTypes.StringType, nullable = true),
     StructField("market", DataTypes.StringType, nullable = true),
     StructField("marketId", DataTypes.StringType, nullable = true),
     StructField("Region", DataTypes.StringType, nullable = true),
     StructField("areaSqMi", DataTypes.DoubleType, nullable = true),
     StructField("pop10", DataTypes.IntegerType, nullable = true),
     StructField("aoigNodeB", DataTypes.StringType, nullable = true),
     StructField("initialgN", DataTypes.IntegerType, nullable = true),
     StructField("pop20", DataTypes.IntegerType, nullable = true),
     StructField("shapeLength", DataTypes.DoubleType, nullable = true),
     StructField("shapeArea", DataTypes.DoubleType, nullable = true)))
   ```

2. Create the Spark context.
3. Create a function `getAoiShapeDf`, and inside it:
4. read the `.shp` file from S3 or any other location,
5. build `Shapefilerdd = ShapefileReader.readToGeometryRDD(session.sparkContext, ShapefileLocation)` and call `Shapefilerdd.CRSTransform("epsg:4326", "epsg:5070", false)`,
6. convert it with `df = Adapter.toDf(Shapefilerdd, Schema, session)`,
7. and return `df`. Code:

   ```scala
   def getAoiShapeDf: DataFrame = {
     val ShapefileLocation = "<location>"
     val Shapefilerdd = ShapefileReader.readToGeometryRDD(session.sparkContext, ShapefileLocation)
     Shapefilerdd.CRSTransform("epsg:4326", "epsg:5070", false)
     val df = Adapter.toDf(Shapefilerdd, Schema, session)
     df
   }
   ```

8. Create a join function that accepts another DataFrame `df2`, plus the names of the latitude and longitude columns of `df2`, as arguments.
9. Broadcast the shape DataFrame to the Spark worker nodes.
10. Create a geometry point from `df2.lat_col` and `df2.long_col` using `ST_Transform`.
11. Join `df2` with the DataFrame returned by `getAoiShapeDf`.
12. Return `shapeJoin`. Code:

   ```scala
   def enrich(df2: DataFrame, clientLatColumn: String, clientLongColumn: String): DataFrame = {
     val networkAoiShape = broadcast(this.getAoiShapeDf.select("shapeGeom", "AOI_ID"))
     val dataWithGeom = df2.withColumn("aoiGeoPoint",
       expr(s"ST_Transform(ST_Point(CAST($clientLatColumn AS DOUBLE), CAST($clientLongColumn AS DOUBLE)), 'EPSG:4326', 'EPSG:5070')"))
     val shapeJoin = dataWithGeom.alias("aoiData").join(
       networkAoiShape.alias("shapeData"),
       expr("ST_Contains(shapeData.shapeGeom, aoiData.aoiGeoPoint)"),
       "LeftOuter")
     shapeJoin
   }
   ```

## Settings

EMR Serverless settings: version 6.9.0, architecture x86_64, with the respective dependency jars.

- Sedona version = [sedona-python-adapter-3.0_2.13](https://mvnrepository.com/artifact/org.apache.sedona/sedona-python-adapter-3.0_2.13) » 1.3.1-incubating
- Apache Spark version = 3.3.2
- API type (Scala, Java, Python) = Scala
- Scala version (2.11, 2.12, 2.13) = 3.12
- JRE version (1.8, 1.11) = JDK 11.0.17
- Python version = 3
- Environment (Standalone, AWS EC2, EMR, Azure, Databricks) = EMR Serverless
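For context, steps 2–7 above can be sketched as a single self-contained program. This is a minimal sketch, not the reporter's exact setup: the shapefile path and app name are placeholders, and it assumes the Sedona 1.3.x-era entry points (`SedonaSQLRegistrator.registerAll`, which must run before any `ST_*` expression such as the ones used in `enrich`, and `SedonaKryoRegistrator` for serialization). Note that `readFieldNames` (`ShapefileReader.java:188` in the trace) reads attribute names from the `.dbf` companion file, so the input directory needs the `.shx`/`.dbf` siblings next to the `.shp`.

```scala
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader
import org.apache.sedona.core.serde.SedonaKryoRegistrator
import org.apache.sedona.sql.utils.{Adapter, SedonaSQLRegistrator}

object ShapefileJoinSketch {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .appName("shapefile-join-sketch") // placeholder app name
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.registrator", classOf[SedonaKryoRegistrator].getName)
      .getOrCreate()

    // ST_Point / ST_Transform / ST_Contains are only resolvable after registration.
    SedonaSQLRegistrator.registerAll(session)

    // Placeholder path: the directory must contain the .shp together with its
    // .shx and .dbf companions; readFieldNames parses the .dbf for field names.
    val shapefileDir = "s3://bucket/path/to/shapefile-dir"
    val rdd = ShapefileReader.readToGeometryRDD(
      JavaSparkContext.fromSparkContext(session.sparkContext), shapefileDir)
    rdd.CRSTransform("epsg:4326", "epsg:5070", false)

    // Without an explicit schema, Adapter infers column names from the .dbf fields.
    val df: DataFrame = Adapter.toDf(rdd, session)
    df.printSchema()
  }
}
```

This sketch needs the Sedona core and SQL jars on the classpath; it is meant only to show the registration and read path that the steps above rely on.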