PrathameshDhapodkar opened a new issue, #975:
URL: https://github.com/apache/sedona/issues/975

   ## Expected behavior
   Read the shapefile and convert it to a DataFrame.
   Join that DataFrame with another DataFrame on an ST_Contains condition.
   
   ## Actual behavior
   23/08/16 02:24:37 INFO SparkContext: Created broadcast 2 from newAPIHadoopFile at ShapefileReader.java:170
   23/08/16 02:24:37 INFO FileInputFormat: Total input files to process : 1
   23/08/16 02:24:37 INFO SparkContext: Starting job: collect at ShapefileReader.java:188
   23/08/16 02:24:37 INFO DAGScheduler: Job 1 finished: collect at ShapefileReader.java:188, took 0.018726 s
   Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at scala.collection.mutable.WrappedArray$ofRef.apply(WrappedArray.scala:193)
        at scala.collection.convert.Wrappers$SeqWrapper.get(Wrappers.scala:74)
        at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readFieldNames(ShapefileReader.java:188)
        at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:82)
        at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
   
   
   The error attributed to line 188 of readFieldNames is different from what that line should produce. I can't figure out why the read does not work.
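
   One guess at the root cause (hedged, since the logs alone don't confirm it): the collect at ShapefileReader.java:188 appears to return an empty array, so taking element 0 throws. That typically happens when the reader finds no .dbf file next to the .shp. Sedona's shapefile reader also expects the path to the folder containing the .shp/.shx/.dbf set, not the .shp file itself. A minimal sketch (the path is a placeholder) to confirm the sidecar files are visible to the job:

   import java.net.URI
   import org.apache.hadoop.fs.{FileSystem, Path}

   val shapeDir = "s3://<bucket>/<prefix>/"  // placeholder: the folder, not the .shp file
   val fs = FileSystem.get(new URI(shapeDir), session.sparkContext.hadoopConfiguration)
   fs.listStatus(new Path(shapeDir))
     .map(_.getPath.getName)
     .foreach(println)  // expect at least foo.shp, foo.shx and foo.dbf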
   
   
   ## Steps to reproduce the problem
   1. Create the schema:
   val Schema = StructType(Array(
       StructField("shapeGeom", GeometryUDT, nullable = true),
       StructField("aoiId", DataTypes.StringType, nullable = true),
       StructField("market", DataTypes.StringType, nullable = true),
       StructField("marketId", DataTypes.StringType, nullable = true),
       StructField("Region", DataTypes.StringType, nullable = true),
       StructField("areaSqMi", DataTypes.DoubleType, nullable = true),
       StructField("pop10", DataTypes.IntegerType, nullable = true),
       StructField("aoigNodeB", DataTypes.StringType, nullable = true),
       StructField("initialgN", DataTypes.IntegerType, nullable = true),
       StructField("pop20", DataTypes.IntegerType, nullable = true),
       StructField("shapeLength", DataTypes.DoubleType, nullable = true),
       StructField("shapeArea", DataTypes.DoubleType, nullable = true)))
   2. Create the Spark context.
   3. Create a function getAoiShapeDf, and inside it:
   4. Read the .shp file from S3 or any other location.
   5. Shapefilerdd = ShapefileReader.readToGeometryRDD(session.sparkContext, ShapefileLocation)
      Shapefilerdd.CRSTransform("epsg:4326", "epsg:5070", false)
   6. df = Adapter.toDf(Shapefilerdd, Schema, session)
   7. return df
   code:  
   def getAoiShapeDf: DataFrame = {
     val ShapefileLocation = "<location>"
     val Shapefilerdd = ShapefileReader.readToGeometryRDD(session.sparkContext, ShapefileLocation)
     Shapefilerdd.CRSTransform("epsg:4326", "epsg:5070", false)
     val df = Adapter.toDf(Shapefilerdd, Schema, session)
     df
   }
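
   As a sanity check (only a sketch, assuming readToGeometryRDD succeeds against a known-good shapefile): the attribute names Sedona reads from the .dbf are exposed on the RDD as fieldNames, so the hand-written Schema can be compared against them before calling Adapter.toDf:

   import scala.collection.JavaConverters._

   // fieldNames is filled in by readToGeometryRDD from the .dbf header
   val dbfFields = Shapefilerdd.fieldNames.asScala
   println(s".dbf attributes: ${dbfFields.mkString(", ")}")
   // the Schema has the geometry column first, then one column per attribute
   assert(Schema.fields.length == dbfFields.size + 1, "schema/attribute count mismatch")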
   
   8. Create a join function that accepts another DataFrame df2, plus the lat_col and long_col column names from df2, as arguments.
   9. Broadcast the AOI shape DataFrame to the Spark worker nodes.
   10. Create a geometry point from df2.lat_col and df2.long_col using ST_TRANSFORM.
   11. Join df2 with the df from getAoiShapeDf().
   12. Return shapeJoin.
   code:
   def enrich(df2: DataFrame, clientLatColumn: String, clientLongColumn: String): DataFrame = {
     val networkAoiShape = broadcast(this.getAoiShapeDf.select("shapeGeom", "aoiId"))
     val dataWithGeom = df2.withColumn("aoiGeoPoint",
       expr(s"ST_TRANSFORM(ST_POINT(CAST($clientLatColumn AS DOUBLE), CAST($clientLongColumn AS DOUBLE)), 'EPSG:4326', 'EPSG:5070')"))
     val shapeJoin = dataWithGeom.alias("aoiData").join(networkAoiShape.alias("shapeData"),
       expr("ST_Contains(shapeData.shapeGeom, aoiData.aoiGeoPoint)"), "left_outer")
     shapeJoin
   }
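
   For completeness, a sketch of how enrich is called (clientDf and its column names are placeholders). The ST_POINT / ST_TRANSFORM / ST_Contains functions used in the expr(...) calls must be registered on the session first; on Sedona 1.3.x that is done with SedonaSQLRegistrator:

   import org.apache.sedona.sql.utils.SedonaSQLRegistrator

   // register Sedona's SQL functions before any expr("ST_...") is analyzed
   SedonaSQLRegistrator.registerAll(session)
   val enriched = enrich(clientDf, "latitude", "longitude")  // placeholder DataFrame / columns
   enriched.show(5)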
   
   ## Settings
   EMR serverless settings:
   version - 6.9.0
   Architecture - x86_64
   respective dependency .jars
   
   Sedona version = ?
   [sedona-python-adapter-3.0_2.13](https://mvnrepository.com/artifact/org.apache.sedona/sedona-python-adapter-3.0_2.13) » 1.3.1-incubating
   
   Apache Spark version = ?
   3.3.2
   
   API type = Scala, Java, Python?
   Scala
   
   Scala version = 2.11, 2.12, 2.13?
   3.12
   
   JRE version = 1.8, 1.11?
   JDK 11.0.17
   
   Python version = ?
   3
   
   Environment = Standalone, AWS EC2, EMR, Azure, Databricks?
   EMR Serverless

