PrathameshDhapodkar opened a new issue, #987:
URL: https://github.com/apache/sedona/issues/987

   ## Expected behavior
   value | current_timestamp | network_operator_name | dl_load_date | isDishNrCell | isSimDish | nrNci | AOI_ID | Cluster_ID
   <nestedjsonvalue> | 8/22/2023 5:44:11 PM | Digicel | 8/22/2023 | FALSE | TRUE | 3569856325 | ALB | ALB-01-Downtown
   
   ## Actual behavior
   value | current_timestamp | network_operator_name | dl_load_date | isDishNrCell | isSimDish | nrNci | AOI_ID | Cluster_ID
   <nestedjsonvalue> | 8/22/2023 5:44:11 PM | Digicel | 8/22/2023 | FALSE | TRUE | 3569856325 |  |
   
   ## Steps to reproduce the problem
   The actual DataFrame is a streaming dataset running on a Spark cluster.
   
   1. Create a Spark session.
   2. Load the shapefile from its location (S3 here).
   Code:
     def getAoiShapeDf: DataFrame  = {
       val aoiShapefileLocation = "s3://ue-bronze-dish-wireless-source-data-np/opensource_loc/top_shp/aoi_oto/"
       val aoiShapeRdd = ShapefileReader.readToGeometryRDD(session.sparkContext, aoiShapefileLocation)
       aoiShapeRdd.CRSTransform("epsg:4326", "epsg:5070", false)
       val aoiShapeDf = Adapter.toDf(aoiShapeRdd, session)
       aoiShapeDf
     }
   
   3. Join the shapefile DataFrame with the actual DataFrame, using ST_Contains as the join condition.
   Code:
     def enrichWithAoi(dataframe: DataFrame, clientLatColumn: String, clientLongColumn: String): DataFrame = {
       val networkAoiShape = broadcast(this.getAoiShapeDf.select("geometry", "AOI_ID"))
       val ueDataWithGeom = dataframe.withColumn("aoiGeoPoint",
         expr(s"ST_TRANSFORM(ST_POINT(CAST($clientLatColumn AS DOUBLE), CAST($clientLongColumn AS DOUBLE)), 'EPSG:4326', 'EPSG:5070')"))
       val aoiShapeJoin = ueDataWithGeom.alias("roamingAoiData").join(networkAoiShape.alias("shapeData"),
         expr("ST_Contains(shapeData.geometry,roamingAoiData.aoiGeoPoint)"), "LeftOuter")
       aoiShapeJoin.drop("geometry","aoiGeoPoint")
     }
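
   One thing that may be worth ruling out (an assumption, not a confirmed cause): `ST_Point` takes X first and Y second, while the code above passes the latitude column first. The containment test can only match if the point and the shapefile geometries arrive in the same axis order. The sketch below uses plain Scala with no Spark or Sedona; the `Box` and `Point` types and all coordinates are hypothetical and only illustrate how a swapped axis order turns a match into a miss.

```scala
// Hypothetical illustration only: plain Scala, no Spark or Sedona required.
// A bounding box stands in for an AOI polygon; all coordinates are made up.
final case class Point(x: Double, y: Double)

final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  // True when the point lies inside the box, edges included.
  def contains(p: Point): Boolean =
    p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
}

object AxisOrderCheck {
  // Box stored in (x = longitude, y = latitude) order.
  val aoi: Box = Box(minX = -107.0, minY = 34.5, maxX = -106.0, maxY = 35.5)

  val lat = 35.08
  val lon = -106.65

  val lonLat: Point = Point(lon, lat) // same axis order as the box
  val latLon: Point = Point(lat, lon) // axes swapped relative to the box

  def main(args: Array[String]): Unit = {
    println(s"lon/lat point contained: ${aoi.contains(lonLat)}")
    println(s"lat/lon point contained: ${aoi.contains(latLon)}")
  }
}
```

   If the real data behaves like the swapped case, every row would fall outside every AOI polygon and the left outer join would leave AOI_ID empty, which matches the actual behavior above. Whether latitude-first is correct here depends on the axis order that the transform expects for EPSG:4326 in the Sedona version in use, so this is only a check to run, not a diagnosis.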
   
   I also tried specifying a schema for the shapefiles; the result was the same.
   
   ## Settings
   - EMR Serverless 6.9.0
   - Spark 3.3.2
   - Scala 2.12
   - JDK 11
   
   
   Sedona version = ?
   implementation group: 'org.apache.sedona', name: 'sedona-python-adapter-3.0_2.12', version: '1.3.1-incubating'
   implementation group: 'org.apache.sedona', name: 'sedona-viz-3.0_2.12', version: '1.4.1'
   implementation group: 'org.apache.sedona', name: 'sedona-common', version: '1.4.1'
   implementation group: 'org.apache.sedona', name: 'sedona-sql-3.0_2.12', version: '1.4.1'
   
   Apache Spark version = ?
   3.3.2
   
   API type = Scala, Java, Python?
   Scala
   
   Scala version = 2.11, 2.12, 2.13?
   2.12
   
   JRE version = 1.8, 1.11?
   jdk11
   
   Environment = Standalone, AWS EC2, EMR, Azure, Databricks?
   AWS EMR Serverless

