jiayuasu commented on issue #1047:
URL: https://github.com/apache/sedona/issues/1047#issuecomment-1767833680

   @JimShady This is definitely possible. We need to do this in 3 steps in 
PySpark:
   
   Please first use `sedona.read.format("binaryFile")` to read these GeoTIFFs. 
Then use `RS_FromGeoTiff()` to create the raster type column for this 
DataFrame. Pay attention to the constant `1` in the `ID` column: it puts every 
row in the same group, so the pivot in step 1 yields a single row. You can add 
it with `SELECT 1 as ID, path, raster FROM df`.
   
   ```
   |ID|path|raster|
   |1|XX_1|GridCoverageXXX|
   |1|XX_2|GridCoverageXXX|
   |1|XX_3|GridCoverageXXX|
   |1|XX_4|GridCoverageXXX|
   |1|XX_5|GridCoverageXXX|
   |1|XX_6|GridCoverageXXX|
   ```
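The read step can be sketched roughly as follows. This is only a sketch under assumptions: `sedona` is an already created Sedona session, the folder path is a placeholder, and `binary_df` and `load_geotiffs` are hypothetical names chosen here. Spark's `binaryFile` source exposes the file bytes in a `content` column, which is what `RS_FromGeoTiff` decodes.

```python
# SQL that tags every file with the same ID and decodes its bytes into a raster.
# "binary_df" is a hypothetical temp view name used only in this sketch.
LOAD_SQL = "SELECT 1 AS ID, path, RS_FromGeoTiff(content) AS raster FROM binary_df"


def load_geotiffs(sedona, folder):
    """Read every file under `folder` and return a DataFrame (ID, path, raster)."""
    binary_df = sedona.read.format("binaryFile").load(folder)
    binary_df.createOrReplaceTempView("binary_df")
    return sedona.sql(LOAD_SQL)


# Usage (needs a live Sedona session, so it is left commented out):
# df = load_geotiffs(sedona, "/path/to/geotiffs/")
```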
   
   Now the exciting part comes:
   
   1. Transpose your DataFrame (swap rows and columns) using the pivot function 
(https://sparkbyexamples.com/spark/spark-transpose-rows-to-columns-in-dataframe/amp/).
   ```
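   from pyspark.sql.functions import first
   
   # One row per ID, one column per path; first() picks the single raster in each cell
   pivot_df = df.groupBy("ID").pivot("path").agg(first("raster"))
   pivot_df.createOrReplaceTempView("pivot_df")  # lets step 2 query it by name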
   ```
   
   The resulting DataFrame should be:
   ```
   |ID|XX_1|XX_2|XX_3|XX_4|XX_5|XX_6|
   |1|GridCoverageXXX|GridCoverageXXX|GridCoverageXXX|GridCoverageXXX|GridCoverageXXX|GridCoverageXXX|
   ```
   
   2. Now stack all the bands into a single raster using RS_AddBand 
(https://sedona.apache.org/1.5.0/api/sql/Raster-operators/#rs_addband), 
assuming the band order is the same as the original file names:
   
   ```
   # pivot_df must be registered as a temp view for this query to resolve
   sedona.sql("""
       SELECT RS_AddBand(RS_AddBand(RS_AddBand(RS_AddBand(RS_AddBand(XX_1, XX_2), XX_3), XX_4), XX_5), XX_6) AS raster
       FROM pivot_df
   """)
   ```
   The resulting DataFrame should look like this:
   
   ```
   |raster|
   |GridCoverageXXX|
   ```
   
   Note that this is now a single raster with 6 bands.
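Writing the nested `RS_AddBand` calls by hand gets tedious with more files. A minimal sketch of generating the expression from an ordered list of column names is below. The helper `nested_add_band` is hypothetical, not part of Sedona. It wraps each name in backticks because pivoted column names come from the `path` values and may contain characters such as dots or slashes.

```python
from functools import reduce


def nested_add_band(columns):
    """Build RS_AddBand(RS_AddBand(c1, c2), c3)... from an ordered list of column names."""
    if not columns:
        raise ValueError("need at least one raster column")
    # Backtick-quote each name so Spark SQL accepts names containing dots or slashes.
    quoted = [f"`{c}`" for c in columns]
    # Fold left: the accumulated raster is always the first argument.
    return reduce(lambda acc, col: f"RS_AddBand({acc}, {col})", quoted)


cols = ["XX_1", "XX_2", "XX_3", "XX_4", "XX_5", "XX_6"]
query = f"SELECT {nested_add_band(cols)} AS raster FROM pivot_df"
print(query)
```

The resulting string can be passed to `sedona.sql(query)`. The list order decides the band order, so sort it the way you want the bands numbered.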
   
   3. Now let's use `RS_MapAlgebra` to do the trick. Make sure you understand 
the logical operators in Jiffle script 
(https://github.com/geosolutions-it/jai-ext/wiki/Jiffle---language-summary#logical-operators).
 Note that MapAlgebra uses 0-indexed band IDs while other raster functions in 
Sedona use 1-indexed band IDs, so `rast[5]` below is the 6th band.
   
   ```
   # df here is the single-row, 6-band result of step 2, registered as a temp view
   sedona.sql("""
       SELECT RS_MapAlgebra(raster, 'D', 'out = rast[5] > 10 && rast[4] < 4 ? (rast[0] + rast[1] + rast[2]) / rast[3] : 0;')
       FROM df
   """)
   ```
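To make the per-pixel logic of that Jiffle expression easier to check, here is a plain Python mirror of it for a single pixel. It only illustrates the intended semantics and is not how Sedona evaluates the script. The function name `map_algebra_pixel` is hypothetical, and the sketch does not handle a zero in `rast[3]`.

```python
def map_algebra_pixel(rast):
    """Mirror of the Jiffle expression for one pixel.

    `rast` holds the 6 band values of the pixel, 0-indexed like in RS_MapAlgebra.
    """
    if rast[5] > 10 and rast[4] < 4:
        return (rast[0] + rast[1] + rast[2]) / rast[3]
    return 0.0


# Condition holds: band 6 is 11 (> 10) and band 5 is 3 (< 4), so (1 + 2 + 3) / 2
print(map_algebra_pixel([1, 2, 3, 2, 3, 11]))  # 3.0
# Condition fails: band 5 is 5, which is not < 4
print(map_algebra_pixel([1, 2, 3, 2, 5, 11]))  # 0.0
```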

