douglasdennis commented on PR #693:
URL: https://github.com/apache/incubator-sedona/pull/693#issuecomment-1255376265

   > Another interesting finding: since the type-safe dataframe APIs do not need to call "udf.register()" to register all functions, is it possible that, as a side effect, predicate pushdown is finally supported in Sedona?
   
   @jiayuasu It does not appear to be so. Here are results from some tests I ran this morning using the example1.parquet file bundled with the library.
   
   Checking to make sure predicate pushdown happens with native types:
   ```scala
   val geoparquetdatalocation1: String = resourceFolder + "geoparquet/example1.parquet"
   val basicPredicateDf = sparkSession.read.format("geoparquet").load(geoparquetdatalocation1).where(col("name").equalTo("Fiji"))
   basicPredicateDf.explain()
   ```
   The plan shows a pushdown:
   ```
   == Physical Plan ==
   *(1) Filter (isnotnull(name#5302) AND (name#5302 = Fiji))
   +- FileScan geoparquet [pop_est#5300L,continent#5301,name#5302,iso_a3#5303,gdp_md_est#5304,geometry#5305] Batched: false, DataFilters: [isnotnull(name#5302), (name#5302 = Fiji)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:<removed>, PartitionFilters: [], PushedFilters: [IsNotNull(name), EqualTo(name,Fiji)], ReadSchema: struct<pop_est:bigint,continent:string,name:string,iso_a3:string,gdp_md_est:double,geometry:array...
   ```
   A simple geometry-based predicate:
   ```scala
   val geoparquetdatalocation1: String = resourceFolder + "geoparquet/example1.parquet"
   val basicGeomPredicateDf = sparkSession.read.format("geoparquet").load(geoparquetdatalocation1).where(ST_GeometryType("geometry").equalTo(lit("ST_Polygon")))
   basicGeomPredicateDf.explain()
   ```
   This plan does not show a pushdown:
   ```
   == Physical Plan ==
   Filter (st_geometrytype(geometry#5318) = ST_Polygon)
   +- FileScan geoparquet [pop_est#5313L,continent#5314,name#5315,iso_a3#5316,gdp_md_est#5317,geometry#5318] Batched: false, DataFilters: [(st_geometrytype(geometry#5318) = ST_Polygon)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:<removed>, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<pop_est:bigint,continent:string,name:string,iso_a3:string,gdp_md_est:double,geometry:array...
   ```
   And, just to be thorough, a more complex geometry-based predicate:
   ```scala
   val geoparquetdatalocation1: String = resourceFolder + "geoparquet/example1.parquet"
   val basicGeomPredicateDf = sparkSession.read.format("geoparquet").load(geoparquetdatalocation1).where(ST_Distance("geometry", ST_Point(2, 2)) <= 50.0)
   basicGeomPredicateDf.explain()
   ```
   As expected, no pushdown here either:
   ```
   == Physical Plan ==
   Filter ( **org.apache.spark.sql.sedona_sql.expressions.ST_Distance**   <= 50.0)
   +- FileScan geoparquet [pop_est#5326L,continent#5327,name#5328,iso_a3#5329,gdp_md_est#5330,geometry#5331] Batched: false, DataFilters: [( **org.apache.spark.sql.sedona_sql.expressions.ST_Distance**   <= 50.0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:<removed>, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<pop_est:bigint,continent:string,name:string,iso_a3:string,gdp_md_est:double,geometry:array...
   ```
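   If it helps turn this check into a regression test, the presence or absence of pushdown can be asserted at the string level against the explain output. A minimal sketch (not part of this PR), assuming the `PushedFilters: [...]` format that Spark 3.x prints in the plans above; `hasPushedFilters` is a hypothetical helper name:
   ```scala
   // Minimal sketch: string-level check that an explain() dump reports at
   // least one pushed filter. Assumes the "PushedFilters: [...]" format
   // shown in the plans above.
   def hasPushedFilters(plan: String): Boolean = {
     val marker = "PushedFilters: ["
     val start = plan.indexOf(marker)
     if (start < 0) false
     else {
       // Inspect the text between the opening '[' and the first closing ']'.
       val rest = plan.substring(start + marker.length)
       val end = rest.indexOf(']')
       end > 0 && rest.substring(0, end).trim.nonEmpty
     }
   }

   // Against the plans above:
   // hasPushedFilters("... PushedFilters: [IsNotNull(name), EqualTo(name,Fiji)] ...")  // true
   // hasPushedFilters("... PushedFilters: [] ...")                                     // false
   ```
   Capturing the plan text (e.g. via `df.queryExecution.executedPlan.toString`) and asserting on it this way would flag any future change in pushdown behavior for the geoparquet reader.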

