[ 
https://issues.apache.org/jira/browse/IMPALA-14570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-14570.
--------------------------------------
    Resolution: Fixed

> Speed up simple geospatial queries
> ----------------------------------
>
>                 Key: IMPALA-14570
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14570
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>
> IMPALA-11745 added many geospatial functions by importing them from Hive.
> The current solution is very slow even for simple queries for several reasons:
> 1. the functions are in Java and each call has native->JVM overhead
> 2. most functions need to serialize/deserialize the BINARY format on each call
> 3. the internal BINARY format "ESRI shape" is not what files usually contain,
> so a per-row conversion is needed before any filters
> 4. predicates like st_contains are always computed per row; no bounding
> rectangle check is done at file level, although Parquet/GeoParquet/Iceberg
> offer ways to do this
> There are no geospatial benchmarks for Impala yet, but my assumption is that
> many queries have selective scans and the most performance-critical part is
> the initial filtering of rows, so the focus is on queries like:
> SELECT * FROM t WHERE st_intersects(geom_col, <const_geom>)
> Geospatial joins are also very inefficient at the moment (nested loop join),
> but they are out of scope for this epic.
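
The file-level bounding rectangle check mentioned in point 4 can be illustrated with a minimal sketch. This is not Impala's implementation: `BBox`, `FileStats` and `prune_files` are hypothetical names, and the per-file bounding box is assumed to be available from file metadata. A bounding box overlap is only a necessary condition for st_intersects, so files that survive the pruning still need the per-row predicate.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two rectangles overlap iff they overlap on both axes.
        # Touching edges or corners count as an intersection.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass(frozen=True)
class FileStats:
    """Assumed per-file metadata: path plus the bbox of all geometries in it."""
    path: str
    bbox: BBox


def prune_files(files: Iterable[FileStats], query_bbox: BBox) -> List[FileStats]:
    """Keep only files whose bounding box overlaps the query geometry's bbox.

    Files that are dropped cannot contain a matching row, so they can be
    skipped without reading or converting any geometry values.
    """
    return [f for f in files if f.bbox.intersects(query_bbox)]


# Made-up example data: three files and the bbox of a constant geometry.
files = [
    FileStats("a.parquet", BBox(0, 0, 10, 10)),
    FileStats("b.parquet", BBox(20, 20, 30, 30)),
    FileStats("c.parquet", BBox(10, 10, 15, 15)),
]
query_bbox = BBox(5, 5, 10, 10)

kept = [f.path for f in prune_files(files, query_bbox)]
print(kept)  # ['a.parquet', 'c.parquet']
```

Here b.parquet is skipped entirely, while c.parquet is kept because its rectangle touches the query rectangle at a corner and must still be checked row by row.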



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
