[
https://issues.apache.org/jira/browse/IMPALA-13851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer updated IMPALA-13851:
-------------------------------------
Description:
It is a common (and currently the most efficient) way to store point data as
double x/double y pairs instead of a single geometry (BINARY) column.
For predicates where these points must intersect a complex geometry it can be a
useful optimization to prefilter x/y columns directly with the bounding rect of
the complex geometry in addittiton to running the st_ predicate. An example:
{code}
WHERE st_contains(<const_geometry>, st_point(x,y))
{code}
can be rewritten as:
{code}
WHERE st_contains(<const_geometry>, st_point(x,y))
AND x>=st_xmin(<const_geometry>) AND y>=st_ymin(<const_geometry>)
AND x<=st_xmax(<const_geometry>) AND y<=st_ymax(<const_geometry>)
{code}
This has two benefits:
1. the planner will move the >= <= predicates before st_contains which allows
avoiding the expensive per row st_contains() call for points that failed the
bounding box check
2. >=/<= predicates can be pushed down in more cases, for example for Parquet
min/max stat filtering or Iceberg min/max stat filtering - if the files/pages
have limited bounding boxes then this can save IO.
An expression rewrite rule can be added that does this automatically.
was:
It is a common (and currently the most efficient) way to store point data as
double x/double y pairs instead of a single geometry (BINARY) column.
For predicates where these points must intersect a complex geometry it can be a
useful optimization to prefilter x/y columns directly with the bounding rect of
the complex geometry in addittiton to running the st_ predicate. An example:
{code}
WHERE st_contains(<const_geometry>, st_point(x,y))
{code}
can be rewritten as:
{code}
WHERE st_contains(<const_geometry>, st_point(x,y))
AND x>=st_xmin(<const_geometry>) AND y>=st_ymin(<const_geometry>)
AND x<=st_xmax(<const_geometry>) AND y<=st_ymax(<const_geometry>)
{code}
This has two benefits:
1. the planner will move the >= <= predicates before st_contains which allows
avoiding the expensive per row st_contains() call for points that failed the
bounding box check
2. >=/<= predicates can be pushed down in more cases, for example for Parquet
min/max stat filtering or Iceberg min/max stat filtering - if the files/pages
have limited bounding boxes then this can save IO.
An expression rewrite rule can be added that does this automatically.
> Add geospatial expression rewrites for lat/lon coded points
> -----------------------------------------------------------
>
> Key: IMPALA-13851
> URL: https://issues.apache.org/jira/browse/IMPALA-13851
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Csaba Ringhofer
> Priority: Major
> Labels: geospatial
>
> It is a common (and currently the most efficient) way to store point data as
> double x/double y pairs instead of a single geometry (BINARY) column.
> For predicates where these points must intersect a complex geometry it can be
> a useful optimization to prefilter x/y columns directly with the bounding
> rect of the complex geometry in addittiton to running the st_ predicate. An
> example:
> {code}
> WHERE st_contains(<const_geometry>, st_point(x,y))
> {code}
> can be rewritten as:
> {code}
> WHERE st_contains(<const_geometry>, st_point(x,y))
> AND x>=st_xmin(<const_geometry>) AND y>=st_ymin(<const_geometry>)
> AND x<=st_xmax(<const_geometry>) AND y<=st_ymax(<const_geometry>)
> {code}
> This has two benefits:
> 1. the planner will move the >= <= predicates before st_contains which allows
> avoiding the expensive per row st_contains() call for points that failed the
> bounding box check
> 2. >=/<= predicates can be pushed down in more cases, for example for Parquet
> min/max stat filtering or Iceberg min/max stat filtering - if the files/pages
> have limited bounding boxes then this can save IO.
> An expression rewrite rule can be added that does this automatically.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]