Max Gekk created SPARK-57839:
--------------------------------
Summary: Support CBO filter/selectivity estimation for
nanosecond-precision timestamps
Key: SPARK-57839
URL: https://issues.apache.org/jira/browse/SPARK-57839
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
{{FilterEstimation}} (execution/.../FilterEstimation.scala ~L294, ~L416)
matches only the microsecond {{TimestampType}} (not even {{TimestampNTZType}}),
and {{EstimationUtils.toDouble}} likewise handles only {{TimestampType}}. As a
result, nanosecond columns get no range-based selectivity estimate, and the
resulting mis-estimates are silent.
h2. Goal
Include nanosecond timestamp columns in range/selectivity estimation,
converting {{TimestampNanosVal}} min/max to the numeric domain used by the
estimator.
h2. Scope
Broaden the type matches to include {{AnyTimestampNanoType}} (ideally via
{{DatetimeType}}); add nanosecond value<->double conversion in
{{EstimationUtils}}.
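As a rough illustration of the value<->double conversion, here is a minimal sketch. {{NanosTs}}, {{NanosToDouble}} and the seconds-plus-nanos layout are hypothetical stand-ins chosen for the example; the actual {{TimestampNanosVal}} representation and the {{EstimationUtils}} signatures are not described in this ticket and may differ.

```scala
// Hypothetical stand-in for a nanosecond timestamp value. The real
// TimestampNanosVal layout is an assumption here, not taken from Spark.
final case class NanosTs(epochSeconds: Long, nanoOfSecond: Int)

object NanosToDouble {
  private val NanosPerSecond = 1000000000L

  // Map the timestamp to "nanoseconds since the epoch" as a Double,
  // i.e. the numeric domain a range estimator can do arithmetic on.
  def toDouble(ts: NanosTs): Double =
    ts.epochSeconds.toDouble * NanosPerSecond + ts.nanoOfSecond

  // Inverse mapping: round to the nearest whole nanosecond, then split
  // with floor semantics so pre-epoch values keep nanoOfSecond in [0, 1e9).
  def fromDouble(d: Double): NanosTs = {
    val total = math.round(d)
    NanosTs(
      Math.floorDiv(total, NanosPerSecond),
      Math.floorMod(total, NanosPerSecond).toInt)
  }
}
```

Note that a Double cannot represent every integer above 2^53, so at present-day epoch-nanosecond magnitudes adjacent doubles are about 256 ns apart. That loss is usually acceptable for selectivity estimation, but the round trip is not exact for arbitrary values.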
h2. Acceptance criteria
* Selectivity of filter/range predicates on nanosecond columns is estimated
from column statistics rather than falling back to a default; the resulting
plans reflect those stats.
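To make "estimated, not defaulted" concrete, the sketch below shows a simplified range-selectivity calculation over min/max once they are in the numeric domain. It assumes uniformly distributed values and no histogram; {{RangeSelectivity.lessThan}} is a hypothetical helper for illustration, not the logic in {{FilterEstimation}}.

```scala
object RangeSelectivity {
  // Estimated fraction of rows satisfying `col < literal`, given the
  // column's min/max as doubles and assuming a uniform distribution.
  def lessThan(min: Double, max: Double, literal: Double): Double =
    if (literal <= min) 0.0       // nothing lies below the minimum
    else if (literal > max) 1.0   // every value lies below the literal
    else (literal - min) / (max - min) // here min < literal <= max, so max > min
}
```

For example, with min = 0 ns, max = 4e9 ns and a literal of 1e9 ns, this yields a selectivity of 0.25 instead of a fixed default.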
h2. Testing
{{FilterEstimationSuite}}.
h2. Dependencies
Do AFTER SPARK-57812 (catalog column-statistics serialization): this work is
only meaningful once nanosecond min/max values are persisted.