Max Gekk created SPARK-57839:
--------------------------------

             Summary: Support CBO filter/selectivity estimation for 
nanosecond-precision timestamps
                 Key: SPARK-57839
                 URL: https://issues.apache.org/jira/browse/SPARK-57839
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
precision).

h2. Problem
{{FilterEstimation}} (execution/.../FilterEstimation.scala ~L294, ~L416) 
matches only the microsecond {{TimestampType}} (not even 
{{TimestampNTZType}}), and {{EstimationUtils.toDouble}} likewise handles only 
{{TimestampType}}. As a result, nanosecond columns get no range-based 
selectivity estimate, and the resulting mis-estimates are silent.

h2. Goal
Include nanosecond timestamp columns in range/selectivity estimation, 
converting {{TimestampNanosVal}} min/max to the numeric domain used by the 
estimator.
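
For illustration only, the kind of range-based estimate this goal refers to can be sketched as below. This is not Spark's code: the function name {{less_than_selectivity}}, the uniform-distribution assumption, and the use of plain epoch nanoseconds as the numeric domain are all assumptions made for the example.

```python
def less_than_selectivity(col_min: int, col_max: int, literal: int) -> float:
    """Estimate the fraction of rows with col < literal, assuming values are
    spread uniformly over [col_min, col_max] (an assumption of this sketch)."""
    # Hypothetical numeric domain: epoch nanoseconds mapped to doubles.
    lo, hi, x = float(col_min), float(col_max), float(literal)
    if x <= lo:
        return 0.0  # literal at or below the minimum: no row qualifies
    if x > hi:
        return 1.0  # literal above the maximum: every row qualifies
    # Reaching this line implies lo < x <= hi, so hi - lo is non-zero.
    return (x - lo) / (hi - lo)


# Column statistics span 4 seconds; the predicate keeps the first second.
base = 1_700_000_000_000_000_000  # illustrative epoch-nanosecond value
selectivity = less_than_selectivity(base, base + 4_000_000_000, base + 1_000_000_000)
print(selectivity)
```

Without usable min/max for the column, no such ratio can be computed and the estimator has nothing but a default to fall back on.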

h2. Scope
Broaden the type matches to include {{AnyTimestampNanoType}} (ideally via 
{{DatetimeType}}); add nanosecond value<->double conversion in 
{{EstimationUtils}}.
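
A minimal sketch of the value<->double conversion follows, assuming the nanosecond value can be reduced to a single integer count of nanoseconds since the epoch. The actual layout of {{TimestampNanosVal}} is not described here, and {{nanos_to_double}} / {{double_to_nanos}} are hypothetical names. One point to keep in mind: a double has a 53-bit significand, so present-day epoch-nanosecond values (around 1.7e18) do not round-trip exactly. That is probably acceptable for selectivity estimation, but not for exact comparisons.

```python
def nanos_to_double(epoch_nanos: int) -> float:
    """Map an integer nanosecond count to the estimator's numeric domain."""
    return float(epoch_nanos)  # rounds to the nearest representable double


def double_to_nanos(value: float) -> int:
    """Map an estimator value back to an integer nanosecond count."""
    return int(round(value))


# Small values fit in 53 bits and round-trip exactly.
small = 123_456_789
small_back = double_to_nanos(nanos_to_double(small))

# Present-day epoch nanoseconds exceed 2**53, so the round trip is lossy.
ts = 1_700_000_000_123_456_789  # illustrative value, not from the issue
ts_back = double_to_nanos(nanos_to_double(ts))
error_ns = abs(ts_back - ts)
print(small_back == small, error_ns)
```

At this magnitude adjacent doubles are 256 ns apart, so the round-trip error is bounded by 128 ns.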

h2. Acceptance criteria
* Selectivity of filter/range predicates on nanosecond columns is estimated 
from column statistics rather than defaulted; query plans reflect those 
statistics.

h2. Testing
{{FilterEstimationSuite}}.

h2. Dependencies
Do this after SPARK-57812 (catalog column-statistics serialization): the 
estimation is only meaningful once nanosecond min/max values are persisted.




