[ https://issues.apache.org/jira/browse/SPARK-57562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-57562:
-----------------------------
    Shepherd: Max Gekk

> Add benchmarks for the TIME data type
> -------------------------------------
>
>                 Key: SPARK-57562
>                 URL: https://issues.apache.org/jira/browse/SPARK-57562
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> h2. What
> Add performance benchmarks for the TIME data type to match the existing DATE/TIMESTAMP benchmark coverage, and regenerate the result files.
> h2. Why
> The SPIP (SPARK-51162, Q6) flagged performance regressions as the main risk and committed to developing TIME benchmarks in parallel. Today there is no TIME-specific benchmark; the only TIME case in any benchmark is a single vector-updater case in ParquetVectorUpdaterBenchmark. All other datetime benchmarks exclude TIME.
> h2. Scope
> * DateTimeBenchmark: add TIME cases (current_time, make_time, to_time, hour/minute/second, time_trunc, time_diff, TIME +/- interval, TIME - TIME, collect).
> * ExtractBenchmark: cover EXTRACT / date_part HOUR/MINUTE/SECOND on TIME.
> * MakeDateTimeBenchmark: add make_time.
> * Datasource read benchmarks: add TIME columns to JsonBenchmark, CSVBenchmark, and AvroReadBenchmark (and Parquet where applicable).
> * Optionally, add TIME cases to HashBenchmark.
> * Regenerate the corresponding *-results.txt files and review the diffs.
> h2. Acceptance criteria
> * Each touched benchmark runs with TIME cases, and the committed result files are regenerated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
