[ https://issues.apache.org/jira/browse/IMPALA-13543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riza Suminto resolved IMPALA-13543.
-----------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Make tpcds_partitioned eligible for single_node_perf_run.py
> -----------------------------------------------------------
>
> Key: IMPALA-13543
> URL: https://issues.apache.org/jira/browse/IMPALA-13543
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> The tpcds_partitioned dataset is a fully partitioned version of the tpcds
> dataset (the latter partitions only the store_sales table). Unlike tpcds, it
> does not have a default text-format database of its own. Instead, it relies
> on a pre-existing text-format tpcds database, whose data is loaded into the
> equivalent tpcds_partitioned database with INSERT OVERWRITE. It does not have
> its own query set either; it is symlinked to share
> testdata/workloads/tpcds/queries. Its schema also differs slightly from the
> tpcds dataset: the column "c_last_review_date" in tpcds is
> "c_last_review_date_sk" in tpcds_partitioned (TPC-DS v2.11.0, see the related
> commit in
> [impala-tpcds-kit|https://github.com/cloudera/impala-tpcds-kit/commit/086d7113c8b4172247f83f60f4e274fe3326df11]).
> For these reasons, tpcds_partitioned is ineligible for perf-AB-test
> (single_node_perf_run.py), which requires a dataset that can be loaded
> through bin/load-data.py in a single execution. single_node_perf_run.py and
> related scripts must be modified to accept the tpcds_partitioned dataset for
> benchmarking.
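
The two-step load described in the issue (text-format tpcds first, then INSERT OVERWRITE into tpcds_partitioned) could be sketched roughly as below. This is only an illustration, not the actual change made for IMPALA-13543: the helper build_insert_overwrite, its parameters, and the shortened customer column list are hypothetical. Only the database names and the renamed column come from the issue text. A partitioned target table would also need a PARTITION clause, and any type conversion for the renamed column is not shown.

```python
# Hypothetical sketch: build an INSERT OVERWRITE statement that copies one
# table from the text-format tpcds database into tpcds_partitioned.
# The helper name, its parameters, and the column subset used in the demo
# are illustrative assumptions, not part of the Impala scripts.

# Columns whose names differ between the two schemas (source -> target).
RENAMED_COLUMNS = {"c_last_review_date": "c_last_review_date_sk"}


def build_insert_overwrite(table, source_columns,
                           source_db="tpcds", target_db="tpcds_partitioned"):
    """Return SQL that overwrites target_db.table from source_db.table."""
    select_list = []
    for col in source_columns:
        target = RENAMED_COLUMNS.get(col, col)
        # Alias renamed columns so the SELECT output uses the target's names.
        select_list.append(col if target == col else f"{col} AS {target}")
    return (
        f"INSERT OVERWRITE TABLE {target_db}.{table}\n"
        f"SELECT {', '.join(select_list)}\n"
        f"FROM {source_db}.{table}"
    )


if __name__ == "__main__":
    # Shortened, illustrative column list for the customer table.
    print(build_insert_overwrite(
        "customer", ["c_customer_sk", "c_customer_id", "c_last_review_date"]))
```

Because the statement reads from tpcds, the text-format database has to be loaded before this step can run, which is the dependency that prevents a single bin/load-data.py execution from covering tpcds_partitioned.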