[ 
https://issues.apache.org/jira/browse/IMPALA-13543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907478#comment-17907478
 ] 

ASF subversion and git services commented on IMPALA-13543:
----------------------------------------------------------

Commit 2f5aef64a5a8cf5fff6248355a2cb27e551652d5 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2f5aef64a ]

IMPALA-13617: Rename c_last_review_date to c_last_review_date_sk

TPC-DS v2.11.0, section 2.4.7, renames column customer.c_last_review_date
to customer.c_last_review_date_sk to align with other surrogate key
columns. impala-tpcds-kit has been modified to reflect this column name
change in
https://github.com/cloudera/impala-tpcds-kit/commit/086d7113c8b4172247f83f60f4e274fe3326df11
However, the tpcds dataset schema in Impala test data remains unchanged.

This patch applies the same rename to the tpcds dataset schema to align
more closely with TPC-DS v2.11.0. It makes no data type adjustment
because that would require larger changes.

customer_multiblock_page_index.parquet, added by IMPALA-10310, is
regenerated to follow the new schema of table customer. The SQL used to
create the file now orders rows more specifically, over both the
c_current_cdemo_sk and c_customer_sk columns. The associated test
assertion in parquet-page-index.test is also updated.
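
As an illustration of why ordering over both columns matters, here is a
minimal Python sketch. It is not the SQL used to create the file, which
is not shown in this message. The rows are made up, and the key order
(c_current_cdemo_sk first, then c_customer_sk) is an assumption. The
point is only that a second sort key breaks ties in the first one, so
the output no longer depends on the input order.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    # Hypothetical two-column slice of the customer table.
    c_customer_sk: int
    c_current_cdemo_sk: int


def order_by_cdemo_only(rows: List[Row]) -> List[Row]:
    # Ties on c_current_cdemo_sk keep their input order (stable sort),
    # so the result depends on how the input happened to be arranged.
    return sorted(rows, key=lambda r: r.c_current_cdemo_sk)


def order_by_both(rows: List[Row]) -> List[Row]:
    # Assumed key order: c_current_cdemo_sk, then c_customer_sk.
    # The second key breaks ties, so the result is fully determined.
    return sorted(rows, key=lambda r: (r.c_current_cdemo_sk, r.c_customer_sk))


# Made-up sample data with duplicate c_current_cdemo_sk values.
rows = [Row(3, 10), Row(1, 20), Row(2, 10), Row(4, 20)]
shuffled = list(reversed(rows))

same_with_one_key = order_by_cdemo_only(rows) == order_by_cdemo_only(shuffled)
same_with_two_keys = order_by_both(rows) == order_by_both(shuffled)
```

With this sample data, same_with_one_key is False and same_with_two_keys
is True.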

The workaround in test_file_parser.py added by IMPALA-13543 is removed
now that this change is applied.

Testing:
- Pass core tests.

Change-Id: Ie446b3c534cb8f6f54265cd9b2f705cad91dd4ac
Reviewed-on: http://gerrit.cloudera.org:8080/22223
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Make tpcds_partitioned eligible for single_node_perf_run.py
> -----------------------------------------------------------
>
>                 Key: IMPALA-13543
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13543
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.5.0
>
>
> The tpcds_partitioned dataset is a fully partitioned version of the tpcds 
> dataset (the latter partitions only the store_sales table). It does not have 
> a default text format database as the tpcds dataset does. Instead, it relies 
> on a pre-existing text format tpcds database, whose data is loaded into the 
> equivalent tpcds_partitioned database with INSERT OVERWRITE. It does not have 
> its own query set; instead, it is symlinked to share 
> testdata/workloads/tpcds/queries. It also has a slightly different schema 
> from the tpcds dataset: column "c_last_review_date" in the tpcds dataset is 
> "c_last_review_date_sk" in tpcds_partitioned (TPC-DS v2.11.0, see related 
> commit in 
> [impala-tpcds-kit|https://github.com/cloudera/impala-tpcds-kit/commit/086d7113c8b4172247f83f60f4e274fe3326df11]).
> These differences make tpcds_partitioned ineligible for perf-AB-test 
> (single_node_perf_run.py), which requires a dataset that is loadable through 
> bin/load-data.py in a single execution. single_node_perf_run.py and related 
> scripts must be modified slightly to accept the tpcds_partitioned dataset for 
> benchmarking.
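
The schema difference described above can be sketched as a positional
comparison of column lists. This is a hedged illustration only: the
helper renamed_columns is hypothetical and not part of Impala or its
test scripts, and the column lists are a shortened subset of the
customer table chosen for the example.

```python
from typing import List, Tuple


def renamed_columns(old_cols: List[str], new_cols: List[str]) -> List[Tuple[str, str]]:
    """Return (old, new) name pairs at positions where the names differ."""
    if len(old_cols) != len(new_cols):
        raise ValueError("schemas have different column counts")
    return [(old, new) for old, new in zip(old_cols, new_cols) if old != new]


# Shortened, illustrative column lists (not the full customer schema).
tpcds_customer = [
    "c_customer_sk",
    "c_current_cdemo_sk",
    "c_last_review_date",
]
tpcds_partitioned_customer = [
    "c_customer_sk",
    "c_current_cdemo_sk",
    "c_last_review_date_sk",
]

diff = renamed_columns(tpcds_customer, tpcds_partitioned_customer)
```

Here diff holds the single pair ("c_last_review_date",
"c_last_review_date_sk"), the mismatch that the rename in IMPALA-13617
removes.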


