[ https://issues.apache.org/jira/browse/IMPALA-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894381#comment-17894381 ]
ASF subversion and git services commented on IMPALA-13340:
----------------------------------------------------------
Commit c54d8ad4692768ff270947bc9a2f0f6fe629c701 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c54d8ad46 ]
IMPALA-13340: Fix missing partitions in COPY TESTCASE of LocalCatalog mode
There are 3 places that we should fix:
*Exporting testcase files*
In LocalCatalog mode coordinators, to export the testcase file,
LocalFsTable objects are converted to THdfsTable objects. In this step,
coordinators should set the 'has_full_partitions' field to true.
Otherwise, the partition map will be ignored when catalogd imports the
THdfsTable object.
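
A minimal Python sketch of why the flag matters. It is a model under
assumptions, not Impala code: ThriftHdfsTable and import_partitions are
hypothetical stand-ins for THdfsTable and the catalogd import path; only
the field name 'has_full_partitions' comes from this commit message.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ThriftHdfsTable:
    """Hypothetical stand-in for a serialized table (not the real THdfsTable)."""
    partitions: Dict[int, str] = field(default_factory=dict)  # id -> partition name
    has_full_partitions: bool = False


def import_partitions(table: ThriftHdfsTable) -> Dict[int, str]:
    """Assumed importer behavior: honor the partition map only if it is marked full."""
    if not table.has_full_partitions:
        return {}
    return dict(table.partitions)


exported = ThriftHdfsTable(
    partitions={7: "year=2009/month=1", 8: "year=2009/month=2"})

without_flag = import_partitions(exported)  # partition map is ignored
exported.has_full_partitions = True
with_flag = import_partitions(exported)     # partition map is kept
```

In this model the exported partitions are present in both cases; the
importer drops them only because the flag is unset, which matches the
symptom in the bug report (a table with zero partitions after import).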
*Importing testcase files*
When importing the THdfsTable object, catalogd should regenerate the
partition ids, since those in the testcase file are usually generated by
a different catalogd instance (on another cluster). Reusing them might
conflict with the existing partition ids. Note that partition ids are
incremental ids generated by catalogd itself (starting from 0 at
bootstrap).
Table.loadFromThrift() is primarily used on the coordinator side to load
metadata from catalogd. We always set 'storedInImpaladCatalogCache_' to
true in this method. However, this method is also used in catalogd to
import metadata from a testcase file. This patch adds a parameter to
this method to distinguish where it's used, so that we can decide
whether to reuse the partition ids or generate new ones.
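
The id conflict can be illustrated with a small Python sketch. The names
here (load_partitions, reuse_ids, id_gen) are hypothetical and chosen for
illustration; the real change is a new parameter on
Table.loadFromThrift() in the Java frontend, whose name is not given in
this message.

```python
import itertools
from typing import Dict, Iterator


def load_partitions(imported: Dict[int, str], id_gen: Iterator[int],
                    reuse_ids: bool) -> Dict[int, str]:
    """reuse_ids=True models a coordinator loading from its own catalogd.
    reuse_ids=False models catalogd importing a testcase file."""
    if reuse_ids:
        return dict(imported)
    # Allocate fresh ids from the local generator, keeping partition names.
    return {next(id_gen): name for name in imported.values()}


# The local catalogd has already handed out ids 0..4 since bootstrap.
existing_ids = set(range(5))
id_gen = itertools.count(5)

# Ids in the testcase file were generated by another cluster's catalogd.
imported = {3: "year=2009/month=1", 4: "year=2009/month=2"}

reused = load_partitions(imported, id_gen, reuse_ids=True)
regenerated = load_partitions(imported, id_gen, reuse_ids=False)

conflicts_when_reused = existing_ids & set(reused)
conflicts_when_regenerated = existing_ids & set(regenerated)
```

Reusing the imported ids collides with ids 3 and 4 that the local
instance already issued, while regenerating them yields ids that cannot
collide because they come from the local counter.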
*Fetching metadata from catalogd*
When catalogd processes the getPartialCatalog requests on the imported
partitions, HdfsPartition#setPartitionMetadata() is used to update the
TPartialPartitionInfo instance. Previously this method used
'cachedMsPartitionDescriptor_ == null' to detect prototype partitions or
the only partition of unpartitioned tables. This is incorrect now since
HdfsPartitions imported from testcase files won't have
'cachedMsPartitionDescriptor_' set. The values of this field come from
msPartition objects from HMS and are not passed to the coordinators,
so they do not exist in the exported testcase files. This patch fixes
the condition to check for prototype partitions and unpartitioned tables
correctly.
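
This message does not spell out the corrected condition. The Python
sketch below only shows why the old null check misclassifies imported
partitions. The replacement check is an assumption made for
illustration (a sentinel id of -1 for prototype partitions, and zero
clustering columns for unpartitioned tables); the Partition class and
both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PROTOTYPE_PARTITION_ID = -1  # assumed sentinel for prototype partitions


@dataclass
class Partition:
    """Hypothetical stand-in for HdfsPartition."""
    part_id: int
    num_clustering_cols: int  # partition columns of the owning table
    cached_ms_descriptor: Optional[object] = None


def old_check(p: Partition) -> bool:
    """Old heuristic: a missing descriptor means prototype or unpartitioned."""
    return p.cached_ms_descriptor is None


def assumed_new_check(p: Partition) -> bool:
    """Assumed replacement: test the two cases directly, not the descriptor."""
    return p.part_id == PROTOTYPE_PARTITION_ID or p.num_clustering_cols == 0


# A real partition imported from a testcase file: no descriptor was exported.
imported = Partition(part_id=5, num_clustering_cols=2)
prototype = Partition(part_id=PROTOTYPE_PARTITION_ID, num_clustering_cols=2)
unpartitioned = Partition(part_id=0, num_clustering_cols=0)

misclassified_by_old = old_check(imported)
classified_by_new = assumed_new_check(imported)
```

Under these assumptions the old check treats the imported partition as a
prototype or unpartitioned-table partition, while a check that does not
depend on the descriptor handles all three cases.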
Tests
- Added e2e tests to dump the partitioned table and verify the
partition and file metadata after importing it back. The tests also
verify that we can get the same query plan after importing the
testcase file.
- Moved the util method __get_partition_id_set() from
test_reuse_partitions.py to ImpalaTestSuite so we can reuse it in the
new test. Also renamed it to get_partition_id_set().
Change-Id: Icc2e8b71564ad37973ddfca92801afea8e26ff73
Reviewed-on: http://gerrit.cloudera.org:8080/21864
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> COPY TESTCASE in LocalCatalog mode doesn't dump the partition and file
> metadata
> -------------------------------------------------------------------------------
>
> Key: IMPALA-13340
> URL: https://issues.apache.org/jira/browse/IMPALA-13340
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> IMPALA-11901 fixes the failures of using COPY TESTCASE statements in
> LocalCatalog mode. However, only the table metadata is dumped, e.g. table
> schema, column stats. The partition and file metadata are missing.
> To reproduce the issue locally, start the Impala cluster in LocalCatalog mode.
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args=--catalog_topic_mode=minimal \
>   --impalad_args=--use_local_catalog
> {code}
> Dump the metadata of a query on a partitioned table:
> {noformat}
> copy testcase to '/tmp' select * from functional_parquet.alltypes;
> +--------------------------------------------------------------------------------------+
> | Test case data output path                                                           |
> +--------------------------------------------------------------------------------------+
> | hdfs://localhost:20500/tmp/impala-testcase-data-c8316356-6448-4458-acad-c2f72f43c3e1 |
> +--------------------------------------------------------------------------------------+
> {noformat}
> Check the metadata from the source cluster
> {noformat}
> show partitions functional_parquet.alltypes
> +-------+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------------------------+-----------+
> | year  | month | #Rows | #Files | Size     | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                  | EC Policy |
> +-------+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------------------------+-----------+
> | 2009  | 1     | -1    | 1      | 8.60KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=1  | NONE      |
> | 2009  | 2     | -1    | 1      | 8.09KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=2  | NONE      |
> | 2009  | 3     | -1    | 1      | 8.60KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=3  | NONE      |
> | 2009  | 4     | -1    | 1      | 8.20KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=4  | NONE      |
> | 2009  | 5     | -1    | 1      | 8.55KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=5  | NONE      |
> | 2009  | 6     | -1    | 1      | 8.23KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=6  | NONE      |
> | 2009  | 7     | -1    | 1      | 8.25KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=7  | NONE      |
> | 2009  | 8     | -1    | 1      | 8.60KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=8  | NONE      |
> | 2009  | 9     | -1    | 1      | 8.41KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=9  | NONE      |
> | 2009  | 10    | -1    | 1      | 8.60KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=10 | NONE      |
> | 2009  | 11    | -1    | 1      | 8.44KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=11 | NONE      |
> | 2009  | 12    | -1    | 1      | 8.59KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=12 | NONE      |
> | 2010  | 1     | -1    | 1      | 8.38KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=1  | NONE      |
> | 2010  | 2     | -1    | 1      | 7.80KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=2  | NONE      |
> | 2010  | 3     | -1    | 1      | 8.29KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=3  | NONE      |
> | 2010  | 4     | -1    | 1      | 8.20KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=4  | NONE      |
> | 2010  | 5     | -1    | 1      | 8.62KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=5  | NONE      |
> | 2010  | 6     | -1    | 1      | 8.26KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=6  | NONE      |
> | 2010  | 7     | -1    | 1      | 8.60KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=7  | NONE      |
> | 2010  | 8     | -1    | 1      | 8.64KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=8  | NONE      |
> | 2010  | 9     | -1    | 1      | 8.20KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=9  | NONE      |
> | 2010  | 10    | -1    | 1      | 8.59KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=10 | NONE      |
> | 2010  | 11    | -1    | 1      | 8.20KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=11 | NONE      |
> | 2010  | 12    | -1    | 1      | 8.60KB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=12 | NONE      |
> | Total |       | -1    | 24     | 201.52KB | 0B           |                   |         |                   |                                                                           |           |
> +-------+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------------------------+-----------+{noformat}
> Import the metadata
> {noformat}
> copy testcase from
> 'hdfs://localhost:20500/tmp/impala-testcase-data-c8316356-6448-4458-acad-c2f72f43c3e1';
> +----------------------------------------------------------------------------------------------------------------+
> | summary                                                                                                        |
> +----------------------------------------------------------------------------------------------------------------+
> | Testcase generated using Impala version 4.5.0-SNAPSHOT. 1 db(s), 1 table(s) and 0 view(s) imported for query:  |
> |                                                                                                                |
> | SELECT * FROM functional_parquet.alltypes                                                                      |
> +----------------------------------------------------------------------------------------------------------------+{noformat}
> Check the partition metadata; the partitions are missing:
> {noformat}
> set PLANNER_TESTCASE_MODE=true;
> show partitions functional_parquet.alltypes;
> Query: show partitions functional_parquet.alltypes
> +-------+-------+-------+--------+------+--------------+-------------------+--------+-------------------+----------+-----------+
> | year  | month | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location | EC Policy |
> +-------+-------+-------+--------+------+--------------+-------------------+--------+-------------------+----------+-----------+
> | Total |       | -1    | 0      | 0B   | 0B           |                   |        |                   |          |           |
> +-------+-------+-------+--------+------+--------------+-------------------+--------+-------------------+----------+-----------+{noformat}