[ 
https://issues.apache.org/jira/browse/IMPALA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080046#comment-18080046
 ] 

ASF subversion and git services commented on IMPALA-14801:
----------------------------------------------------------

Commit fc78ecb81d496581380d10ae749909d53c073dca in impala's branch 
refs/heads/master from Mihaly Szjatinya
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fc78ecb81 ]

IMPALA-14801: Catalog topic update creation can't skip Iceberg tables

This patch extends the skipping mechanism from IMPALA-6671 to Iceberg
tables. It uses the Iceberg table's own tableLock and the
lastVersionSeenByTopicUpdate mechanism of the underlying HDFS table.
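
As a rough illustration of the skipping idea (not the actual patch, which
lives in the Java frontend), the sketch below tries to take a table lock
with a timeout and skips the table for this topic update if the lock is
busy. Blocking once a table has been skipped too many times is my reading
of what catalog_max_lock_skipped_topic_updates controls. The names Table,
try_add_to_delta, timeout_s and max_skipped are hypothetical and exist
only for this example.

```python
import threading


class Table:
    """Hypothetical stand-in for a catalog table with its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        # How many consecutive topic updates have skipped this table.
        self.skipped_updates = 0


def try_add_to_delta(table, delta, timeout_s=0.05, max_skipped=2):
    """Add the table to the delta, or skip it if its lock is busy.

    Returns True if the table was added, False if it was skipped.
    Assumption: after max_skipped consecutive skips the caller blocks
    until the lock is free, so the table cannot be starved forever.
    """
    if table.skipped_updates >= max_skipped:
        # Skipped too often: wait for the lock without a timeout.
        table.lock.acquire()
        acquired = True
    else:
        acquired = table.lock.acquire(timeout=timeout_s)

    if not acquired:
        # Lock is held elsewhere (e.g. a long table load): skip for now.
        table.skipped_updates += 1
        return False

    try:
        delta.append(table.name)
        table.skipped_updates = 0
        return True
    finally:
        table.lock.release()
```

With a lock that is held elsewhere, the first call returns False and
increments the skip counter; once the lock is released, the next call adds
the table and resets the counter.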

Testing:
Added TestTopicUpdateFrequency::test_topic_updates_*_iceberg in 4
variants

Fixed the original HDFS tests:
1. test_topic_updates_unblock
  - By default it ran in Local Catalog mode, which has no effect on
    fast DML/DQL queries. Added variants for both Legacy and Local
    Catalog mode to demonstrate the case.
  - In the blocking scenario there was a missing assert (!), making the
    test always pass.
  - Fast queries are much faster than stated, although this does not
    seem to matter for the nature of the test.
  - Reduced query delay times

2. test_topic_updates_advance
  - The test claimed to test catalog_max_lock_skipped_topic_updates but
    experimentally I could see no counter blockings triggered at all
    under any configuration.
  - Added more parallel threads and reduced
    catalog_max_lock_skipped_topic_updates to 2 to reliably trigger the
    blockings. Ran 20 times locally to verify.
  - Query execution time expectations were largely incorrect; perhaps
    the timings changed over time.
  - The assert expects the max query time to be no more than a
    predictable value, which conceptually makes sense for SYNC_DDL, but
    I wasn't able to reliably reproduce the case yet.
  - Hence, for now, the test at least checks in the catalog logs that
    the blockings occur.
  - Reduced query delay times

3. Removed test_topic_lock_timeout_disabled, it is now covered by one of
   the test_topic_updates_unblock variants

Change-Id: I51e46820aaa096f3eb69f4dcf580e49a69d6603d
Reviewed-on: http://gerrit.cloudera.org:8080/24243
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Csaba Ringhofer <[email protected]>


> Catalog topic update creation can't skip Iceberg tables
> -------------------------------------------------------
>
>                 Key: IMPALA-14801
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14801
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Assignee: Mihaly Szjatinya
>            Priority: Critical
>
> https://github.com/apache/impala/blob/31769a7fb50ae1d6b6d69d366a776df441e00e3a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L1819
> lockTableAndAddToCatalogDelta() calls lockHdfsTblWithTimeout() only for 
> HdfsTable, which means that for IcebergTable tbl.takeReadLock() is called, 
> which blocks the catalog topic update collection until it can take the 
> table lock. This can be a serious issue, as loading Iceberg tables can take a 
> significant amount of time.
> The skipping logic was implemented in IMPALA-6671 for Hive tables: 
> https://github.com/apache/impala/commit/2fccd82590d747d834b8be6f3b05bb446d9bac12
> Test for existing skipping logic: 
> https://github.com/apache/impala/blob/31769a7fb50ae1d6b6d69d366a776df441e00e3a/tests/custom_cluster/test_topic_update_frequency.py#L30
> It uses several debug actions to inject delays to reproduce the blocking 
> issue. Testing Iceberg tables should be possible in a similar way.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
