[ 
https://issues.apache.org/jira/browse/IMPALA-12670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803554#comment-17803554
 ] 

ASF subversion and git services commented on IMPALA-12670:
----------------------------------------------------------

Commit 5f434e84678e4401c26172a7121c8e3c70ab664f in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5f434e846 ]

IMPALA-12670: getIfPresent should throw the cause of error

CatalogdMetaProvider maintains a map (a Guava cache) as its local
catalog cache. It uses a piggyback mechanism to load metadata from
catalogd: when concurrent threads want to load the same content
(identified by the same key), only one of them actually sends the
request and loads the result into the cache. The other threads wait
and get the result when the work is done.

The piggyback mechanism is implemented by putting a Future object as the
value when the key doesn't exist in the cache. The Future object handles
the loading. Other threads that want the same value just invoke
Future.get() to wait. See more in the comments in loadWithCaching().
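
The piggyback idea can be sketched roughly as follows. This is only an
illustration, not the code in loadWithCaching(): PiggybackCache and its
load() method are hypothetical names, a ConcurrentHashMap stands in for
the Guava cache, and completed Futures are simply left in the map.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

// Hypothetical stand-in for the local catalog cache; not the Impala class.
class PiggybackCache<K, V> {
  // Assumption: a plain ConcurrentHashMap instead of a Guava cache, to keep
  // the sketch free of external dependencies.
  private final ConcurrentHashMap<K, Future<V>> cache = new ConcurrentHashMap<>();

  V load(K key, Callable<V> loader) throws Exception {
    CompletableFuture<V> placeholder = new CompletableFuture<>();
    Future<V> existing = cache.putIfAbsent(key, placeholder);
    if (existing != null) {
      // Another thread is loading (or has loaded) this key: piggyback on it.
      return existing.get();
    }
    // This thread won the race, so it performs the load for everyone.
    try {
      V value = loader.call();
      placeholder.complete(value);
      return value;
    } catch (Throwable t) {
      // Wake up the waiters with the failure and drop the entry so that a
      // later call can retry the load.
      placeholder.completeExceptionally(t);
      cache.remove(key, placeholder);
      throw t;
    }
  }

  public static void main(String[] args) throws Exception {
    PiggybackCache<String, String> cache = new PiggybackCache<>();
    System.out.println(cache.load("db.tbl", () -> "loaded once"));
    // The second call finds the completed Future and does not run its loader.
    System.out.println(cache.load("db.tbl", () -> "never used"));
  }
}
```

Note that a waiter in this sketch sees the loader's failure wrapped in an
ExecutionException, which is the behavior the following paragraphs discuss.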

If there are any errors thrown in the loading process, Future.get() will
encapsulate the error into an ExecutionException and throw it instead.
The cause could be an InconsistentMetadataFetchException which indicates
FE should retry the planning. It's handled in Frontend#getTExecRequest().

In loadWithCaching(), we try to throw the cause of the exception thrown
from Future.get(), so the InconsistentMetadataFetchException can be
handled as expected. However, the error handling in getIfPresent() is
inconsistent with this: it propagates the ExecutionException itself
rather than its cause. As a result, retriable failures are not retried.
Note that this is an existing bug, but it became easier to hit after
IMPALA-11501 because getIfPresent() is now used in
LocalDb#getTableIfCached(), which is used in many places.
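
The difference between the two behaviors can be illustrated with a small
sketch. It is not the actual patch: CauseUnwrapSketch, getWrapping(),
getUnwrapping() and RetriableFetchException are hypothetical names, and
RetriableFetchException merely stands in for a retriable error such as
InconsistentMetadataFetchException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class CauseUnwrapSketch {
  // Hypothetical stand-in for a retriable error type.
  static class RetriableFetchException extends RuntimeException {
    RetriableFetchException(String msg) { super(msg); }
  }

  // Behavior described as the bug: the ExecutionException is wrapped as-is,
  // so a caller that catches the retriable type never sees it.
  static <V> V getWrapping(Future<V> f) {
    try {
      return f.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    }
  }

  // Behavior described as the fix: rethrow the underlying cause when possible.
  static <V> V getUnwrapping(Future<V> f) {
    try {
      return f.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof RuntimeException) throw (RuntimeException) cause;
      if (cause instanceof Error) throw (Error) cause;
      // Checked causes cannot be rethrown directly from this signature.
      throw new RuntimeException(cause);
    }
  }

  public static void main(String[] args) {
    CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new RetriableFetchException("stale metadata"));
    try {
      getUnwrapping(failed);
    } catch (RetriableFetchException e) {
      // A retry loop in the caller would re-run planning at this point.
      System.out.println("caller can retry: " + e.getMessage());
    }
  }
}
```

With getWrapping(), the same caller would instead see a RuntimeException
whose cause is an ExecutionException, matching the shape of the stack trace
quoted in the issue description.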

This patch fixes getIfPresent() to use the Future object (including
error handling) in the same way as loadWithCaching(). It also adds more
logging on both the catalogd and impalad sides when the lookup status
is abnormal.

In order to test the loading error more easily, this patch adds a hidden
flag, inject_failure_ratio_in_catalog_fetch, to randomly inject
retriable errors.
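
Ratio-based failure injection of this kind could look roughly like the
sketch below. How the real flag is parsed and where it is checked are not
described here, so everything in the sketch is an assumption:
FetchFailureInjector, maybeFail() and InjectedRetriableException are
hypothetical names, and the ratio is assumed to be a probability in [0, 1].

```java
import java.util.Random;

// Hypothetical helper; not the code added by the patch.
class FetchFailureInjector {
  // Hypothetical stand-in for a retriable error type.
  static class InjectedRetriableException extends RuntimeException {
    InjectedRetriableException(String msg) { super(msg); }
  }

  private final double failureRatio;
  private final Random random;

  // Assumption: failureRatio is the probability of failing a single fetch.
  FetchFailureInjector(double failureRatio, Random random) {
    if (failureRatio < 0.0 || failureRatio > 1.0) {
      throw new IllegalArgumentException("ratio must be in [0, 1]: " + failureRatio);
    }
    this.failureRatio = failureRatio;
    this.random = random;
  }

  // Throws a retriable error with probability failureRatio, else does nothing.
  // nextDouble() returns a value in [0.0, 1.0), so 0 never fails and 1 always does.
  void maybeFail(String what) {
    if (random.nextDouble() < failureRatio) {
      throw new InjectedRetriableException("Injected failure for " + what);
    }
  }

  public static void main(String[] args) {
    FetchFailureInjector injector = new FetchFailureInjector(0.5, new Random());
    int failures = 0;
    for (int i = 0; i < 1000; i++) {
      try {
        injector.maybeFail("fetch " + i);
      } catch (InjectedRetriableException e) {
        failures++;
      }
    }
    System.out.println("injected failures: " + failures + " of 1000");
  }
}
```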

Tests
 - Ran test_local_catalog_ddls_with_invalidate_metadata 700 times.
 - Added an e2e test that fails easily without this fix.

Change-Id: I74268ba2bb700988107780e13ffbdbb4c767d09d
Reviewed-on: http://gerrit.cloudera.org:8080/20853
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> CatalogdMetaProvider.getIfPresent() not throwing the underlying 
> InconsistentMetadataFetchException
> --------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12670
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12670
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0.0, Impala 3.3.0, Impala 3.4.0, Impala 3.4.1, 
> Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Blocker
>         Attachments: catalogd.INFO.gz, failed-query-profile.txt, 
> impalad.INFO.gz
>
>
> TestConcurrentDdls.test_local_catalog_ddls_with_invalidate_metadata could 
> fail due to InconsistentMetadataFetchException:
> {code:python}
> tests/custom_cluster/test_concurrent_ddls.py:72: in 
> test_local_catalog_ddls_with_invalidate_metadata
>     self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> tests/custom_cluster/test_concurrent_ddls.py:148: in 
> _run_ddls_with_invalidation
>     worker[i].get(timeout=100)
> toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
>  in get
>     raise self._value
> E   AssertionError: ImpalaBeeswaxException:
> E      INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E      MESSAGE: RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching 
> TABLE failed. Could not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))
> E     CAUSED BY: ExecutionException: 
> org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching 
> TABLE failed. Could not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))
> E     CAUSED BY: InconsistentMetadataFetchException: Fetching TABLE failed. 
> Could not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))
> E     
> E   assert <bound method type.is_acceptable_error of <class 
> 'test_concurrent_ddls.TestConcurrentDdls'>>("ImpalaBeeswaxException:\n INNER 
> EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: 
> RuntimeException: ja...ould not find TCatalogObject(type:TABLE, 
> catalog_version:0, table:TTable(db_name:functional, 
> tbl_name:alltypestiny))\n", False)
> E    +  where <bound method type.is_acceptable_error of <class 
> 'test_concurrent_ddls.TestConcurrentDdls'>> = 
> TestConcurrentDdls.is_acceptable_error
> E    +  and   "ImpalaBeeswaxException:\n INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: RuntimeException: ja...ould 
> not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))\n" = 
> str(ImpalaBeeswaxException()){code}
> The exception in impalad.INFO:
> {noformat}
> I0103 11:19:32.221755  8786 jni-util.cc:302] 
> 3b476356afa44e74:6f637cfa00000000] java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: 
> org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching 
> TABLE failed. Could not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.getIfPresent(CatalogdMetaProvider.java:901)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.getTableIfPresent(CatalogdMetaProvider.java:750)
>         at 
> org.apache.impala.catalog.local.LocalDb.getTableIfCached(LocalDb.java:128)
>         at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:143)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:314)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:169)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:145)
>         at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2348)
>         at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2110)
>         at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1883)
>         at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:169)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching 
> TABLE failed. Could not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))
>         at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.getIfPresent(CatalogdMetaProvider.java:898)
>         ... 10 more
> Caused by: 
> org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching 
> TABLE failed. Could not find TCatalogObject(type:TABLE, catalog_version:0, 
> table:TTable(db_name:functional, tbl_name:alltypestiny))
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:465)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:199)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:776)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:768)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:562)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadTable(CatalogdMetaProvider.java:764)
>         at 
> org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:152)
>         at 
> org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:104)
>         at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:147)
>         ... 7 more{noformat}
> Other tests in TestConcurrentDdls that use local catalog mode could also hit 
> the same issue, e.g. test_mixed_catalog_ddls_with_invalidate_metadata.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)