[jira] [Commented] (IMPALA-11768) EventProcessor should ignore failures on non-existing dbs/tables

Quanlong Huang (Jira) Wed, 30 Nov 2022 20:32:09 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641658#comment-17641658
 ]


Quanlong Huang commented on IMPALA-11768:
-----------------------------------------

I realized that we do have codes handling the table not found issue in 
HdfsTable#reloadPartitionsFromNames():
[https://github.com/apache/impala/blob/b17446818c548487fa5d71c678088ed35eed5bc8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2760-L2769]
{code:java}
  public int reloadPartitionsFromNames(IMetaStoreClient client,
      List<String> partNames, String reason, FileMetadataLoadOpts 
fileMetadataLoadOpts)
      throws CatalogException {
      ...
    } catch (NoSuchObjectException e) {
      // HMS throws a NoSuchObjectException if the table does not exist
      // in HMS anymore. In case the partitions don't exist in HMS it does not 
include
      // them in the result of getPartitionsByNames.
      throw new TableLoadingException(
          "Error when reloading partitions for table " + getFullName(), e);
    } catch (TException e2) {
      throw new CatalogException(
          "Unexpected error while retrieving partitions for table " + 
getFullName(), e2);
    }
  }
{code}
However, HMS actually throws an InvalidObjectException internally:
{noformat}
2022-11-29T14:10:07,373 ERROR [TThreadPoolServer WorkerProcess-28] 
thrift.ProcessFunction: Internal error processing get_partitions_by_names_req
org.apache.hadoop.hive.metastore.api.InvalidObjectException: 
hive.test_alter_table_hdfs_caching_95073aa2.i1670e_alter table not found
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTable(HiveMetaStore.java:9709)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTable(HiveMetaStore.java:9702)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_names(HiveMetaStore.java:7827)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_names_req(HiveMetaStore.java:7791)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_292]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
        at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at com.sun.proxy.$Proxy34.get_partitions_by_names_req(Unknown Source) 
~[?:?]
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:21348)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:21327)
 ~[hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[libthrift-0.16.0.jar:0.16.0]
        at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
 [hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
 [hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_292]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_292]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
 [hadoop-common-3.1.1.7.2.16.0-233.jar:?]
        at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
 [hive-standalone-metastore-3.1.3000.7.2.16.0-233.jar:3.1.3000.7.2.16.0-233]
        at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
 [libthrift-0.16.0.jar:0.16.0]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_292]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_292]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{noformat}
Somehow the InvalidObjectException is not passed through to the HMS client. 
Maybe due to it's not declared in the method:
https://github.com/apache/hive/blob/43b1525f96bece0bf46dfab8e73d0e1b51093555/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L2722-L2726
{code:thrift}
  // get partitions give a list of partition names
  list<Partition> get_partitions_by_names(1:string db_name 2:string tbl_name 
3:list<string> names)
                       throws(1:MetaException o1, 2:NoSuchObjectException o2)
  GetPartitionsByNamesResult 
get_partitions_by_names_req(1:GetPartitionsByNamesRequest req)
                        throws(1:MetaException o1, 2:NoSuchObjectException o2)
{code}

> EventProcessor should ignore failures on non-existing dbs/tables
> ----------------------------------------------------------------
>
>                 Key: IMPALA-11768
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11768
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: 
> catalogd.84223c9b5251.invalid-user.log.INFO.20221129-132600.1.gz, 
> hive-metastore.log.gz, hive-server2.log.gz, 
> impalad.b99660bbd523.invalid-user.log.INFO.20221129-132601.1.bz2
>
>
> We recently saw test failures in 
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/6697/]
> {noformat}
> metadata.test_event_processing.TestEventProcessing.test_event_based_replication
> metadata.test_event_processing.TestEventProcessing.test_empty_partition_events
> metadata.test_event_processing.TestEventProcessing.test_transactional_insert_events
> metadata.test_event_processing.TestEventProcessing.test_insert_events
> metadata.test_event_processing.TestEventProcessing.test_empty_partition_events_transactional
> {noformat}
> The errors are similar:
> {code:python}
> metadata/test_event_processing.py:179: in test_event_based_replication
>     self.__run_event_based_replication_tests()
> metadata/test_event_processing.py:194: in __run_event_based_replication_tests
>     EventProcessorUtils.wait_for_event_processing(self)
> util/event_processor_utils.py:65: in wait_for_event_processing
>     within {1} seconds".format(current_event_id, timeout))
> E   Exception: Event processor did not sync till last known event id 21676    
>      within 10 seconds
> {code}
> Stderr:
> {noformat}
> SET 
> client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_event_based_replication;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2022-11-29 14:10:14,332 INFO     MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2022-11-29 14:10:14,346 INFO     MainThread: Closing active operation
> -- 2022-11-29 14:10:19,364 INFO     MainThread: Waiting until events 
> processor syncs to event id:21676
> {noformat}
> Looking into the catalogd logs, the event-processor is already in the 
> NEEDS_INVALIDATE state so it's paused:
> {noformat}
> W1129 14:10:09.401335    92 MetastoreEventsProcessor.java:844] Event 
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id 
> is 21538
> {noformat}
> The last event it processed is an INSERT event. The metadata reloading failed:
> {noformat}
> I1129 14:10:07.365427    92 MetastoreEventsProcessor.java:805] Received 36 
> events. Start event id : 21538
> I1129 14:10:07.367424    92 MetastoreEvents.java:631] EventId: 21552 
> EventType: DROP_DATABASE Creating event 21552 of type DROP_DATABASE on 
> database test_views_ddl_b1c33615
> I1129 14:10:07.371448    92 MetastoreEvents.java:631] EventId: 21573 
> EventType: DROP_DATABASE Creating event 21573 of type DROP_DATABASE on 
> database test_compute_stats_9c95c5d8
> I1129 14:10:07.371696    92 MetastoreEvents.java:281] Total number of events 
> received: 36 Total number of events filtered out: 0
> I1129 14:10:07.371767    92 MetastoreEvents.java:285] Incremented skipped 
> metric to 2631
> I1129 14:10:07.371951    92 HdfsTable.java:2744] Reloading partition 
> metadata: test_alter_table_hdfs_caching_95073aa2.i1670e_alter x=1 (INSERT 
> event)
> W1129 14:10:07.381459    92 RetryingMetaStoreClient.java:269] MetaStoreClient 
> lost connection. Attempting to reconnect (1 of 1) after 1s. 
> getPartitionsByNames
> Java exception follows:
> org.apache.thrift.TApplicationException: Internal error processing 
> get_partitions_by_names_req
> ...
> {noformat}
> It then fails in the retry, and needs manual invalidate metadata:
> {noformat}
> I1129 14:10:08.381718    92 RetryingMetaStoreClient.java:185] 
> RetryingMetaStoreClient trying reconnect as ubuntu (auth:SIMPLE)
> ...
> I1129 14:10:08.395550    92 HiveMetaStoreClient.java:821] Connected to 
> metastore.
> I1129 14:10:08.400802    92 MetastoreEventsProcessor.java:1026] Time elapsed 
> in processing event batch: 1s035ms
> E1129 14:10:08.401016    92 MetastoreEventsProcessor.java:859] Event 
> processing needs a invalidate command to resolve the state
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException:
>  EventId: 21539 EventType: INSERT Refresh partition on table {} partition {} 
> failed. Event processing cannot continue. Issue an invalidate metadata 
> command to reset the event processor state.
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:1298)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.process(MetastoreEvents.java:1277)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:514)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1013)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:851)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.impala.catalog.CatalogException: Unexpected error while 
> retrieving partitions for table 
> test_alter_table_hdfs_caching_95073aa2.i1670e_alter
>         at 
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2768)
>         at 
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist(CatalogOpExecutor.java:4501)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions(MetastoreEvents.java:879)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:1295)
>         ... 11 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing 
> get_partitions_by_names_req
>         at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_names_req(ThriftHiveMetastore.java:4162)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_names_req(ThriftHiveMetastore.java:4149)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNamesInternal(HiveMetaStoreClient.java:2500)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2491)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2468)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2445)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(HiveMetaStoreClient.java:2409)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>         at com.sun.proxy.$Proxy11.getPartitionsByNames(Unknown Source)
>         at 
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2749)
>         ... 14 more
> {noformat}
> In HMS logs, the internal error is that the table is dropped:
> {noformat}
> 2022-11-29T14:10:08,398 ERROR [TThreadPoolServer WorkerProcess-28] 
> metastore.RetryingHMSHandler: 
> InvalidObjectException(message:hive.test_alter_table_hdfs_caching_95073aa2.i1670e_alter
>  table not found)
> ...
> 2022-11-29T14:10:08,398 ERROR [TThreadPoolServer WorkerProcess-28] 
> thrift.ProcessFunction: Internal error processing get_partitions_by_names_req
> org.apache.hadoop.hive.metastore.api.InvalidObjectException: 
> hive.test_alter_table_hdfs_caching_95073aa2.i1670e_alter table not found
> ...
> {noformat}
> It's dropped by a concurrent DROP DATABASE CASCADE command. In the catalogd 
> logs:
> {noformat}
> I1129 14:10:07.005285   104 JniCatalog.java:257] execDdl request: 
> DROP_DATABASE test_alter_table_hdfs_caching_95073aa2 issued by ubuntu
> ...
> I1129 14:10:07.387348   104 CatalogServiceCatalog.java:3290] Should wait for 
> next update for DATABASE:test_alter_table_hdfs_caching_95073aa2: older 
> version 4077 is sent. Expects a version >= 4727.
> {noformat}
> When failed to process the event, the event-processor should doulbe check 
> whether the target (db/table) still exists in HMS in case of such concurrent 
> drop operations. If the target no longer exists, just skip the failure and 
> invalidate the corresponding catalog object.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-11768) EventProcessor should ignore failures on non-existing dbs/tables

Reply via email to