[ 
https://issues.apache.org/jira/browse/IMPALA-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796275#comment-17796275
 ] 

ASF subversion and git services commented on IMPALA-10987:
----------------------------------------------------------

Commit 3112a0c0d17e9d3d2d79bae6e5d0dc6b4cf15eb9 in impala's branch 
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3112a0c0d ]

IMPALA-10987: Changing impala.disableHmsSync in
Hive should not break event processing

Currently we require a global invalidate to reset the events processor
if the events sync is re-enabled on a table from HMS. This patch
eliminates the need to reset the catalog cache when events sync is
re-enabled.

Implementation details: when events sync is re-enabled on a table via HMS
1) If the table exists in Impala,
  a) We can just invalidate the table, if the current event id is greater
than the create event id of the table, so that it is reloaded the first
time a query accesses it.
  b) Otherwise we can just ignore the event.
2) If the table doesn't exist in Impala, create an Incomplete table, if
there is no entry in the event delete log for this table.
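
The decision logic above can be sketched roughly as follows. This is an
illustrative model only, not the code from the patch: CatalogSketch,
Action and on_sync_reenabled are hypothetical names, and ignoring the
event when the table is missing but has a delete log entry is an
assumption, since the commit message does not spell that case out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    INVALIDATE_TABLE = "invalidate_table"
    IGNORE_EVENT = "ignore_event"
    ADD_INCOMPLETE_TABLE = "add_incomplete_table"


@dataclass
class CatalogSketch:
    """Hypothetical stand-in for the catalog state consulted by the events processor."""
    # table name -> id of the event that created the table in the catalog
    create_event_ids: dict = field(default_factory=dict)
    # tables that have an entry in the event delete log
    delete_log: set = field(default_factory=set)


def on_sync_reenabled(catalog, table, event_id):
    """Pick an action for an ALTER_TABLE event that re-enables events sync."""
    if table in catalog.create_event_ids:
        # Case 1: the table exists in the catalog.
        if event_id > catalog.create_event_ids[table]:
            # 1a: the event is newer than the table, so reload it lazily.
            return Action.INVALIDATE_TABLE
        # 1b: the event predates the table, nothing to do.
        return Action.IGNORE_EVENT
    # Case 2: the table is not in the catalog.
    if table not in catalog.delete_log:
        return Action.ADD_INCOMPLETE_TABLE
    # Assumption: a delete log entry means the table was dropped later on.
    return Action.IGNORE_EVENT
```

In every branch the processor keeps running, so no global invalidate is
needed.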

Note: If the eventSync is disabled on a table, for all subsequent table
events, ideally we should mark the table as stale if the table object
is loaded, so that it is reloaded the next time a query accesses it. But,
since this approach has a performance impact, the events will be ignored.

Testing:
1) Manually verified a few scenarios.
2) Added test cases for the above scenarios.

Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Reviewed-on: http://gerrit.cloudera.org:8080/20648
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Changing impala.disableHmsSync in Hive can break event processing
> -----------------------------------------------------------------
>
>                 Key: IMPALA-10987
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10987
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Csaba Ringhofer
>            Assignee: Sai Hemanth Gantasala
>            Priority: Critical
>
> To reproduce, start Impala with event polling:
> {code}
> bin/start-impala-cluster.py --catalogd_args="--hms_event_polling_interval_s=2 
> --catalog_topic_mode=minimal" --impalad_args="--use_local_catalog=1"
> {code}
> From Hive:
> {code}
> CREATE DATABASE temp;
> CREATE EXTERNAL TABLE temp.t (i int) PARTITIONED BY (p int) 
> TBLPROPERTIES('impala.disableHmsSync'='true');
> ALTER TABLE temp.t SET TBLPROPERTIES ('impala.disableHmsSync'='false');
> {code}
> From this point on, event sync will be broken in Impala. It can be fixed only
> with a global INVALIDATE METADATA (or by restarting catalogd).
> The catalogd log will include an exception like this:
> {code}
> E1026 10:30:16.151208 22514 MetastoreEventsProcessor.java:653] Event 
> processing needs a invalidate command to resolve the state
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException:
>  EventId: 15956 EventType: ALTER_TABLE Detected that event sync was turned
> on for the table temp.t and the table does not exist. Event processing 
> cannot be continued further. Issue a invalidate metadata command to reset
>  the event processing state
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$AlterTableEvent.process(MetastoreEvents.java:992)
>         at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:747)
>         at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:645)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> and future events will lead to a log message like this:
> {code}
> W1026 10:30:18.151962 22514 MetastoreEventsProcessor.java:638] Event 
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id 
> is 15955
> {code}
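
The failure mode in the two logs above (one event raises, and every later
event is skipped until a global invalidate) can be modelled with a small
sketch. This is a toy model inferred from the log messages, not the
MetastoreEventsProcessor code: ProcessorSketch, NeedsInvalidateError, the
"ACTIVE" status name and the reset method are hypothetical.

```python
class NeedsInvalidateError(Exception):
    """Stand-in for MetastoreNotificationNeedsInvalidateException."""


class ProcessorSketch:
    """Toy model of an events processor that stalls after one bad event."""

    def __init__(self, last_synced_event_id):
        self.status = "ACTIVE"  # assumed name for the healthy state
        self.last_synced_event_id = last_synced_event_id

    def process(self, event_id, handler):
        """Apply handler to the event; return True only if it was processed."""
        if self.status == "NEEDS_INVALIDATE":
            # Matches the warning: processing is skipped, last synced id unchanged.
            return False
        try:
            handler(event_id)
        except NeedsInvalidateError:
            self.status = "NEEDS_INVALIDATE"
            return False
        self.last_synced_event_id = event_id
        return True

    def invalidate_metadata(self, current_event_id):
        """Model a global INVALIDATE METADATA: resume from the current event id."""
        self.status = "ACTIVE"
        self.last_synced_event_id = current_event_id


def failing_alter_table(event_id):
    # Models the ALTER_TABLE handler hitting the missing table before the fix.
    raise NeedsInvalidateError(f"EventId: {event_id} EventType: ALTER_TABLE")


def noop(event_id):
    # Models any later event that would otherwise be processed normally.
    pass
```

With a processor started at event id 15955, processing 15956 with
failing_alter_table moves the status to NEEDS_INVALIDATE, and a later
call with noop is skipped while the last synced event id stays at 15955.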



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
