[ 
https://issues.apache.org/jira/browse/IMPALA-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013264#comment-17013264
 ] 

ASF subversion and git services commented on IMPALA-9101:
---------------------------------------------------------

Commit d46f4a68fa86ca59b9066abbfe70a5d3c8d090a3 in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d46f4a6 ]

IMPALA-9101: Add support for detecting self-events on partition events

This commit reworks some of the self-event detection logic, specifically
for partition events. Before the patch, the self-event identifiers for
a partition were stored at the table level when generating partition
events. This was problematic because, unlike ADD_PARTITION and
DROP_PARTITION events, ALTER_PARTITION events are generated one per
partition. As a result, if multiple ALTER_PARTITION events are
generated, only the first is identified as a self-event and the rest
are processed. This patch fixes this by adding the self-event
identifiers to each partition so that, when the events are later
received, each ALTER_PARTITION event uses the state stored in
HdfsPartition to evaluate whether it is a self-event. The patch also
makes sure that the event processor takes a table lock during
self-event evaluation to avoid races with other parts of the code that
try to modify the table at the same time.
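
As a rough illustration of the per-partition bookkeeping described above,
the following is a minimal sketch and not Impala's actual code. The names
SelfEventSketch, InFlightEvents, Table, recordAlterPartition and
isSelfEvent are all hypothetical, and a synchronized method stands in for
the table lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not Impala's actual classes.
class SelfEventSketch {
  /** Catalog versions of operations whose HMS events have not arrived yet. */
  static class InFlightEvents {
    private final List<Long> versions = new ArrayList<>();

    void add(long catalogVersion) {
      versions.add(catalogVersion);
    }

    /** Removes the version if present; true means the event is a self-event. */
    boolean remove(long catalogVersion) {
      return versions.remove(Long.valueOf(catalogVersion));
    }
  }

  /** A table that keeps one in-flight tracker per partition. */
  static class Table {
    private final Map<String, InFlightEvents> partitions = new HashMap<>();

    // Called when this catalog alters a partition. The synchronized keyword
    // is an assumption standing in for the table lock.
    synchronized void recordAlterPartition(String partName, long version) {
      partitions.computeIfAbsent(partName, k -> new InFlightEvents())
          .add(version);
    }

    // Called when an ALTER_PARTITION event for partName is received.
    synchronized boolean isSelfEvent(String partName, long version) {
      InFlightEvents events = partitions.get(partName);
      return events != null && events.remove(version);
    }
  }
}
```

Because every partition has its own tracker, two ALTER_PARTITION events
that carry the same catalog version but target different partitions can
both be matched, which a single table-level entry could not do.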

Additionally, this patch changes the event processor to refresh a
loaded table (incomplete tables are not refreshed) when an ALTER_TABLE
event is received instead of invalidating the table. This makes the
events processor consistent with how it handles all the other event
types. In the future, we should add a flag to choose the preferred
behavior (invalidate or refresh).

Also, this patch fixes the following related issues:
1. Self-event logic was not triggered for alter database events when a
user modifies the comment on the database.
2. In case of queries like "alter table add if not exists partition...",
the partition is not added if it already exists. The self-event
identifiers should not be added in such cases since no event is expected
from such queries.
3. Changed the wait_for_event_processing test util method in
EventProcessorUtils to use a more deterministic way to determine whether
the catalog updates have propagated to impalad instead of waiting for a
random duration of time. This also speeds up the event processing tests
significantly.
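
The second fix in the list above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the code from the
patch: AddPartitionSketch, addPartitionIfNotExists and inFlightVersions
are hypothetical names, and a plain set stands in for the metastore.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names do not correspond to Impala's code.
class AddPartitionSketch {
  final Set<String> partitions = new HashSet<>();
  final List<Long> inFlightVersions = new ArrayList<>();
  private long catalogVersion = 0;

  /** Returns true if the partition was created, so an event is expected. */
  boolean addPartitionIfNotExists(String partName) {
    if (!partitions.add(partName)) {
      // The partition already exists: nothing is sent to the metastore and
      // no event will arrive, so no self-event identifier is registered.
      return false;
    }
    // Only an operation that really changed something registers a version.
    inFlightVersions.add(++catalogVersion);
    return true;
  }
}
```

Registering an identifier only after a real change keeps the in-flight
list free of entries that no incoming event would ever consume.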

Testing Done:
1. Added an e2e self-events test which runs multiple Impala
queries and makes sure that the resulting events are skipped.
2. Ran MetastoreEventsProcessorTest
3. Ran core tests on CDH and CDP builds.

Change-Id: I9b4148f6be0f9f946c8ad8f314d64b095731744c
Reviewed-on: http://gerrit.cloudera.org:8080/14799
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Unnecessary REFRESH due to wrong self-event detection
> -----------------------------------------------------
>
>                 Key: IMPALA-9101
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9101
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Vihang Karajgaonkar
>            Priority: Critical
>
> In {{CatalogOpExecutor.alterTable()}}, we call 
> {{addVersionsForInflightEvents()}} whether or not the AlterTable operation 
> changes anything. If nothing changes, no HMS RPCs are sent. The event 
> processor ends up waiting on a nonexistent self-event. After that, all 
> self-events are treated as outside events, and an unnecessary 
> REFRESH/INVALIDATE is performed on this table.
> Code:
> {code:java}
>   private void alterTable(TAlterTableParams params, TDdlExecResponse response)
>       throws ImpalaException {
>     ....
>     tryLock(tbl);
>     // Get a new catalog version to assign to the table being altered.
>     long newCatalogVersion = catalog_.incrementAndGetCatalogVersion();
>     addCatalogServiceIdentifiers(tbl, catalog_.getCatalogServiceId(),
>         newCatalogVersion);
>     ....
>       // now that HMS alter operation has succeeded, add this version to list
>       // of inflight events in catalog table if event processing is enabled
>       // <---- We should check before calling this.
>       catalog_.addVersionsForInflightEvents(tbl, newCatalogVersion);
>   }
> {code}
> Reproduce:
> {code:sql}
> create table testtbl (col int) partitioned by (p1 int, p2 int);
> alter table testtbl add partition (p1=2,p2=6);
> alter table testtbl add if not exists partition (p1=2,p2=6);
> -- After this point, can't detect self-events on this table
> alter table testtbl add partition (p1=2,p2=7);
> {code}
> Catalogd logs:
> {code:bash}
> I1029 07:41:15.310956  8546 HdfsTable.java:630] Loaded file and block 
> metadata for default.testtbl partitions: p1=2/p2=6
> I1029 07:41:15.892410  8321 MetastoreEventsProcessor.java:480] Received 1 
> events. Start event id : 11463
> I1029 07:41:15.895717  8321 MetastoreEvents.java:396] EventId: 11464 
> EventType: ADD_PARTITION Creating event 11464 of type ADD_PARTITION on table 
> default.testtbl
> I1029 07:41:15.940225  8321 MetastoreEvents.java:241] Total number of events 
> received: 1 Total number of events filtered out: 0
> I1029 07:41:15.940414  8321 MetastoreEvents.java:385] EventId: 11464 
> EventType: ADD_PARTITION Not processing the event as it is a self-event
> #### Correctly recognize self-event ^^^^
> I1029 07:41:16.829824  8329 catalog-server.cc:641] Collected update: 
> 1:TABLE:default.testtbl, version=1385, original size=4438, compressed 
> size=1216
> I1029 07:41:16.831853  8329 catalog-server.cc:641] Collected update: 
> 1:CATALOG_SERVICE_ID, version=1385, original size=60, compressed size=58
> I1029 07:41:18.827137  8339 catalog-server.cc:337] A catalog update with 2 
> entries is assembled. Catalog version: 1385 Last sent catalog version: 1384
> #### No events for adding partition p1=2,p2=6 again. But we still bump the 
> catalog version.
> I1029 07:45:38.900974  8329 catalog-server.cc:641] Collected update: 
> 1:CATALOG_SERVICE_ID, version=1386, original size=60, compressed size=58
> I1029 07:45:40.899353  8339 catalog-server.cc:337] A catalog update with 1 
> entries is assembled. Catalog version: 1386 Last sent catalog version: 1385
> #### Creating partition p1=2,p2=7
> I1029 07:45:48.827221  8546 HdfsTable.java:630] Loaded file and block 
> metadata for default.testtbl partitions: p1=2/p2=7
> I1029 07:45:48.904234  8329 catalog-server.cc:641] Collected update: 
> 1:TABLE:default.testtbl, version=1387, original size=4886, compressed 
> size=1251
> I1029 07:45:48.905262  8329 catalog-server.cc:641] Collected update: 
> 1:CATALOG_SERVICE_ID, version=1387, original size=60, compressed size=58
> I1029 07:45:49.523567  8321 MetastoreEventsProcessor.java:480] Received 1 
> events. Start event id : 11464
> I1029 07:45:49.524150  8321 MetastoreEvents.java:396] EventId: 11465 
> EventType: ADD_PARTITION Creating event 11465 of type ADD_PARTITION on table 
> default.testtbl
> I1029 07:45:49.527262  8321 MetastoreEvents.java:241] Total number of events 
> received: 1 Total number of events filtered out: 0
> I1029 07:45:49.530278  8321 MetastoreEvents.java:385] EventId: 11465 
> EventType: ADD_PARTITION Trying to refresh 1 partitions added to table 
> default.testtbl in the event
> I1029 07:45:49.531026  8321 CatalogServiceCatalog.java:2572] Refreshing 
> partition metadata: default.testtbl p1=2/p2=7 (processing ADD_PARTITION event 
> from HMS)
> #### Unnecessary REFRESH ^^^^
> I1029 07:45:49.604936  8321 HdfsTable.java:630] Loaded file and block 
> metadata for default.testtbl partitions: p1=2/p2=7
> I1029 07:45:49.605069  8321 CatalogServiceCatalog.java:2594] Refreshed 
> partition metadata: default.testtbl p1=2/p2=7
> I1029 07:45:49.605273  8321 MetastoreEvents.java:385] EventId: 11465 
> EventType: ADD_PARTITION Refreshed 1 partitions of table default.testtbl
> I1029 07:45:50.901763  8339 catalog-server.cc:337] A catalog update with 2 
> entries is assembled. Catalog version: 1387 Last sent catalog version: 1386
> I1029 07:45:50.904940  8329 catalog-server.cc:641] Collected update: 
> 1:TABLE:default.testtbl, version=1388, original size=4886, compressed 
> size=1251
> I1029 07:45:50.905792  8329 catalog-server.cc:641] Collected update: 
> 1:CATALOG_SERVICE_ID, version=1388, original size=60, compressed size=58
> I1029 07:45:52.902602  8339 catalog-server.cc:337] A catalog update with 2 
> entries is assembled. Catalog version: 1388 Last sent catalog version: 1387
> {code}


