Henry Robinson has posted comments on this change. Change subject: IMPALA-3499: Batch update catalog cache update. ......................................................................
Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/3067/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1270: LOG(ERROR) << "Error deserializing item: " << status.GetDetail(); don't you need a continue after this? Line 1278: TUpdateCatalogCacheResponse resp; Similarly, consider moving this inside loop on line 1280, and redeclaring this where needed later. Line 1281: len = item.value.size(); > I think this is the same purpose? DeserializeThriftMsg needs this? My point is that there's no relationship between the len used in line 1266 and the len used here (that's clear as you unconditionally overwrite len here). It will be clearer to use a new variable here - and you can still call it len - to make it clear to the reader that there's no functional dependency between the two uses. In general, confining variables to the narrowest possible scope is a good rule to promote clarity. Line 1299: update_size + len > as long as impalad is not influenced by statestore's concatenation? Ah, I understand the flow of the code a bit better now. I think a comment that explains what the various updates are would be helpful. I still think it will be clearer to change the flow a bit to process all the updates, then all the removals, which will make it clearer to the reader when all the updates are applied. Something like: // Take mutex lock for (all updates) { // Build update request, flush it to frontend when gets too large } // Send last update request // Build deletion request for (all deletions) { // Add to deletion request } // Send deletion request unconditionally, even if empty, with the flag set to tell the frontend this is the last one. Line 1354: if (update_status.ok()) { : update_status = exec_env_->frontend()->UpdateCatalogCache(update_req, &resp); : } > so this wont change behavior for not-splitted update. That's subtle, and hard for the reader to understand. Please consider just holding the lock for the duration of this method. Line 1357: if (catalog_update_lock.owns_lock()) catalog_update_lock.unlock(); > That will, but this is faster and more explicit? I can remove this if you l Has it been faster in any of your tests? http://gerrit.cloudera.org:8080/#/c/3067/5/fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java: Line 129: if (catalogObject.getType() == TCatalogObjectType.CATALOG) { : newCatalogVersion = catalogObject.getCatalog_version(); : last_batch_update = true; : } Not clear how this works - will one of the updates *always* contain a CATALOG object? It would be much clearer to add a new flag to TUpdateCatalogCacheRequest to see whether or not this is a partial update. -- To view, visit http://gerrit.cloudera.org:8080/3067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I176db25124a32944f2396ce8aafbed49cac95928 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Huaisi Xu <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Huaisi Xu <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
