Henry Robinson has posted comments on this change.

Change subject: IMPALA-3499: Batch update catalog cache update.
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/3067/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1270:         LOG(ERROR) << "Error deserializing item: " << 
status.GetDetail();
don't you need a continue after this?


Line 1278: TUpdateCatalogCacheResponse resp;
Similarly, consider moving this inside loop on line 1280, and redeclaring this 
where needed later.


Line 1281: len = item.value.size();
> I think this is the same purpose? DeserializeThriftMsg needs this?
My point is that there's no relationship between the len used in line 1266 and 
the len used here (that's clear as you unconditionally overwrite len here).

It will be clearer to use a new variable here - and you can still call it len - 
to make it clear to the reader that there's no functional dependency between 
the two uses. In general, confining variables to the narrowest possible scope 
is a good rule to promote clarity.


Line 1299: update_size + len
> as long as impalad is not influenced by statestore's concatenation?
Ah, I understand the flow of the code a bit better now. I think a comment that 
explains what the various updates are would be helpful.

I still think it will be clearer to change the flow a bit to process all the 
updates, then all the removals, which will make it clearer to the reader when 
all the updates are applied. Something like:

  // Take mutex lock
  for (all updates) {
    // Build update request, flush it to frontend when gets too large
  }
  // Send last update request

  // Build deletion request
  for (all deletions) {
    // Add to deletion request
  }
  // Send deletion request unconditionally, even if empty, with the flag set to 
tell the frontend this is the last one.


Line 1354: if (update_status.ok()) {
         :       update_status = 
exec_env_->frontend()->UpdateCatalogCache(update_req, &resp);
         :     }
> so this wont change behavior for not-splitted update.
That's subtle, and hard for the reader to understand. Please consider just 
holding the lock for the duration of this method.


Line 1357: if (catalog_update_lock.owns_lock()) catalog_update_lock.unlock();
> That will, but this is faster and more explicit? I can remove this if you l
Has it been faster in any of your tests?


http://gerrit.cloudera.org:8080/#/c/3067/5/fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java:

Line 129: if (catalogObject.getType() == TCatalogObjectType.CATALOG) {
        :         newCatalogVersion = catalogObject.getCatalog_version();
        :         last_batch_update = true;
        :       }
Not clear how this works - will one of the updates *always* contain a CATALOG 
object? It would be much clearer to add a new flag to 
TUpdateCatalogCacheRequest to see whether or not this is a partial update.


-- 
To view, visit http://gerrit.cloudera.org:8080/3067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I176db25124a32944f2396ce8aafbed49cac95928
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to