Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3103: More efficient Bloom Filter serialisation.
......................................................................


IMPALA-3103: More efficient Bloom Filter serialisation.

TBloomFilters have a 'directory' structure that is a list of individual
buckets (buckets are about 64k wide). The total size of the directory
can be 1MB or even much more. That leads to a lot of buckets, and very
inefficient deserialisation as each bucket has to be allocated on the
heap.

Instead, this patch changes the TBloomFilter representation to use one
contiguous string (like the real BloomFilter does), so that it can be
allocated with a single operation and deserialized with a single copy.
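
The difference between the two representations can be sketched roughly
as follows. This is not the Impala code: BucketListFilter,
ContiguousFilter and the three helper functions are hypothetical
stand-ins for the Thrift-generated types and the conversion logic, and
the bucket size used is arbitrary. The sketch only illustrates why a
single contiguous string needs one allocation and one copy, while a
list of buckets needs one heap object per bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Thrift-generated types; the real
// TBloomFilter fields are not shown in this change description.
struct BucketListFilter {   // old shape: one heap-allocated object per bucket
  std::vector<std::string> buckets;
};

struct ContiguousFilter {   // new shape: the whole directory in one string
  std::string directory;
};

// Old-style conversion: walk every bucket and append it to the
// in-memory directory.
std::string DirectoryFromBuckets(const BucketListFilter& wire) {
  std::size_t total = 0;
  for (const std::string& b : wire.buckets) total += b.size();
  std::string out;
  out.reserve(total);
  for (const std::string& b : wire.buckets) out.append(b);
  return out;
}

// New-style conversion: a single allocation and a single copy.
std::string DirectoryFromContiguous(const ContiguousFilter& wire) {
  return std::string(wire.directory.data(), wire.directory.size());
}

// Illustration-only helper: split a directory into fixed-size buckets,
// mimicking what the old wire format carried.
BucketListFilter SplitIntoBuckets(const std::string& directory,
                                  std::size_t bucket_bytes) {
  assert(bucket_bytes > 0);
  BucketListFilter wire;
  for (std::size_t off = 0; off < directory.size(); off += bucket_bytes) {
    wire.buckets.push_back(directory.substr(off, bucket_bytes));
  }
  return wire;
}
```

Both conversions yield the same bytes; the contiguous form simply avoids
the per-bucket allocations during deserialisation and the per-bucket
loop during conversion.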

This reduces the kernel time spent deserializing a TBloomFilter by
about 20x, and makes converting a TBloomFilter to a 'real' BloomFilter
about 20x faster.

Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Reviewed-on: http://gerrit.cloudera.org:8080/2359
Reviewed-by: Marcel Kornacker <[email protected]>
Tested-by: Internal Jenkins
---
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
3 files changed, 15 insertions(+), 18 deletions(-)

Approvals:
  Marcel Kornacker: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/2359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
