Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2359

Change subject: IMPALA-3103: More efficient Bloom Filter serialisation.
......................................................................

IMPALA-3103: More efficient Bloom Filter serialisation.

TBloomFilters have a 'directory' structure that is a list of individual
buckets (each bucket is about 64k wide). The total size of the directory
can be 1 MB or much larger, which means a large number of buckets and
very inefficient deserialization, because each bucket must be allocated
separately on the heap.
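To illustrate the old cost model, here is a minimal C++ sketch. It is not the
actual Impala code. ListDirectoryFilter and FlattenDirectory are hypothetical
names, not Impala's generated Thrift types, and the sketch assumes each bucket
is a vector of 32-bit words. Every bucket owns its own heap allocation.
Turning the list into the flat array that a real BloomFilter uses takes one
copy per bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the old TBloomFilter: the directory is a list of
// buckets, and each inner vector is a separate heap allocation.
struct ListDirectoryFilter {
  std::vector<std::vector<uint32_t>> directory;
};

// Copies every bucket into one contiguous array, which is the layout an
// in-memory Bloom filter wants. The loop performs one copy per bucket.
std::vector<uint32_t> FlattenDirectory(const ListDirectoryFilter& f) {
  std::size_t total_words = 0;
  for (const std::vector<uint32_t>& bucket : f.directory) {
    total_words += bucket.size();
  }
  std::vector<uint32_t> flat;
  flat.reserve(total_words);
  for (const std::vector<uint32_t>& bucket : f.directory) {
    flat.insert(flat.end(), bucket.begin(), bucket.end());
  }
  return flat;
}
```

With a 1 MB directory, both the per-bucket allocations and the per-bucket
copies scale with the number of buckets.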

Instead, this patch changes the TBloomFilter representation to use one
contiguous string (as the real BloomFilter does), so that the directory
can be allocated with a single operation and deserialized with a single
copy.
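A matching sketch of the new layout follows. It uses hypothetical names
(StringDirectoryFilter, ToThriftLike, FromThriftLike) rather than the actual
Impala code. The directory travels as a single byte string, so rebuilding the
in-memory directory takes one allocation and one memcpy. The sketch assumes
that sender and receiver share the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the new TBloomFilter: the whole directory is one
// contiguous byte string.
struct StringDirectoryFilter {
  std::string directory;
};

// Serializes a contiguous directory with a single copy (native byte order).
StringDirectoryFilter ToThriftLike(const std::vector<uint32_t>& words) {
  StringDirectoryFilter f;
  f.directory.assign(reinterpret_cast<const char*>(words.data()),
                     words.size() * sizeof(uint32_t));
  return f;
}

// Rebuilds the directory with one allocation and one memcpy.
std::vector<uint32_t> FromThriftLike(const StringDirectoryFilter& f) {
  assert(f.directory.size() % sizeof(uint32_t) == 0);
  std::vector<uint32_t> words(f.directory.size() / sizeof(uint32_t));
  if (!words.empty()) {
    std::memcpy(words.data(), f.directory.data(), f.directory.size());
  }
  return words;
}
```

Compared with the bucket-list layout, the work no longer grows with the number
of buckets, only with the number of bytes copied.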

This reduces the kernel time spent deserializing a TBloomFilter by
about 20x, and speeds up converting a TBloomFilter to a 'real'
BloomFilter by a similar factor.

Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
---
M be/src/util/bloom-filter.cc
M common/thrift/ImpalaInternalService.thrift
2 files changed, 8 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/59/2359/1
-- 

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Henry Robinson
