Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/2359
Change subject: IMPALA-3103: More efficient Bloom Filter serialisation.
......................................................................

IMPALA-3103: More efficient Bloom Filter serialisation.

TBloomFilters have a 'directory' structure that is a list of
individual buckets (buckets are about 64k wide). The total size of the
directory can be 1MB or even much more. That leads to a lot of
buckets, and very inefficient deserialisation, as each bucket has to be
allocated on the heap.

Instead, this patch changes the TBloomFilter representation to use one
contiguous string (like the real BloomFilter does), so that it can be
allocated with a single operation and deserialized with a single copy.

This reduces the amount of kernel time used when deserializing a
TBloomFilter by about 20x, and speeds up converting a TBloomFilter to
a 'real' BloomFilter by about 20x as well.

Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
---
M be/src/util/bloom-filter.cc
M common/thrift/ImpalaInternalService.thrift
2 files changed, 8 insertions(+), 10 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/59/2359/1
--
To view, visit http://gerrit.cloudera.org:8080/2359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Henry Robinson <[email protected]>
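
For readers skimming the commit message, the difference between the two
representations can be sketched roughly as follows. This is only an
illustration, not the code in the patch: BucketedFilter, ContiguousFilter,
LoadBucketed, LoadContiguous and Flatten are hypothetical stand-ins, and the
real Thrift-generated TBloomFilter and the real BloomFilter class may look
quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the old wire format: a list of buckets,
// each of which is its own heap-allocated string after deserialisation.
struct BucketedFilter {
  std::vector<std::string> buckets;
};

// Hypothetical stand-in for the new wire format: the whole directory
// travels as one contiguous string.
struct ContiguousFilter {
  std::string directory;
};

// Old-style conversion: visit every bucket and copy it into place.
std::vector<uint8_t> LoadBucketed(const BucketedFilter& f) {
  std::size_t total = 0;
  for (const std::string& b : f.buckets) total += b.size();
  std::vector<uint8_t> out(total);
  std::size_t offset = 0;
  for (const std::string& b : f.buckets) {
    if (!b.empty()) std::memcpy(out.data() + offset, b.data(), b.size());
    offset += b.size();
  }
  assert(offset == total);
  return out;
}

// New-style conversion: one allocation and one copy.
std::vector<uint8_t> LoadContiguous(const ContiguousFilter& f) {
  std::vector<uint8_t> out(f.directory.size());
  if (!out.empty()) std::memcpy(out.data(), f.directory.data(), out.size());
  return out;
}

// Helper for comparing the two: concatenate the buckets into one string.
ContiguousFilter Flatten(const BucketedFilter& f) {
  ContiguousFilter result;
  for (const std::string& b : f.buckets) result.directory += b;
  return result;
}
```

Both loaders produce the same bytes for the same directory contents; the
contiguous form simply avoids the per-bucket allocation and copy that the
commit message identifies as the bottleneck.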
