Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3103: More efficient Bloom Filter serialisation.
......................................................................
IMPALA-3103: More efficient Bloom Filter serialisation.

TBloomFilters have a 'directory' structure that is a list of individual
buckets (buckets are about 64k wide). The total size of the directory
can be 1MB or much more. That leads to a lot of buckets, and to very
inefficient deserialisation, as each bucket has to be allocated on the
heap.

Instead, this patch changes the TBloomFilter representation to use one
contiguous string (like the real BloomFilter does), so that it can be
allocated with a single operation and deserialised with a single copy.

This reduces the amount of kernel time used when deserialising a
TBloomFilter by about 20x, and speeds up converting a TBloomFilter to a
'real' BloomFilter by about 20x as well.

Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Reviewed-on: http://gerrit.cloudera.org:8080/2359
Reviewed-by: Marcel Kornacker <[email protected]>
Tested-by: Internal Jenkins
---
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
3 files changed, 15 insertions(+), 18 deletions(-)

Approvals:
  Marcel Kornacker: Looks good to me, approved
  Internal Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/2359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5237e776a197cb2696675dbbe0359e751605ed84
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
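
For readers without the diff to hand, the representation change described in the commit message can be roughly sketched as below. This is an illustrative sketch only, not the code from the patch: `BucketedFilter`, `ContiguousFilter`, `InMemoryFilter`, `FromBucketed` and `FromContiguous` are hypothetical names standing in for the old and new TBloomFilter layouts and for the 'real' BloomFilter. The actual Thrift-generated types and Impala classes are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the old wire format: the directory is a list of
// buckets, so deserialising it creates one heap-allocated string per bucket.
struct BucketedFilter {
  std::vector<std::string> buckets;
};

// Hypothetical stand-in for the new wire format: the whole directory is one
// contiguous byte string.
struct ContiguousFilter {
  std::string directory;
};

// Hypothetical stand-in for the in-memory filter, which owns one flat array.
struct InMemoryFilter {
  std::vector<uint8_t> directory;
};

// Old-style conversion: walk every bucket and copy each one into place.
InMemoryFilter FromBucketed(const BucketedFilter& in) {
  size_t total = 0;
  for (const std::string& b : in.buckets) total += b.size();

  InMemoryFilter out;
  out.directory.resize(total);
  size_t offset = 0;
  for (const std::string& b : in.buckets) {
    if (!b.empty()) {
      std::memcpy(out.directory.data() + offset, b.data(), b.size());
    }
    offset += b.size();
  }
  assert(offset == total);  // Every byte of every bucket was copied.
  return out;
}

// New-style conversion: one resize (a single allocation) and one memcpy.
InMemoryFilter FromContiguous(const ContiguousFilter& in) {
  InMemoryFilter out;
  out.directory.resize(in.directory.size());
  if (!in.directory.empty()) {
    std::memcpy(out.directory.data(), in.directory.data(),
                in.directory.size());
  }
  return out;
}
```

Both conversions yield the same flat directory. The difference is in how many allocations and copies are needed to get there, which is where the commit message locates the speedup.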
