[ https://issues.apache.org/jira/browse/ASTERIXDB-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329002#comment-16329002 ]
ASF subversion and git services commented on ASTERIXDB-2243:
------------------------------------------------------------
Commit e54115d7f264ca102fef06578989b6285b35226a in asterixdb's branch
refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=e54115d ]
[ASTERIXDB-2243][STO] Fix BloomFilter size estimation
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
- Fix the bloom filter size estimation by using the
actual number of elements after bulk loading. This prevents
the bloom filter size from growing ever larger under
update-heavy workloads, where most ingested records are
deleted during merges.
Change-Id: Ib4054797d969efcfceb86f91b5321d34480e25c3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2285
Sonar-Qube: Jenkins
Reviewed-by: Michael Blow
Integration-Tests: Jenkins
Tested-by: Jenkins
Contrib: Jenkins
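The idea behind the fix can be illustrated with a minimal Java sketch. It is not AsterixDB code: the class `MergeBloomSizing`, the `Tuple` record, and the `merge` helper are hypothetical, and the sizing uses the textbook formula m = -n ln(p) / (ln 2)^2 rather than whatever calculation AsterixDB's bloom filter builder actually performs. Assuming components are listed newest first and that deleted keys are dropped by the merge (true when the merge includes the oldest component), it compares sizing the merged filter from the sum of the input counts with sizing it from the number of keys the merge actually writes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not AsterixDB code: size a merged bloom filter from surviving keys. */
public class MergeBloomSizing {

    /** One entry of a disk component: a key plus a flag marking a delete (anti-matter). */
    record Tuple(String key, boolean deleted) {}

    /** Textbook optimal bloom filter size: m = -n * ln(p) / (ln 2)^2 bits. */
    static long optimalBits(long n, double falsePositiveRate) {
        if (n <= 0) {
            return 0;
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /**
     * Merges components ordered newest first. The newest version of each key wins,
     * and deleted keys are dropped (assumes the merge includes the oldest component).
     */
    static Map<String, Boolean> merge(List<List<Tuple>> newestFirst) {
        Map<String, Boolean> latest = new TreeMap<>();
        for (List<Tuple> component : newestFirst) {
            for (Tuple t : component) {
                latest.putIfAbsent(t.key(), t.deleted()); // keep only the newest version
            }
        }
        latest.values().removeIf(deleted -> deleted); // drop keys whose newest version is a delete
        return latest;
    }

    public static void main(String[] args) {
        List<Tuple> older = List.of(new Tuple("a", false), new Tuple("b", false), new Tuple("c", false));
        List<Tuple> newer = List.of(new Tuple("a", false), new Tuple("b", true), new Tuple("c", true));
        List<List<Tuple>> newestFirst = List.of(newer, older);

        // Pre-fix rule described in the issue: sum the element counts of the inputs.
        long summed = older.size() + newer.size();
        // Rule described by the fix: count the elements actually produced by the merge.
        long actual = merge(newestFirst).size();

        double p = 0.01; // target false-positive rate
        System.out.println("summed=" + summed + " bits=" + optimalBits(summed, p));
        System.out.println("actual=" + actual + " bits=" + optimalBits(actual, p));
    }
}
```

With the two components in `main`, the summed count is 6 (58 bits at a 1% false-positive rate), while only key `a` survives the merge (10 bits).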
> Bloomfilter size is overly calculated for update-heavy workloads
> ----------------------------------------------------------------
>
> Key: ASTERIXDB-2243
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2243
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: STO - Storage
> Reporter: Chen Luo
> Assignee: Chen Luo
> Priority: Major
>
> The current bloom filter size calculation assumes the data is append-only,
> without updates. Each bloom filter maintains its number of elements. When a
> new bloom filter is bulk-loaded through a merge, its element count is simply
> the sum of the counts of the filters being merged. However, under update-heavy
> workloads, even though the actual size of the merged disk component does not
> increase, the estimated bloom filter size keeps increasing and consumes too
> much space.
>
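The growth described in the issue can be reproduced with a small, hypothetical Java simulation (not AsterixDB code; `SummedEstimateGrowth` and `simulate` are illustrative names). It assumes every flush rewrites the same set of keys and that each flushed component is then merged with the single existing disk component, with the merged element count computed by the pre-fix rule of summing the inputs. At a fixed false-positive rate a bloom filter's size grows linearly with its element count, so the estimated count stands in for the space consumed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical simulation of the pre-fix element-count estimate under repeated updates. */
public class SummedEstimateGrowth {

    /**
     * Runs the given number of flush+merge cycles over the same key range.
     * Returns {estimated element count, actual distinct keys} after the last cycle.
     */
    static long[] simulate(int distinctKeys, int cycles) {
        Set<Integer> merged = new HashSet<>(); // keys present in the single merged disk component
        long estimated = 0;                     // element count recorded with the merged bloom filter

        for (int c = 0; c < cycles; c++) {
            // Flush: a new component holding one (updated) version of every key.
            Set<Integer> flushed = new HashSet<>();
            for (int k = 0; k < distinctKeys; k++) {
                flushed.add(k);
            }
            // Merge: older versions are superseded, so the distinct key set does not grow...
            merged.addAll(flushed);
            // ...but the pre-fix estimate adds the counts of both inputs.
            estimated += flushed.size();
        }
        return new long[] {estimated, merged.size()};
    }

    public static void main(String[] args) {
        for (int cycles = 1; cycles <= 5; cycles++) {
            long[] r = simulate(1000, cycles);
            System.out.printf("after %d cycles: estimated=%d actual=%d%n", cycles, r[0], r[1]);
        }
    }
}
```

For example, `simulate(1000, 5)` reports an estimate of 5000 elements even though the merged component holds only 1000 distinct keys.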
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)