[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dayue Gao updated KYLIN-2387:
-----------------------------
Description:
We found that the old BitmapCounter does not perform well on very large bitmaps.
The inefficiency comes from:
* Poor serialize implementation: instead of serializing the bitmap directly to a
ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which causes
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to
retrieve its serialized size
* Extra deserialization cost: even if only cardinality info is needed to answer
a query, the whole bitmap is deserialized into a MutableRoaringBitmap
A new BitmapCounter is designed to solve these problems:
* It comes in two flavors, mutable and immutable, which are based on
MutableRoaringBitmap and ImmutableRoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialization cost because it just maps
to a copied buffer. So we always deserialize to ImmutableBitmapCounter first,
and convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes,
which is very fast since only the header of the roaring format is examined
* It serializes directly to a ByteBuffer, so no intermediate buffer is allocated
* The wire format is unchanged
was:
We found that the old BitmapCounter does not perform well on very large bitmaps.
The inefficiency comes from:
* Poor serialize implementation: instead of serializing the bitmap directly to a
ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which causes
superfluous memory allocations
* Poor peekLength implementation: the whole bitmap is deserialized in order to
retrieve its serialized size
* Extra deserialization cost: even if only cardinality info is needed to answer
a query, the whole bitmap is deserialized into a MutableRoaringBitmap
A new BitmapCounter is designed to solve these problems:
* It comes in two flavors, mutable and immutable, which are based on
MutableRoaringBitmap and ImmutableRoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialization cost because it just maps
to a copied buffer. So we always deserialize to ImmutableBitmapCounter first,
and convert it to MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes,
which is very fast since only the header of the roaring format is examined
* It serializes directly to a ByteBuffer, so no intermediate buffer is allocated
> A new BitmapCounter with better performance
> -------------------------------------------
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
>
> We found that the old BitmapCounter does not perform well on very large
> bitmaps. The inefficiency comes from:
> * Poor serialize implementation: instead of serializing the bitmap directly to
> a ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which
> causes superfluous memory allocations
> * Poor peekLength implementation: the whole bitmap is deserialized in order
> to retrieve its serialized size
> * Extra deserialization cost: even if only cardinality info is needed to
> answer a query, the whole bitmap is deserialized into a MutableRoaringBitmap
> A new BitmapCounter is designed to solve these problems:
> * It comes in two flavors, mutable and immutable, which are based on
> MutableRoaringBitmap and ImmutableRoaringBitmap respectively
> * ImmutableBitmapCounter has a lower deserialization cost because it just maps
> to a copied buffer. So we always deserialize to ImmutableBitmapCounter first,
> and convert it to MutableBitmapCounter only when necessary
> * peekLength is implemented using
> ImmutableRoaringBitmap.serializedSizeInBytes, which is very fast since only
> the header of the roaring format is examined
> * It serializes directly to a ByteBuffer, so no intermediate buffer is
> allocated
> * The wire format is unchanged
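
The three ideas in the description (read the serialized size from the header
only, deserialize to a cheap immutable form first, and write straight into the
caller's ByteBuffer) can be illustrated with a small stand-alone sketch. This is
not Kylin's BitmapCounter and does not use the roaring format: ToyCounter, its
method names, and the wire layout "[int count][count sorted ints]" are all
hypothetical, chosen only so that the example runs on the JDK alone.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical toy counter (NOT Kylin's class, NOT the roaring format).
 * Assumed wire layout: [int count][count sorted distinct ints], big-endian.
 */
final class ToyCounter {
    private ByteBuffer serialized; // immutable form: a private copy of the bytes
    private int[] values;          // mutable form: decoded only on demand

    private ToyCounter(ByteBuffer serialized, int[] values) {
        this.serialized = serialized;
        this.values = values;
    }

    /** Builds a counter directly in mutable form. */
    static ToyCounter of(int... vals) {
        return new ToyCounter(null, Arrays.stream(vals).sorted().distinct().toArray());
    }

    /** Reads only the 4-byte header; the buffer's position is left untouched. */
    static int peekLength(ByteBuffer in) {
        int count = in.getInt(in.position());
        return Integer.BYTES + count * Integer.BYTES;
    }

    /** Cheap deserialize: copies the bytes but decodes none of the values. */
    static ToyCounter deserialize(ByteBuffer in) {
        byte[] copy = new byte[peekLength(in)];
        in.get(copy);
        return new ToyCounter(ByteBuffer.wrap(copy), null);
    }

    /** In the immutable form the answer comes from the header alone. */
    long cardinality() {
        return values != null ? values.length : serialized.getInt(0);
    }

    boolean isMaterialized() {
        return values != null;
    }

    /** A write is the only operation that forces conversion to the mutable form. */
    void add(int v) {
        materialize();
        int idx = Arrays.binarySearch(values, v);
        if (idx >= 0) {
            return; // already present
        }
        int ins = -idx - 1;
        int[] grown = new int[values.length + 1];
        System.arraycopy(values, 0, grown, 0, ins);
        grown[ins] = v;
        System.arraycopy(values, ins, grown, ins + 1, values.length - ins);
        values = grown;
    }

    private void materialize() {
        if (values != null) {
            return;
        }
        ByteBuffer view = serialized.duplicate();
        values = new int[view.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = view.getInt();
        }
        serialized = null;
    }

    /** Writes straight into the caller's buffer; no ByteArrayOutputStream involved. */
    void serialize(ByteBuffer out) {
        if (values == null) {
            out.put(serialized.duplicate()); // bytes are already in wire format
        } else {
            out.putInt(values.length);
            for (int v : values) {
                out.putInt(v);
            }
        }
    }
}
```

In this sketch, a caller that only needs the count calls deserialize followed
by cardinality and never pays for decoding the values; the decoded array is
built the first time add is called.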