[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao updated KYLIN-2387:
-----------------------------
    Description: 
We found that the old BitmapCounter does not perform well on very large bitmaps. 
The inefficiency comes from
* poor serialize implementation: instead of serializing the bitmap directly to 
ByteBuffer, it uses ByteArrayOutputStream as an intermediate, which causes 
superfluous memory allocations
* poor peekLength implementation: the whole bitmap is deserialized in order to 
retrieve its serialized size
* extra deserialize cost: even if only cardinality info is needed to answer the 
query, the whole bitmap is deserialized into MutableRoaringBitmap
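
To illustrate the first point, here is a minimal sketch that contrasts the two serialization paths. It does not use Kylin or RoaringBitmap code: the class ToyBitmapCodec and its layout (an int count followed by the int values) are hypothetical stand-ins chosen only to show where the extra allocation and copy come from.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical toy codec, NOT the roaring format: [int count][count * int value].
final class ToyBitmapCodec {

    // Indirect path: write into a growable byte array first, then copy
    // the result into the target buffer (extra allocation plus extra copy).
    static void serializeViaStream(int[] values, ByteBuffer out) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeInt(values.length);
            for (int v : values) {
                dos.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.put(bos.toByteArray()); // toByteArray() allocates yet another copy
    }

    // Direct path: write straight into the target buffer.
    static void serializeDirect(int[] values, ByteBuffer out) {
        out.putInt(values.length);
        for (int v : values) {
            out.putInt(v);
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 5, 9};
        ByteBuffer viaStream = ByteBuffer.allocate(64);
        ByteBuffer direct = ByteBuffer.allocate(64);
        serializeViaStream(values, viaStream);
        serializeDirect(values, direct);
        viaStream.flip();
        direct.flip();
        // Both paths produce the same bytes; only the allocations differ.
        System.out.println(viaStream.equals(direct));
    }
}
```

Both methods write big-endian ints, so the output is byte-for-byte identical; the direct path simply skips the temporary arrays.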

A new BitmapCounter is designed to solve these problems
* It comes in two flavors, mutable and immutable, which are based on 
MutableRoaringBitmap and ImmutableRoaringBitmap respectively
* ImmutableBitmapCounter has lower deserialize cost, since it just maps to a 
copied buffer. So we always deserialize to ImmutableBitmapCounter first, and 
convert it to MutableBitmapCounter when necessary
* peekLength is implemented using ImmutableRoaringBitmap, which is very fast 
since only the header of the roaring bitmap is examined
* It serializes directly to ByteBuffer, so no intermediate buffer is allocated
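
The design above can be sketched roughly as follows. This is not the Kylin implementation and does not call RoaringBitmap: the names ToyCounters, ImmutableCounter and MutableCounter, and the layout (an int count followed by the int values), are assumptions made for illustration. The sketch shows a peekLength that reads only the header, an immutable counter backed by a copied buffer, and conversion to a mutable counter only when a write is needed.

```java
import java.nio.ByteBuffer;
import java.util.TreeSet;

// Hypothetical toy layout, NOT the roaring format: [int count][count * int value].
final class ToyCounters {

    // Reads only the header at the current position; the position is not moved.
    static int peekLength(ByteBuffer in) {
        int count = in.getInt(in.position());
        return Integer.BYTES + count * Integer.BYTES;
    }

    // Cheap deserialize: copy the serialized bytes and wrap them, without decoding.
    static ImmutableCounter deserialize(ByteBuffer in) {
        int length = peekLength(in);
        ByteBuffer slice = in.slice();
        slice.limit(length);
        ByteBuffer copy = ByteBuffer.allocate(length);
        copy.put(slice);
        copy.flip();
        in.position(in.position() + length); // consume this counter's bytes
        return new ImmutableCounter(copy.asReadOnlyBuffer());
    }

    static final class ImmutableCounter {
        private final ByteBuffer data;

        ImmutableCounter(ByteBuffer data) {
            this.data = data;
        }

        // Cardinality comes straight from the header; no values are decoded.
        long count() {
            return data.getInt(0);
        }

        // Full decode happens only when a mutable counter is actually needed.
        MutableCounter toMutable() {
            TreeSet<Integer> values = new TreeSet<>();
            int n = data.getInt(0);
            for (int i = 0; i < n; i++) {
                values.add(data.getInt(Integer.BYTES * (i + 1)));
            }
            return new MutableCounter(values);
        }
    }

    static final class MutableCounter {
        private final TreeSet<Integer> values;

        MutableCounter(TreeSet<Integer> values) {
            this.values = values;
        }

        void add(int value) {
            values.add(value);
        }

        long count() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(3).putInt(1).putInt(5).putInt(9);
        buf.flip();
        ImmutableCounter counter = deserialize(buf);
        System.out.println(counter.count()); // answered from the header alone
    }
}
```

A query that needs only the cardinality stays on ImmutableCounter; only an operation that modifies the bitmap pays for toMutable().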

  was:
We found the old BitmapCounter does not perform very well on very large bitmap. 
The inefficiency 
* 


> A new BitmapCounter with better performance
> -------------------------------------------
>
>                 Key: KYLIN-2387
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2387
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Metadata, Query Engine, Storage - HBase
>    Affects Versions: v2.0.0
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>
> We found that the old BitmapCounter does not perform well on very large 
> bitmaps. The inefficiency comes from
> * poor serialize implementation: instead of serializing the bitmap directly to 
> ByteBuffer, it uses ByteArrayOutputStream as an intermediate, which causes 
> superfluous memory allocations
> * poor peekLength implementation: the whole bitmap is deserialized in order 
> to retrieve its serialized size
> * extra deserialize cost: even if only cardinality info is needed to answer 
> the query, the whole bitmap is deserialized into MutableRoaringBitmap
> A new BitmapCounter is designed to solve these problems
> * It comes in two flavors, mutable and immutable, which are based on 
> MutableRoaringBitmap and ImmutableRoaringBitmap respectively
> * ImmutableBitmapCounter has lower deserialize cost, since it just maps to a 
> copied buffer. So we always deserialize to ImmutableBitmapCounter first, and 
> convert it to MutableBitmapCounter when necessary
> * peekLength is implemented using ImmutableRoaringBitmap, which is very fast 
> since only the header of the roaring bitmap is examined
> * It serializes directly to ByteBuffer, so no intermediate buffer is allocated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
