[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dayue Gao updated KYLIN-2387:
-----------------------------
    Description: 
We found that the old BitmapCounter does not perform well on very large bitmaps. The inefficiency comes from:
* poor serialize implementation: instead of serializing the bitmap directly to a ByteBuffer, it uses a ByteArrayOutputStream as an intermediate medium, which causes superfluous memory allocations
* poor peekLength implementation: the whole bitmap is deserialized just to retrieve its serialized size
* extra deserialize cost: even when only the cardinality is needed to answer the query, the whole bitmap is deserialized into a MutableRoaringBitmap

A new BitmapCounter is designed to solve these problems:
* It comes in two flavors, mutable and immutable, backed by MutableRoaringBitmap and ImmutableRoaringBitmap respectively
* ImmutableBitmapCounter has a lower deserialize cost because it just maps over a copied buffer. So we always deserialize to an ImmutableBitmapCounter first, and convert it to a MutableBitmapCounter only when necessary
* peekLength is implemented using ImmutableRoaringBitmap, which is very fast since only the header of the roaring bitmap is examined
* It serializes directly to a ByteBuffer, so no intermediate buffer is allocated

  was:
We found the old BitmapCounter does not perform very well on very large bitmap. The inefficiency *

> A new BitmapCounter with better performance
> -------------------------------------------
>
>                 Key: KYLIN-2387
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2387
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Metadata, Query Engine, Storage - HBase
>    Affects Versions: v2.0.0
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
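The three ideas in the description can be illustrated in plain Java: writing straight into a ByteBuffer, computing peekLength from a header alone, and answering cardinality from an immutable view while decoding into a mutable form only on demand. This is a hypothetical toy, not Kylin's BitmapCounter and not the RoaringBitmap serialization format. The format (an int count followed by the sorted int values) and the names BitmapCounterSketch, MutableCounter and ImmutableCounter are assumptions made for illustration. A real implementation would delegate to MutableRoaringBitmap and ImmutableRoaringBitmap instead.

```java
import java.nio.ByteBuffer;
import java.util.TreeSet;

// Hypothetical toy illustrating the design; NOT Kylin's classes or Roaring's format.
// Assumed format: [int count][count sorted int values], big-endian.
public class BitmapCounterSketch {

    // Hypothetical mutable counter: holds its values in a TreeSet.
    static final class MutableCounter {
        final TreeSet<Integer> values = new TreeSet<>();

        void add(int v) { values.add(v); }

        int cardinality() { return values.size(); }

        int serializedSize() { return 4 + 4 * values.size(); }

        // Writes straight into the caller's ByteBuffer; no ByteArrayOutputStream
        // or other intermediate byte[] is allocated.
        void serialize(ByteBuffer out) {
            out.putInt(values.size());
            for (int v : values) {
                out.putInt(v);
            }
        }
    }

    // Hypothetical immutable counter: a read-only view over a private copy of
    // the serialized bytes. Nothing is decoded until it is actually needed.
    static final class ImmutableCounter {
        private final ByteBuffer data;

        // Copies exactly one serialized counter out of 'in' and advances 'in'.
        static ImmutableCounter deserialize(ByteBuffer in) {
            int len = peekLength(in);
            byte[] copy = new byte[len];
            in.get(copy);
            return new ImmutableCounter(ByteBuffer.wrap(copy).asReadOnlyBuffer());
        }

        private ImmutableCounter(ByteBuffer data) { this.data = data; }

        // Cardinality comes straight from the header; values are never decoded.
        int cardinality() { return data.getInt(0); }

        // Decodes all values only when a mutable copy is really required.
        MutableCounter toMutable() {
            MutableCounter m = new MutableCounter();
            int n = data.getInt(0);
            for (int i = 0; i < n; i++) {
                m.add(data.getInt(4 + 4 * i));
            }
            return m;
        }
    }

    // Reads only the header (absolute get, so the position is unchanged) to
    // compute the serialized size, instead of deserializing the whole structure.
    static int peekLength(ByteBuffer in) {
        int count = in.getInt(in.position());
        return 4 + 4 * count;
    }

    public static void main(String[] args) {
        MutableCounter m = new MutableCounter();
        for (int v : new int[] {7, 3, 42, 3}) {
            m.add(v);
        }
        ByteBuffer buf = ByteBuffer.allocate(m.serializedSize());
        m.serialize(buf);
        buf.flip();

        System.out.println(peekLength(buf));       // 16
        ImmutableCounter im = ImmutableCounter.deserialize(buf);
        System.out.println(im.cardinality());      // 3
        System.out.println(im.toMutable().values); // [3, 7, 42]
    }
}
```

In the real classes the header is part of the Roaring format itself, so peekLength stays cheap no matter how many values the bitmap holds. The toy keeps that property because its header alone determines the serialized length.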