[
https://issues.apache.org/jira/browse/FLINK-24959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452755#comment-17452755
]
ZhuoYu Chen commented on FLINK-24959:
-------------------------------------
hi [~jark] clickhouse,starRocks,doris all support bitmap
Traditional Count distinct calculation, due to the need for multiple data
shuffle (transferring data between different nodes and calculating
de-duplication) during query execution, can lead to a linear decrease in
performance as the amount of data increases.
Advantages of bitmap de-duplication
Space advantage: The use of one bit of a bitmap to indicate the existence
of the corresponding subscript has a great space advantage; for example, for
int32 de-duplication, the storage space required by a normal bitmap is only
1/32 of the traditional de-duplication. With the optimized implementation of
Roaring Bitmap, the storage space is further reduced significantly for sparse
bitmaps.
Time advantage: The computation involved in bitmap de-duplication includes
bit placement for a given subscript and counting the number of bits in a
bitmap, which is an O(1) operation and an O(n) operation, respectively, and the
latter can be efficiently computed using clz, ctz, and other instructions. In
addition, bitmap de-duplication can be accelerated in parallel in the MPP
execution engine, where each computation node computes a local sub-bitmap and
uses the bitor operation to merge these sub-bitmaps into the final bitmap. The
bitor operation is more efficient than sort-based and hash-based
de-duplication, has no conditional dependencies and data dependencies, and can
be executed quantitatively.
> Add a BitMap function to FlinkSQL
> ---------------------------------
>
> Key: FLINK-24959
> URL: https://issues.apache.org/jira/browse/FLINK-24959
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / API
> Affects Versions: 1.15.0
> Reporter: ZhuoYu Chen
> Priority: Minor
>
> bitmap_and :{color:#333333}Computes the intersection of two input bitmaps and
> returns the new bitmap{color}
> {color:#30323e}bitmap_andnot:{color:#333333}Computes the set (difference set)
> that is in A but not in B.{color}{color}
> {color:#30323e}{color:#333333}Bitmap functions related to join operations,
> etc{color}{color}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)