[ https://issues.apache.org/jira/browse/FLINK-24959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452755#comment-17452755 ]
ZhuoYu Chen edited comment on FLINK-24959 at 12/3/21, 5:12 AM: --------------------------------------------------------------- hi [~jark] clickhouse,starRocks,doris all support bitmap Traditional Count distinct calculation, due to the need for multiple data shuffle (transferring data between different nodes and calculating de-duplication) during query execution, can lead to a linear decrease in performance as the amount of data increases. Advantages of bitmap de-duplication Space advantage: The use of one bit of a bitmap to indicate the existence of the corresponding subscript has a great space advantage; for example, for int32 de-duplication, the storage space required by a normal bitmap is only 1/32 of the traditional de-duplication. With the optimized implementation of Roaring Bitmap, the storage space is further reduced significantly for sparse bitmaps. Time advantage: The computation involved in bitmap de-duplication includes bit placement for a given subscript and counting the number of bits in a bitmap, which is an O(1) operation and an O(n) operation, respectively, and the latter can be efficiently computed using clz, ctz, and other instructions. was (Author: monster#12): hi [~jark] clickhouse,starRocks,doris all support bitmap Traditional Count distinct calculation, due to the need for multiple data shuffle (transferring data between different nodes and calculating de-duplication) during query execution, can lead to a linear decrease in performance as the amount of data increases. Advantages of bitmap de-duplication Space advantage: The use of one bit of a bitmap to indicate the existence of the corresponding subscript has a great space advantage; for example, for int32 de-duplication, the storage space required by a normal bitmap is only 1/32 of the traditional de-duplication. With the optimized implementation of Roaring Bitmap, the storage space is further reduced significantly for sparse bitmaps. Time advantage: The computation involved in bitmap de-duplication includes bit placement for a given subscript and counting the number of bits in a bitmap, which is an O(1) operation and an O(n) operation, respectively, and the latter can be efficiently computed using clz, ctz, and other instructions. In addition, bitmap de-duplication can be accelerated in parallel in the MPP execution engine, where each computation node computes a local sub-bitmap and uses the bitor operation to merge these sub-bitmaps into the final bitmap. The bitor operation is more efficient than sort-based and hash-based de-duplication, has no conditional dependencies and data dependencies, and can be executed quantitatively. > Add a BitMap function to FlinkSQL > --------------------------------- > > Key: FLINK-24959 > URL: https://issues.apache.org/jira/browse/FLINK-24959 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API > Affects Versions: 1.15.0 > Reporter: ZhuoYu Chen > Priority: Minor > > bitmap_and :{color:#333333}Computes the intersection of two input bitmaps and > returns the new bitmap{color} > {color:#30323e}bitmap_andnot:{color:#333333}Computes the set (difference set) > that is in A but not in B.{color}{color} > {color:#30323e}{color:#333333}Bitmap functions related to join operations, > etc{color}{color} -- This message was sent by Atlassian Jira (v8.20.1#820001)