[GitHub] [carbondata] kevinjmh commented on a change in pull request #3444: [CARBONDATA-3581] Support page level bloom filter

GitBox Fri, 22 Nov 2019 00:39:46 -0800

kevinjmh commented on a change in pull request #3444: [CARBONDATA-3581] Support 
page level bloom filter
URL: https://github.com/apache/carbondata/pull/3444#discussion_r349481780


 ##########
 File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java
 ##########
 @@ -217,6 +220,14 @@ public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
           bitSet.set(i);
         }
       }
 
 Review comment:
    I reconsidered this. Unlike min/max, a page bloom costs more (storage, 
decode) at query time. A row-level filter may not benefit from bloom.
   
   If we filter on **one column** and project multiple columns, then once 
bloom says a page does not need to be scanned, nothing needs to be done for 
any column of that page in the direct-fill case. The more pages bloom can 
skip, the larger the IO saving: `# of skipped pages * projected columns`. 
   
   With the row-level filter, for the same query, the benefit shrinks to 
`# of skipped pages * 1 (the filter column)`. 
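   
To make the comparison concrete, here is a rough sketch of the arithmetic. The class and method names are hypothetical and only illustrate the two formulas above; they are not part of CarbonData.

```java
// Hypothetical helper, not CarbonData code: counts column-page reads avoided
// when bloom lets us skip pages.
final class PageSkipBenefit {

    // Column-page reads saved = skipped pages * columns that benefit from the skip.
    static long savedColumnPageReads(int skippedPages, int columnsBenefiting) {
        return (long) skippedPages * columnsBenefiting;
    }

    public static void main(String[] args) {
        int skippedPages = 10;      // assumed number of pages bloom rules out
        int projectedColumns = 5;   // assumed number of projected columns

        // Page pruning with direct fill: every projected column skips the page.
        System.out.println(savedColumnPageReads(skippedPages, projectedColumns)); // prints 50

        // Row-level filter: only the filter column skips the page.
        System.out.println(savedColumnPageReads(skippedPages, 1)); // prints 10
    }
}
```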
   
   For row level filter:
   ```
   1. original:
   
   decode page -> for-loop checking each value of this column
   
   2. with page bloom:
   
   read bloom chunk -> decode bloom bitmap -> check bloom
   
   if check result is false -> skip
   if check result is true  -> decode page -> for-loop checking each value of this column
   ```
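
The two flows above could look roughly like this in Java. This is only a sketch under assumptions: `ToyBloom`, `filterPage` and the `pagesDecoded` counter are made-up names, the bloom is a toy two-hash filter over a `BitSet`, and none of it reflects the actual reader classes in this PR.

```java
import java.util.BitSet;

// Hypothetical sketch of a row-level filter with an optional page bloom check.
final class PageBloomSketch {

    // Toy bloom filter with two hash functions; illustration only.
    static final class ToyBloom {
        private final BitSet bits;
        private final int size;

        ToyBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        private int h1(int v) { return Math.floorMod(v * 31 + 7, size); }
        private int h2(int v) { return Math.floorMod(v * 131 + 13, size); }

        void add(int v) {
            bits.set(h1(v));
            bits.set(h2(v));
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(int v) {
            return bits.get(h1(v)) && bits.get(h2(v));
        }
    }

    // Counts how many pages went through the (simulated) decode step.
    static int pagesDecoded = 0;

    // Returns the row ids in the page equal to target.
    // Passing bloom == null gives flow 1 (original); otherwise flow 2.
    static BitSet filterPage(int[] page, ToyBloom bloom, int target) {
        BitSet matches = new BitSet(page.length);
        if (bloom != null && !bloom.mightContain(target)) {
            return matches; // bloom check is false -> skip decode and scan
        }
        pagesDecoded++; // stands in for "decode page"
        for (int i = 0; i < page.length; i++) { // for-loop over each value
            if (page[i] == target) {
                matches.set(i);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] page = {1, 2, 3};
        ToyBloom bloom = new ToyBloom(64);
        for (int v : page) {
            bloom.add(v);
        }
        System.out.println(filterPage(page, bloom, 2));    // prints {1}
        System.out.println(filterPage(page, bloom, 1000)); // prints {}
        System.out.println(pagesDecoded);                  // prints 1
    }
}
```

Note that flow 2 always pays for reading and decoding the bloom, and still pays the full decode plus scan whenever the bloom answers true, so the saving only shows up on pages the bloom can rule out.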
   

