kevinjmh commented on a change in pull request #3444: [CARBONDATA-3581] Support
page level bloom filter
URL: https://github.com/apache/carbondata/pull/3444#discussion_r349481780
##########
File path:
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java
##########
@@ -217,6 +220,14 @@ public BitSet prunePages(RawBlockletColumnChunks
rawBlockletColumnChunks)
bitSet.set(i);
}
}
Review comment:
I thought about this again. Unlike min/max, a page bloom filter costs more
(storage, decoding) at query time. A row-level filter may not benefit from it.
If we filter on **one column** and project multiple columns, then once the
bloom filter says a page does not need to be scanned, nothing needs to be done
for any column of that page in the direct-fill case. If bloom can skip more
pages, the IO benefit for the skipped pages is
`# of pages * projected columns`.
For the row-level filter, on the same query, the benefit shrinks to
`# of pages * 1 (the filter column)`.
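To make the two benefit formulas concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and not part of CarbonData; it only counts how many column-page reads are avoided under each formula.

```java
// Hypothetical helper for illustration only; not CarbonData code.
final class PageBloomBenefit {

    /**
     * Page-level pruning with direct fill: every projected column of a
     * skipped page is avoided, so the saving is pages * projected columns.
     */
    static long pagePruningSavedReads(long skippedPages, int projectedColumns) {
        return skippedPages * projectedColumns;
    }

    /**
     * Row-level filter: only the filter column's page is avoided,
     * so the saving is pages * 1.
     */
    static long rowFilterSavedReads(long skippedPages) {
        return skippedPages;
    }
}
```

For example, with 10 skipped pages and 5 projected columns, page-level pruning avoids 50 column-page reads, while the row-level filter avoids only 10.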
For the row-level filter, the two paths are:
```
1. original:
   decode page -> for-loop checking each value of this column
2. with page bloom:
   read bloom chunk -> decode bloom bitmap -> check bloom
   if check result is false -> skip
   if check result is true  -> decode page -> for-loop checking each value
                               of this column
```
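The two paths above can be sketched in Java roughly as follows. This is an illustrative toy, not the CarbonData implementation: the class name, the two-probe hash scheme, the bitmap size, and the use of a `Supplier<int[]>` to stand in for "decode page" are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.function.Supplier;

// Hypothetical sketch of the row-level filter paths; not CarbonData code.
final class RowFilterSketch {
    static final int BLOOM_BITS = 1024; // assumed toy bitmap size

    // Two simple hash probes (an assumption; real bloom filters use better hashes).
    static int h1(int v) { return Math.floorMod(v * 0x9E3779B1, BLOOM_BITS); }
    static int h2(int v) { return Math.floorMod(Integer.rotateLeft(v * 0x85EBCA6B, 15), BLOOM_BITS); }

    /** Builds the per-page bloom bitmap from the page's values. */
    static BitSet buildBloom(int[] page) {
        BitSet bloom = new BitSet(BLOOM_BITS);
        for (int v : page) {
            bloom.set(h1(v));
            bloom.set(h2(v));
        }
        return bloom;
    }

    /** False means "definitely absent"; true means "possibly present". */
    static boolean mightContain(BitSet bloom, int v) {
        return bloom.get(h1(v)) && bloom.get(h2(v));
    }

    /** Path 1 (original): decode the page, then check each value. */
    static BitSet filterOriginal(Supplier<int[]> decodePage, int target) {
        int[] page = decodePage.get();
        BitSet rows = new BitSet(page.length);
        for (int i = 0; i < page.length; i++) {
            if (page[i] == target) {
                rows.set(i);
            }
        }
        return rows;
    }

    /** Path 2 (with page bloom): check the bloom first, decode only on a hit. */
    static BitSet filterWithBloom(BitSet bloom, Supplier<int[]> decodePage, int target) {
        if (!mightContain(bloom, target)) {
            return new BitSet(); // skip: the page is never decoded
        }
        return filterOriginal(decodePage, target); // same work as path 1, plus the bloom check
    }
}
```

Note that on a bloom hit, path 2 does everything path 1 does plus the bloom read and check, which is the extra cost the comment is concerned about.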
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services