BTW, could you create some related issues in JIRA?

Thanks,
xunzhang

Sent from my iPhone

> On July 2, 2016, at 23:19, Ming Li <[email protected]> wrote:
>
> Data skipping can avoid a great deal of unnecessary IO, so it can
> greatly improve performance for IO-intensive queries. In addition to
> eliminating scans of unnecessary table partitions according to the
> partition key range, I think more options are available now:
>
> (1) The Parquet / ORC formats introduce lightweight metadata such as
> Min/Max/Bloom filter for each block. Such metadata can be exploited when
> the predicate/filter info can be fetched before executing the scan.
>
> However, in HAWQ today, all data in a Parquet table needs to be scanned
> into memory before the predicate/filter is processed. We don't generate
> the metadata on INSERT into a Parquet table, and the scan executor
> doesn't use the metadata either. Maybe some scan APIs need to be
> refactored so that we can get the predicate/filter info before executing
> the base relation scan.
>
> (2) Based on (1), especially with Bloom filters, more optimizer
> techniques can be explored further. E.g. Impala implemented runtime
> filtering
> (https://www.cloudera.com/documentation/enterprise/latest/topics/impala_runtime_filtering.html),
> which can be used for:
> - dynamic partition pruning
> - converting a join predicate to a base relation predicate
>
> It tells the executor to wait for a moment (the interval can be set by a
> GUC) before executing the base relation scan. If the values of interest
> arrive in time (e.g. the column in the join predicate has only a very
> small set of values), it can use these values to filter the scan; if they
> don't arrive in time, it scans without the filter, which doesn't affect
> result correctness.
>
> Unlike (1), this technique cannot be used in every case; it only performs
> better in some cases. So it just adds some more query plan choices/paths,
> and the optimizer needs to calculate the cost based on statistics and
> apply it only when it lowers the cost.
>
> All in all, maybe more similar techniques can be adopted for HAWQ now.
> Let's start to think about performance-related techniques; moreover, we
> need to investigate how these techniques can be implemented in HAWQ.
>
> Any ideas or suggestions are welcome. Thanks.
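
As a rough illustration of (1) and (2) in the quoted message, here is a minimal Python sketch. It is not HAWQ, Parquet, or Impala code. The names Block, make_block, scan_range, and scan_with_runtime_filter are hypothetical, the per-block metadata is min/max only (no Bloom filter), and a queue with a timeout stands in for the GUC-controlled wait.

```python
import queue
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Block:
    """Hypothetical column block carrying min/max metadata."""
    values: List[int]
    min_value: int
    max_value: int


def make_block(values) -> Block:
    # Metadata is computed once at write (INSERT) time, not at scan time.
    values = list(values)
    return Block(values, min(values), max(values))


def scan_range(blocks: List[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Idea (1): return rows with lo <= v <= hi and the number of blocks read."""
    rows, blocks_read = [], 0
    for block in blocks:
        # The predicate is known before the scan, so a block whose
        # [min, max] range cannot overlap [lo, hi] is never read.
        if block.max_value < lo or block.min_value > hi:
            continue
        blocks_read += 1
        rows.extend(v for v in block.values if lo <= v <= hi)
    return rows, blocks_read


def scan_with_runtime_filter(blocks: List[Block],
                             filter_queue: "queue.Queue",
                             wait_seconds: float) -> List[int]:
    """Idea (2): wait briefly for join keys, else scan without a filter."""
    keys: Optional[Set[int]]
    try:
        keys = filter_queue.get(timeout=wait_seconds)
    except queue.Empty:
        keys = None  # filter did not arrive in time

    rows = []
    for block in blocks:
        if keys is None:
            # Unfiltered scan: the join itself still removes non-matching
            # rows later, so the final result is unchanged.
            rows.extend(block.values)
        elif any(block.min_value <= k <= block.max_value for k in keys):
            rows.extend(v for v in block.values if v in keys)
    return rows
```

With three blocks covering 0-99, 100-199, and 200-299, scan_range(blocks, 120, 130) reads only the middle block. scan_with_runtime_filter returns only the matching keys when the key set is already in the queue, and returns every row when the wait times out, leaving the filtering to the join.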
