Hi Weijie,

    Are there some typos in the sample query ?  Looks like the projection 
should be t2.a,t2.s,t3.d (i.e., t2 instead of t1). Also the predicate “ where 
a='1' ” makes the inner query return only a single row, which is pretty trivial.

    Assuming these changes are made, then there could be many t2 “a” values to 
be equi-joined to t3’s “a” values. 

With Bloom filters, the rows from t3 would only be “mostly filtered”; there 
still needs to be a join above to produce the final result.

If wanting to push the “whole join” down, then _either_ need to have some index 
mechanism on “t3.a” – which would work as a nested loop join (NLJ), _or_ need 
to perform another type of join down below (with all related issues, like 
memory control, spill etc).  For the NLJ, indeed the current Drill does not 
support “down flow” of data (and most storage does not have indexes), and it’ll 
take some work to implement (e.g., all operators would need to accept a next() 
call with some “data” parameter).

         Boaz 
--------------------------------

On 9/19/17, 8:45 AM, "weijie tong" <tongweijie...@gmail.com> wrote:

    All:
       This is a propose about join query tuning by pushing down the join
    condition. Welcome suggestion ,discussion,objection .
    
       Suppose we have a join query "select t1.a,t1.s,t3.d (select a, sum(b) as
    s from t1 where a='1' group by a ) t2 join t3 on t2.a = t3.a"  .  This
    query will be transferred to a hashjoin or boradcast hashjoin (if metadata
    is accurate ). But the t3's rows will all be pulled out from the storage.
    If the t3 is a large table,the performance will be unacceptable.
    If we can first get the 'a' result set of the inner query,then we pushed
    down the result set to the right table t3's scan node. The right table's
    scan will be quickly.
    
         possible solutions :
         1. A new physical operator or  broadcast join ,hash join enhancements
    , which need to first query the left table's data, then push down the
    filtered left join condition column set to the right table stream, once
    confirmed the pushed down , works as normal join query logic.
         2. The pushed down join condition set maybe two possible formats bloom
    filters bytes  or list of strings.
         3. RecordBatch needs to support to push down 2's data down stream.
         4. SubScan needs to hold the 2's data,and wait for next real call to
    push down to the storage level query.
         5. Storage level should have an interface to indicate whether it
    supports to solve the pushed down bloom filter or list of strings.
    
         Since this violates drill's data flow direction,it seems a lot of work
    to do ,to change to implement this feature.
    

Reply via email to