On 5 September 2016 at 18:36, Amos Bird <[email protected]> wrote: > > > Henry Robinson writes: > > >> Question 2, > >>>> explain select s from yy2 where year in (select year from yy where > year between 2000 and 2005); > >>>> +----------------------------------------------------------+ > >>>> | Explain String | > >>>> +----------------------------------------------------------+ > >>>> | Estimated Per-Host Requirements: Memory=16.00MB VCores=2 | > >>>> | | > >>>> | 04:EXCHANGE [UNPARTITIONED] | > >>>> | | | > >>>> | 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] | > >>>> | | hash predicates: year = year | > >>>> | | runtime filters: RF000 <- year | > >>>> | | | > >>>> | |--03:EXCHANGE [BROADCAST] | > >>>> | | | | > >>>> | | 01:SCAN HDFS [dpp.yy] | > >>>> | | partitions=2/4 files=2 size=468B | > >>>> | | | > >>>> | 00:SCAN HDFS [dpp.yy2] | > >>>> | partitions=2/3 files=2 size=468B | > >>>> | runtime filters: RF000 -> year | > >>>> +----------------------------------------------------------+ > >> > >> How does the planner be able to tell that there gonna be 2/3 > >> partitions surviving after runtime fitering? > > > > The planner doesn't know about runtime filters' selectivity. But the > between predicate does eliminate the 1999 partition which can be statically > determined by the planner. That's what changes the number of expected > scanned partitions in the plan. > > So this example isn't related to runtime filtering. It's just a > case of predicate transitivity and static partition pruning right? >
Sorry, lost track of this in my inbox. You're right - although runtime filters are created, they won't be effective because the partition with year=1999 has already been eliminated by static filtering. I've asked Cloudera's docs team to fix this. Thanks for pointing it out!
