Filed https://issues.apache.org/jira/browse/DRILL-2192 to track this.
On Sun, Feb 8, 2015 at 3:03 PM, Jacques Nadeau <[email protected]> wrote: > I think the following makes sense (from your email) > > null -> scan-all > non-empty with star -> scan-all > non-empty w/o star -> regular / scan-some > empty -> count(*) / skip-all > > The reason I identified it in the DrillScanRel is because the planner > actually returns an empty set for the count(*) case as that actually gets > rewritten as count(0) or count(1) if I recall correctly. As such, the list > of columns needed to satisfy are the empty set. > > Agreed that most (all?) readers don't support currently. We should address > this separately and make sure the readers handle what they are given (e.g. > if they don't support skip-all, they should convert to something they do > support) rather than removing the capability in the Rel layer. > > On Sun, Feb 8, 2015 at 2:44 PM, Hanifi Gunes <[email protected]> wrote: > > > - We should treat an empty list different from a null list different from > > an all list. > > Agreed. What would each mean to DrillScanRel though? > > null -> scan-all or problem? > > non-empty with star -> scan-all > > non-empty w/o star -> regular / scan-some > > empty -> count(*) / skip-all > > > > > > I have no concrete understanding of what planner outputs as the list of > > columns in case of count(*) vs select * queries. It would be nice to > > differentiate these cases without use of "null" to mean anything special > > though. Earlier, planner -at least in DrillScanRule- used to invoke a > c'tor > > that used to pass "null" to mean "all" which led some ambiguity: is it > that > > code is broken or callee asks me to scan everything? This was one of the > > reasons for disallowing nulls. Another one was to wipe out all null > checks > > on columns from all of the readers. > > > > > > The other part of the consideration is readers. Readers don't seem to > > support skip-all queries particularly for count. Looks like the current > > JsonReader implementation(so does mongo reader) after major refactoring > > also relies on a non-empty list of columns while ensuring at least one > > column is scanned for count queries, which I doubt works as intended in > > case an empty list is used to mean skip-all. Part of the code change to > > make count(*) more efficient is to make every reader understand skip-all > > for count queries. > > > > > > Earlier we had some discussions on how to make count(*) more efficient as > > well. A good starting point could be to resolve ambiguity at the planner > > side with the help of planning experts perhaps. If reader knows for sure > > that she executing a count(*) query, she should be able to output a cheap > > count vector instead of scanning everything without enforcing schema > check > > on the count column for readers that is schema-ed. > > > > How does this sound? > > > > > > Thanks. > > -Hanifi > > > > > > On Sat, Feb 7, 2015 at 6:47 PM, Jacques Nadeau <[email protected]> > wrote: > > > > > Hey Guys, > > > > > > When reviewing our code I ran across this code: > > > > > > > > > > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java#L70 > > > > > > The problem with this code is it converts no columns into every column. > > > This means we waste a bunch of time readers where we don't need to > > project > > > any columns. An example is select count(*) which turns into select > > > count(0). In that case, we should avoid projecting any columns other > > than > > > a count column. However, because we translate no columns into every > > > column, we read way more than we should. I remember someone (maybe > > Hanifi) > > > working through some rules around not allowing a blank list of columns > > > anywhere. What was the thinking and how do we correct. We should > treat > > an > > > empty list different from a null list different from an all list. > > > > > > thoughts? > > > > > >
