- We should treat an empty list different from a null list different from an all list. Agreed. What would each mean to DrillScanRel though? null -> scan-all or problem? non-empty with star -> scan-all non-empty w/o star -> regular / scan-some empty -> count(*) / skip-all
I have no concrete understanding of what planner outputs as the list of columns in case of count(*) vs select * queries. It would be nice to differentiate these cases without use of "null" to mean anything special though. Earlier, planner -at least in DrillScanRule- used to invoke a c'tor that used to pass "null" to mean "all" which led some ambiguity: is it that code is broken or callee asks me to scan everything? This was one of the reasons for disallowing nulls. Another one was to wipe out all null checks on columns from all of the readers. The other part of the consideration is readers. Readers don't seem to support skip-all queries particularly for count. Looks like the current JsonReader implementation(so does mongo reader) after major refactoring also relies on a non-empty list of columns while ensuring at least one column is scanned for count queries, which I doubt works as intended in case an empty list is used to mean skip-all. Part of the code change to make count(*) more efficient is to make every reader understand skip-all for count queries. Earlier we had some discussions on how to make count(*) more efficient as well. A good starting point could be to resolve ambiguity at the planner side with the help of planning experts perhaps. If reader knows for sure that she executing a count(*) query, she should be able to output a cheap count vector instead of scanning everything without enforcing schema check on the count column for readers that is schema-ed. How does this sound? Thanks. -Hanifi On Sat, Feb 7, 2015 at 6:47 PM, Jacques Nadeau <[email protected]> wrote: > Hey Guys, > > When reviewing our code I ran across this code: > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java#L70 > > The problem with this code is it converts no columns into every column. > This means we waste a bunch of time readers where we don't need to project > any columns. An example is select count(*) which turns into select > count(0). In that case, we should avoid projecting any columns other than > a count column. However, because we translate no columns into every > column, we read way more than we should. I remember someone (maybe Hanifi) > working through some rules around not allowing a blank list of columns > anywhere. What was the thinking and how do we correct. We should treat an > empty list different from a null list different from an all list. > > thoughts? >
