- We should treat an empty list different from a null list different from
an all list.
Agreed. What would each mean to DrillScanRel though?
null -> scan-all or problem?
non-empty with star -> scan-all
non-empty w/o star -> regular / scan-some
empty -> count(*) / skip-all


I have no concrete understanding of what planner outputs as the list of
columns in case of count(*) vs select * queries. It would be nice to
differentiate these cases without use of "null" to mean anything special
though. Earlier, planner -at least in DrillScanRule- used to invoke a c'tor
that used to pass "null" to mean "all" which led some ambiguity: is it that
code is broken or callee asks me to scan everything? This was one of the
reasons for disallowing nulls. Another one was to wipe out all null checks
on columns from all of the readers.


The other part of the consideration is readers. Readers don't seem to
support skip-all queries particularly for count. Looks like the current
JsonReader implementation(so does mongo reader) after major refactoring
also relies on a non-empty list of columns while ensuring at least one
column is scanned for count queries, which I doubt works as intended in
case an empty list is used to mean skip-all. Part of the code change to
make count(*) more efficient is to make every reader understand skip-all
for count queries.


Earlier we had some discussions on how to make count(*) more efficient as
well. A good starting point could be to resolve ambiguity at the planner
side with the help of planning experts perhaps. If reader knows for sure
that she executing a count(*) query, she should be able to output a cheap
count vector instead of scanning everything without enforcing schema check
on the count column for readers that is schema-ed.

How does this sound?


Thanks.
-Hanifi


On Sat, Feb 7, 2015 at 6:47 PM, Jacques Nadeau <[email protected]> wrote:

> Hey Guys,
>
> When reviewing our code I ran across this code:
>
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java#L70
>
> The problem with this code is it converts no columns into every column.
> This means we waste a bunch of time readers where we don't need to project
> any columns.  An example is select count(*) which turns into select
> count(0).  In that case, we should avoid projecting any columns other than
> a count column.  However, because we translate no columns into every
> column, we read way more than we should.  I remember someone (maybe Hanifi)
> working through some rules around not allowing a blank list of columns
> anywhere.  What was the thinking and how do we correct.  We should treat an
> empty list different from a null list different from an all list.
>
> thoughts?
>

Reply via email to