For (3), are you referring to the operators that extend AbstractSingleRecordBatch? We basically only call the buildSchema() method on blocking operators. If an operator is not blocking, we simply process the first batch, the idea being that it should be fast enough. Are there situations where this is not true? Skipping empty batches could delay schema propagation, but we can handle that case with special handling for the first batch.
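To illustrate the point about empty batches, here is a minimal sketch. The class and method names (Batch, schemaSkippingEmpties, schemaWithFirstBatchHandling) are hypothetical stand-ins, not Drill's actual RecordBatch API; the sketch only shows why blindly skipping zero-row batches delays schema propagation, while special-casing the first batch makes the schema available immediately:

```java
import java.util.Arrays;
import java.util.List;

public class FastSchemaSketch {

    // Hypothetical stand-in for an incoming record batch: just a schema
    // description plus a row count. Not the real Drill RecordBatch.
    static final class Batch {
        final String schema;
        final int rowCount;
        Batch(String schema, int rowCount) {
            this.schema = schema;
            this.rowCount = rowCount;
        }
    }

    // An operator that skips empty batches also skips the initial empty
    // "schema batch", so downstream operators see no schema until the
    // first batch with data arrives.
    static String schemaSkippingEmpties(List<Batch> incoming) {
        for (Batch b : incoming) {
            if (b.rowCount > 0) {
                return b.schema; // schema propagates only with first data
            }
        }
        return null; // schema delayed: nothing propagated yet
    }

    // With special handling for the first batch, the schema propagates
    // right away, even when that batch carries zero rows.
    static String schemaWithFirstBatchHandling(List<Batch> incoming) {
        return incoming.isEmpty() ? null : incoming.get(0).schema;
    }

    public static void main(String[] args) {
        // Only the empty schema batch has arrived so far.
        List<Batch> soFar = Arrays.asList(new Batch("(a INT, b VARCHAR)", 0));
        System.out.println(schemaSkippingEmpties(soFar));        // schema delayed
        System.out.println(schemaWithFirstBatchHandling(soFar)); // schema available
    }
}
```

With only the zero-row schema batch received, the skipping variant returns nothing while the first-batch variant already yields the schema, which is exactly what a LIMIT 0 query needs.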
As for (4), it's really historical. We originally didn't have fast schema, and when it was added, only the minimal code changes necessary to make it work were made. At the time the fast schema feature was implemented, the operators had a single "setup" method that handled both materializing the output batch and generating the code. Further separating the parts of setup that are needed for fast schema from those that are not would require additional work and could add code complexity, and I'm not sure how much benefit we would get from it.

What is the motivation behind this? In other words, what sort of delays are you currently seeing, and have you done an analysis of what is causing them? I would expect code generation to add only a minimal delay, unless we are concerned about cutting the time for "limit 0" queries down to just a few milliseconds.

On Thu, Nov 5, 2015 at 9:53 AM, Sudheesh Katkam <[email protected]> wrote:
> Hey y’all,
>
> @Jacques and @Steven,
>
> I am looking at improving the fast schema path (for LIMIT 0 queries). It
> seems to me that on the first call to next (the buildSchema call), in any
> operator, only two tasks need to be done:
> 1) call next exactly once on each of the incoming batches, and
> 2) set up the output container based on those incoming batches
>
> However, looking at the implementation, some record batches:
> 3) make multiple calls to incoming batches (with a comment “skip first
> batch if count is zero, as it may be an empty schema batch”),
> 4) generate code, etc.
>
> Any reason why (1) and (2) aren’t sufficient? Any optimizations that were
> considered, but not implemented?
>
> Thank you,
> Sudheesh
