Definitely agree that we shouldn't boil the ocean.  That said, I don't
think we should make RecordBatch interface changes without deliberate
design. Same for RPC protocol changes. Part of my internal struggle with
the warning patch is exactly this lack of broader design. I think this is
especially true given the drive to support backwards compatibility.

I don't think we're talking about a massive undertaking. I'll try to write
up some thoughts later this week to get the ball rolling. Sound good?

--
Jacques Nadeau
CTO and Co-Founder, Dremio
+1 on having a framework.
OTOH, as with the warnings implementation, we might want to go ahead with a
simpler implementation while we get a more generic framework design in
place.

Jacques, do you have any preliminary thoughts on the framework?

On Tue, Dec 1, 2015 at 2:08 PM, Julian Hyde <[email protected]> wrote:

> +1 for a sideband mechanism.
>
> Sideband can also allow correlated restart of sub-queries.
>
> In sideband use cases you described, the messages ran in the opposite
> direction to the data. Would the sideband also run in the same direction as
> the data? If so it could carry warnings, rejected rows, progress
> indications, and (for online aggregation[1]) notifications that a better
> approximate query result is available.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Online_aggregation
>
>
>
> > On Dec 1, 2015, at 1:51 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > This seems like a form of sideband communication. I think we should have a
> > framework for this type of thing in general rather than a one-off for this
> > particular need. Other forms of sideband might be small table bloomfilter
> > generation and pushdown into hbase, separate file assignment/partitioning
> > providers balancing/generating scanner workloads, statistics generation for
> > adaptive execution, etc.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <[email protected]> wrote:
> >
> >> I am trying to deal with the following scenario:
> >>
> >> A bunch of minor fragments are doing things in parallel. Each of them could
> >> skip some records. Since the downstream minor fragment needs to know the
> >> sum of skipped-record-counts (in order to just display or see if the number
> >> exceeds the threshold) in the upstreams, each upstream minor fragment needs
> >> to pass this scalar with RecordBatch.
> >>
> >> Since this seems to affect the protocol of RecordBatch, I am looking for
> >> some advice here.
> >>
> >> Thanks.
> >>
>
>

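For illustration only, the scenario in the original message (each upstream minor fragment reports a skipped-record count, and the downstream fragment sums the counts and compares the total to a threshold) could be sketched roughly as below. This is a minimal sketch, not Drill's API: the class `SkippedRecordTally`, its methods, and the idea that counts arrive through some sideband call rather than inside a RecordBatch are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch, not part of Drill. A downstream fragment owns one
 * tally; each upstream minor fragment reports its skipped-record count
 * through a sideband call instead of attaching it to a RecordBatch.
 */
final class SkippedRecordTally {
    // Running sum of skipped records across all upstream minor fragments.
    private final AtomicLong total = new AtomicLong();
    // Maximum number of skipped records tolerated (assumed to be configured).
    private final long threshold;

    SkippedRecordTally(long threshold) {
        this.threshold = threshold;
    }

    /** Adds one upstream fragment's count and returns the running total. */
    long report(int minorFragmentId, long skipped) {
        if (skipped < 0) {
            throw new IllegalArgumentException(
                "negative skipped count from minor fragment " + minorFragmentId);
        }
        return total.addAndGet(skipped);
    }

    /** Current sum of skipped records, e.g. for display. */
    long total() {
        return total.get();
    }

    /** True once the summed count is strictly greater than the threshold. */
    boolean thresholdExceeded() {
        return total.get() > threshold;
    }
}
```

Because the sum lives in an AtomicLong, upstream fragments running in parallel could call `report` concurrently; how the call would actually be transported between fragments is exactly the framework question discussed above and is left open here.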