Re: Push-down of operations for SystemSchema tables

Clint Wylie Thu, 13 May 2021 18:20:46 -0700

Hi Jason,

Thanks for thinking about this. I agree that the system tables are a
pain point, especially the segments table in larger clusters.

In the mid term, some of us have been thinking that moving system tables
into the Druid native query engine is the way to go, and have been working
on resolving a number of hurdles that need to be cleared to make this
happen. One of the main motivators is that the planner in the Calcite layer
would then have just the Druid query path, letting us deprecate and
eventually drop the "bindable" path completely, as described in
https://github.com/apache/druid/issues/9896. System tables would be pushed
into Druid Datasource implementations, and queries would be handled in the
native engine. Gian has even made a prototype of what this might look like,
https://github.com/apache/druid/compare/master...gianm:sql-sys-table-native
since much of the groundwork is now in place, though it takes a hard-line
approach of completely removing bindable instead of hiding it behind a
flag, and it didn't implement all of the system tables yet, at least the
last time I looked at it.

I think there are a couple of remaining parts to resolve that would make
this feasible. The first is that native scan queries need support for
ordering by arbitrary columns, instead of just time, so that we can retain
the capabilities of the existing system tables. This isn't actually a
blocker for adding native system table queries, but rather a blocker for
replacing the bindable convention by default, so that there isn't a loss
(or rather trade) of functionality. Additionally, I think there may be some
matters regarding authorization of system tables when handled by the native
engine that will need to be resolved, but this can be done while adding the
native implementations.

I think there are various ideas and experiments underway on how to do
sorting on scan queries at normal Druid datasource scale, which is a fairly
big project, but in the short term we might be able to do something less
ambitious that works well enough at system-table scale to allow this plan
to fully proceed.
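
To illustrate what a "less ambitious" sort could look like at system-table
scale, here is a minimal sketch, assuming the rows can be streamed once and
ordered by a cheap key before anything expensive is built. It keeps only
`limit` rows in a bounded heap and materializes just the survivors, which is
the behavior wanted for something like "order by ... limit 25". The names
TopKSketch, topK, and materialize are hypothetical and are not part of Druid
or Calcite; this is not how the native engine or the prototype does it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Function;

// Hypothetical illustration only; not a Druid or Calcite class.
public class TopKSketch {
    /**
     * Returns the first `limit` rows of `source` under `order`, holding at
     * most `limit` lightweight rows in memory, and applies `materialize`
     * only to the rows that are actually returned.
     */
    public static <S, R> List<R> topK(
            Iterator<S> source,
            Comparator<S> order,
            int limit,
            Function<S, R> materialize) {
        List<R> result = new ArrayList<>();
        if (limit <= 0) {
            return result;
        }
        // Heap ordered by the reverse of `order`, so the head is always
        // the worst row we are currently keeping.
        PriorityQueue<S> heap = new PriorityQueue<>(limit, order.reversed());
        while (source.hasNext()) {
            S row = source.next();
            if (heap.size() < limit) {
                heap.add(row);
            } else if (order.compare(row, heap.peek()) < 0) {
                // The new row beats the worst kept row: swap it in.
                heap.poll();
                heap.add(row);
            }
        }
        List<S> kept = new ArrayList<>(heap);
        kept.sort(order);
        // Expensive conversion happens only for the `limit` survivors.
        for (S row : kept) {
            result.add(materialize.apply(row));
        }
        return result;
    }
}
```

With a comparator on an underlying timestamp and a materialize function that
builds the full output row, this is a single pass that uses memory
proportional to the limit rather than to the table size.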

Does this approach make sense? I don't believe Gian is actively working on
this at the moment, so if you're interested in moving along this approach
and want to start laying the groundwork, I'm happy to provide guidance and
help out.

Cheers,
Clint

On Thu, May 13, 2021 at 4:40 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> Jason,
>
> > I'm new to Calcite (and Druid) so if I have some terminology
> > incorrect, please point it out.
>
> From a Calcite perspective, I can tell you that your terminology (and
> ideas) seem spot on.
>
> I can’t say whether they make sense in Druid (or are easy to achieve).
>
> Julian
>
>
> > On May 13, 2021, at 4:21 PM, Jason Koch <jk...@netflix.com.INVALID> wrote:
> >
> > Hi all,
> >
> > I'm looking to implement push-down for some operations in the
> > SystemSchema class, and looking for your input on the best way to
> > tackle this.
> >
> > With profiling, we have found some UI slowness related to large
> > segment counts and task counts. Inspecting the code, it seems that
> > much of the data is fully materialized before often being discarded
> > [1], which makes it a good opportunity for a pushdown optimization.
> > This would make for a more snappy UI experience for segments, tasks
> > and so on. It would also I believe address #6827 [2].
> >
> > In looking at the code I can see a couple of approaches that might be sensible:
> >
> > - Modify the tables to support the linq4j Queryable interface, and
> > have all inputs provided in a single pass. This would be a
> > (relatively) straightforward way of fixing this specific problem,
> > however I am not sure how extensible/reusable this is, and whether I
> > have a correct understanding of Queryable.
> >
> > - Build up a custom RelNode structure for the sys. tables, along with
> > some rules, that could perform the required Logical operations in a
> > single pass on the underlying structures. Some of the rules would
> > perhaps be more reusable and more in line with the existing Druid
> > query architecture; however, this seems like a more complex solution.
> > I think a starting point would be to convert these tables to
> > `ProjectableFilterableTable` and then develop a RelOptRule to match a
> > LogicalSort+tables and push down the required additional sort
> > comparators. On scan(), then, the sort, projection, and filter
> > details would all be available to perform a single pass.
> >
> > - Any other suggestions or pointers?
> >
> > Other thoughts:
> > - Are there any common query use cases in these schemas that you think
> > we should target as a goal for opt rules? If I know in advance then I
> > can use that to guide the work.
> > - If complexity goes up a lot, especially for the second option, it
> > might be beneficial to move the rules and configuration to a new
> > package (org.apache.druid.sql.calcite.schema.sys?).
> > - It seems that these queries are currently performed in the Bindable
> > convention which would be a little slower than the Enumerable
> > convention. Is there any appetite to switch? I did not identify any
> > negative consequences from my reading.
> >
> > I'm new to Calcite (and Druid) so if I have some terminology
> > incorrect, please point it out.
> >
> > [1] https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java#L292-L373
> > - for ex, a "select * from sys.segments order by date desc limit 25"
> > requires full materialization of all fields of all objects
> > (.toString()) in order to correctly sort, at which point we pick first
> > 25 rows, and then most data is not needed. Ideally we could perform a
> > sort based on underlying timestamp, and only materialize the results
> > for the first 25 discovered rows.
> > [2] https://github.com/apache/druid/issues/6827
> >
> > Thanks
> > Jason
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
