Hi Gian,

I have not experienced this problem in our cluster, but I did some investigation into it based on issues reported by other people. Yes, they "solved" this problem by lowering `recentlyFinishedThreshold`, as you said.
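For anyone following along, lowering the threshold would look something like this in the Overlord's runtime.properties (the value is an ISO-8601 period; PT1H is only an example, not a recommendation):

    # Example: keep completed tasks visible for one hour instead of the default PT24H.
    druid.indexer.storage.recentlyFinishedThreshold=PT1H

The trade-off is that completed tasks older than the threshold stop appearing in the task APIs sooner.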
Gian Merlino <g...@apache.org> wrote on Thu, May 20, 2021 at 1:04 PM:

> Hey Frank,
>
> These notes are really interesting. Thanks for writing them down.
>
> I agree that the three things you laid out are all important. With regard to SQL clauses from the web console, I did notice one recent change went in that changed the SQL clauses to only query sys.segments for columns that are actually visible, part of https://github.com/apache/druid/pull/10909. That isn't very useful right now, since there isn't projection pushdown. But if we add it, this will limit JSON serialization to only the fields that are actually requested, which will be useful if not all of them are requested by default. Switching to use OFFSET / LIMIT for tasks too would also be good (or even just LIMIT would be a good start).
>
> Out of curiosity, how many tasks do you typically have in your sys.tasks table?
>
> Side note: I'm not sure if you looked into druid.indexer.storage.recentlyFinishedThreshold, but that might be useful as a workaround for you until some of these changes are made. You can set it lower and it will reduce the number of complete tasks that the APIs return.
>
> On Tue, May 18, 2021 at 8:13 AM Chen Frank <frank.chen...@outlook.com> wrote:
>
> > Hi Jason,
> >
> > I have tracked this problem for quite a while. Since you are interested in it, I would like to share what I know so that you can take it into consideration.
> >
> > In 0.19.0, PR #9883 improved the performance of the segments query by eliminating JSON serialization. But PR #10752, merged in 0.21.0, brings JSON serialization back; I do not know whether this reverts the performance gain from the previous PR.
> >
> > For tasks, the performance is much worse. There are some problems reported about the task UI, e.g. #11042 and #11140, but I have not seen any feedback about the segment UI.
> > One reason is that the web console fetches ALL task records from the broker and paginates on the client side instead of using a LIMIT clause in SQL to paginate on the server side.
> > Another reason is that the broker fetches ALL tasks via a REST API from the overlord, which loads records directly from the metadata storage and deserializes the data in the `pay_load` field.
> >
> > For segments, the two problems above do not exist because:
> >
> >    1. a LIMIT clause is used in the SQL queries
> >
> >    2. a segments query returns a snapshot of in-memory segment data, which means there is no query to the metadata database and no JSON deserialization of the `pay_load` field
> >
> > In 0.20, OFFSET was added for SQL queries; I think it could also be used in the queries issued by the web console, which would bring some performance gain.
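> > For illustration, a server-side paginated query for the task view could look something like this (only a sketch; the column names follow the sys.tasks documentation, and the page size is arbitrary):
> >
> >     SELECT "task_id", "type", "datasource", "status", "created_time"
> >     FROM sys.tasks
> >     ORDER BY "created_time" DESC  -- newest tasks first
> >     LIMIT 50 OFFSET 100           -- third page, at 50 rows per page
> >
> > On its own this only trims what is sent to the web console, since the broker still fetches every task from the overlord; that is why the REST API change below is also needed.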
> > IMO, to improve the performance, we might need to make changes to:
> >
> >    1. the SQL layer you mentioned above
> >
> >    2. the SQL clauses issued by the web console
> >
> >    3. the task REST API, to support search conditions and ordering that narrow down the search range on the metadata table
> >
> > Thanks.
> >
> > From: Jason Koch <jk...@netflix.com.INVALID>
> > Date: Saturday, May 15, 2021, 3:51 AM
> > To: dev@druid.apache.org <dev@druid.apache.org>
> > Subject: Re: Push-down of operations for SystemSchema tables
> >
> > @Julian - thank you for the review & confirmation.
> >
> > Hi Clint,
> >
> > Thank you, I appreciate the response. I have responded inline with some questions, and I've also restated things in my own words to confirm that I understand.
> >
> > ...
> >
> > > In the mid term, I think that some of us have been thinking that moving system tables into the Druid native query engine is the way to go, and have been working on resolving a number of hurdles that are required to make this happen. One of the main motivators to do this is so that we have just the Druid query path in the planner in the Calcite layer, deprecating and eventually dropping the "bindable" path completely, described in https://github.com/apache/druid/issues/9896. System tables would be pushed into Druid Datasource implementations, and queries would be handled in the native engine. Gian has even made a prototype of what this might look like, https://github.com/apache/druid/compare/master...gianm:sql-sys-table-native, since much of the ground work is now in place, though it takes a hard-line approach of completely removing bindable instead of hiding it behind a flag, and doesn't implement all of the system tables yet, at least the last time I looked at it.
> >
> > Looking over the changes, it seems that:
> >
> > - a new VirtualDataSource is introduced, which the Druid non-SQL processing engine can process and which can wrap an Iterable; this exposes a lazy segment and iterable using InlineDataSource.
> > - the SegmentsTable has been converted from a ScannableTable to a DruidTable, and a ScannableTableIterator is introduced to generate an iterable containing the rows; the new VirtualDataSource can be used to access the rows of this table.
> > - finally, the Bindable convention is discarded from DruidPlanner and Rules.
> >
> > > I think there are a couple of remaining parts to resolve that would make this feasible. The first is that native scan queries need support for ordering by arbitrary columns, instead of just time, so that we can retain the capabilities of the existing system tables.
> >
> > It seems you want to use the native queries to support ordering; do you mean here the underlying SegmentsTable, or something in the Druid engine? Currently, the SegmentsTable etc. rely on, as you say, the bindable convention to provide the sort. If it were a DruidTable, then it seems that sorting gets pushed into PartialDruidQuery -> DruidQuery, which conceptually is able to do a sort, but as described in [1] [2], the ordering is not supported by the underlying Druid engine [3].
> >
> > This would mean that an order by / sort / limit query would not be supported on any of the migrated sys.* tables until Druid has a way to perform the sort on a ScanQuery.
> >
> > [1] https://druid.apache.org/docs/latest/querying/scan-query.html#time-ordering
> > [2] https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java#L1075-L1078
> > [3] https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/scan/ScanQueryEngine.java
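> > As a concrete illustration (a hypothetical example, not from the PR), a query like the following gets its sort from the bindable convention today, and would not plan under a native-only path until ScanQuery can order by arbitrary columns:
> >
> >     SELECT "segment_id", "datasource", "size"
> >     FROM sys.segments
> >     ORDER BY "size" DESC  -- ordering on a non-time column
> >     LIMIT 10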
> > > This isn't actually a blocker for adding native system table queries, but rather a blocker for replacing the bindable convention by default, so that there isn't a loss (or rather a trade) of functionality. Additionally, I think there may be some matters regarding authorization of system tables when handled by the native engine that will need to be resolved, but this can be done while adding the native implementations.
> >
> > It looks like the port of the tables from the classic ScannableTable to a DruidTable is itself straightforward. However, it seems this PR doesn't bring them across from the SQL domain to be available in any native queries. I'm not sure whether this is expected, an interim step, or whether I have misunderstood the goal.
> >
> > > I think there are various ideas and experiments underway on how to do sorting on scan queries at normal Druid datasource scale, which is sort of a big project, but in the short term we might be able to do something less ambitious that works well enough at system-table scale to allow this plan to fully proceed.
> >
> > One possible way, which I think leads in the correct direction:
> >
> > 1) We have an existing rule for a LogicalTable with a DruidTable to DruidQueryRel, which can eventually construct a DruidQuery.
> >
> > 2) The VirtualDataSource, created during SQL parsing, takes an already-constructed Iterable; so we need to have already performed the filter/sort before creating the VirtualDataSource (and DruidQuery). This means the push-down filter logic has to happen during sql/ stage setup, before handoff to the processing/ engine.
> >
> > 3) Perhaps a new VirtualDruidTable subclassing DruidTable, with a RelOptRule that can identify a LogicalXxx above a VirtualDruidTable and push it down? Then our SegmentsTable and friends can expose the correct Iterable. This should allow us to solve the perf concerns, and would allow us to present a correctly constructed VirtualDataSource. Sort from SQL _should_ be supported (I think), as the planner can push the sort etc. down to these nodes directly.
> >
> > In this approach, the majority of the work happens in sql/, prior to the Druid engine, and so Druid core doesn't actually need to know anything about these changes.
> >
> > On the other hand, whilst it keeps the pathway open, I'm not sure this does any of the actual work to make the sys.* tables available as native tables. If we are to make these into truly native tables, without a native sort, and remove their implementation from sql/, the DruidQuery in the planner would need to be configured to pass the ScanQuery sort to the processing engine _but only for sys.* tables_, and then the processing engine would need to know how to find these tables. (I haven't explored this.) As you mention, implementing native sort across multiple data sources seems like a more ambitious piece of work.
> >
> > As another idea, we could consider creating a bridge Bindable/EnumerableToDruid rule that would allow Druid to embed these tables, move them out of sql/ into processing/, expose them as Iterable/Enumerable, and make them available in queries if that is a goal. I'm not really sure that adds anything to the overall goals, though.
> >
> > > Does this approach make sense? I don't believe Gian is actively working on this at the moment, so I think if you're interested in moving along this approach and want to start laying the groundwork, I'm happy to provide guidance and help out.
> > I am interested. For my current work, I do want to keep the focus on the sys.* performance work. If there's a way to do that while laying the groundwork, or even getting all the work done, then I am 100% for that. Looking at what you want to do to convert these sys.* tables to native tables, if we have a viable solution or are comfortable with my suggestions above, I'd be happy to build it out.
> >
> > Thanks
> > Jason
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org