Hi Gian,

I have not experienced this problem in our cluster, but I did some investigation into it based on issues reported by other people. Yes, they "solved" this problem by lowering `recentlyFinishedThreshold`, as you said.
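For anyone following along, lowering the threshold would look something like this in the Overlord's runtime.properties (the value is an ISO-8601 period; PT1H is only an example, not a recommendation):

    # Example: keep completed tasks visible for one hour instead of the default PT24H.
    druid.indexer.storage.recentlyFinishedThreshold=PT1H

The trade-off is that completed tasks older than the threshold stop appearing in the task APIs sooner.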
Gian Merlino <g...@apache.org> wrote on Thu, May 20, 2021 at 1:04 PM:

> Hey Frank,
>
> These notes are really interesting. Thanks for writing them down.
>
> I agree that the three things you laid out are all important. With regard to SQL clauses from the web console, I did notice one recent change went in that changed the SQL clauses to only query sys.segments for columns that are actually visible, part of https://github.com/apache/druid/pull/10909. That isn't very useful right now, since there isn't projection pushdown. But if we add it, this will limit JSON serialization to only the fields that are actually requested, which will be useful if not all of them are requested by default. Switching to use OFFSET / LIMIT for tasks too would also be good (or even just LIMIT would be a good start).
>
> Out of curiosity, how many tasks do you typically have in your sys.tasks table?
>
> Side note: I'm not sure if you looked into druid.indexer.storage.recentlyFinishedThreshold, but that might be useful as a workaround for you until some of these changes are made. You can set it lower and it will reduce the number of complete tasks that the APIs return.
>
> On Tue, May 18, 2021 at 8:13 AM Chen Frank <frank.chen...@outlook.com> wrote:
>
> > Hi Jason,
> >
> > I have tracked this problem for quite a while. Since you are interested in it, I would like to share what I know so that you can take it into consideration.
> >
> > In 0.19.0, PR #9883 improved the performance of the segments query by eliminating JSON serialization. But PR #10752, merged in 0.21.0, brings JSON serialization back; I do not know whether this reverts the performance gain from the previous PR.
> >
> > For tasks, the performance is much worse. There are some problems reported about the task UI, e.g. #11042 and #11140, but I have not seen any feedback about the segment UI.
> > One reason is that the web console fetches ALL task records from the broker and paginates on the client side instead of using a LIMIT clause in SQL to paginate on the server side.
> > Another reason is that the broker fetches ALL tasks via a REST API from the overlord, which loads records directly from the metadata storage and deserializes the data in the `pay_load` field.
> >
> > For segments, the two problems above do not exist because:
> >
> >    1. a LIMIT clause is used in the SQL queries
> >
> >    2. a segments query returns a snapshot of in-memory segment data, which means there is no query to the metadata database and no JSON deserialization of the `pay_load` field
> >
> > In 0.20, OFFSET was added for SQL queries; I think it could also be used in the queries issued by the web console, which would bring some performance gain.
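> > For illustration, a server-side paginated query for the task view could look something like this (only a sketch; the column names follow the sys.tasks documentation, and the page size is arbitrary):
> >
> >     SELECT "task_id", "type", "datasource", "status", "created_time"
> >     FROM sys.tasks
> >     ORDER BY "created_time" DESC  -- newest tasks first
> >     LIMIT 50 OFFSET 100           -- third page, at 50 rows per page
> >
> > On its own this only trims what is sent to the web console, since the broker still fetches every task from the overlord; that is why the REST API change below is also needed.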
> > IMO, to improve the performance, we might need to make changes to:
> >
> >    1. the SQL layer you mentioned above
> >
> >    2. the SQL clauses issued by the web console
> >
> >    3. the task REST API, to support search conditions and ordering that narrow down the search range on the metadata table
> >
> > Thanks.
> >
> > From: Jason Koch <jk...@netflix.com.INVALID>
> > Date: Saturday, May 15, 2021, 3:51 AM
> > To: dev@druid.apache.org <dev@druid.apache.org>
> > Subject: Re: Push-down of operations for SystemSchema tables
> >
> > @Julian - thank you for the review & confirmation.
> >
> > Hi Clint,
> >
> > Thank you, I appreciate the response. I have responded inline with some questions, and I've also restated things in my own words to confirm that I understand.
> >
> > ...
> >
> > > In the mid term, I think that some of us have been thinking that moving system tables into the Druid native query engine is the way to go, and have been working on resolving a number of hurdles that are required to make this happen. One of the main motivators to do this is so that we have just the Druid query path in the planner in the Calcite layer, deprecating and eventually dropping the "bindable" path completely, described in https://github.com/apache/druid/issues/9896. System tables would be pushed into Druid Datasource implementations, and queries would be handled in the native engine. Gian has even made a prototype of what this might look like, https://github.com/apache/druid/compare/master...gianm:sql-sys-table-native, since much of the ground work is now in place, though it takes a hard-line approach of completely removing bindable instead of hiding it behind a flag, and doesn't implement all of the system tables yet, at least the last time I looked at it.
> >
> > Looking over the changes, it seems that:
> >
> > - a new VirtualDataSource is introduced, which the Druid non-SQL processing engine can process and which can wrap an Iterable; this exposes a lazy segment and iterable using InlineDataSource.
> > - the SegmentsTable has been converted from a ScannableTable to a DruidTable, and a ScannableTableIterator is introduced to generate an iterable containing the rows; the new VirtualDataSource can be used to access the rows of this table.
> > - finally, the Bindable convention is discarded from DruidPlanner and Rules.
> >
> > > I think there are a couple of remaining parts to resolve that would make this feasible. The first is that native scan queries need support for ordering by arbitrary columns, instead of just time, so that we can retain the capabilities of the existing system tables.
> >
> > It seems you want to use the native queries to support ordering; do you mean here the underlying SegmentsTable, or something in the Druid engine? Currently, the SegmentsTable etc. rely on, as you say, the bindable convention to provide the sort. If it were a DruidTable, then it seems that sorting gets pushed into PartialDruidQuery -> DruidQuery, which conceptually is able to do a sort, but as described in [1] [2], the ordering is not supported by the underlying Druid engine [3].
> >
> > This would mean that an order by / sort / limit query would not be supported on any of the migrated sys.* tables until Druid has a way to perform the sort on a ScanQuery.
> >
> > [1] https://druid.apache.org/docs/latest/querying/scan-query.html#time-ordering
> > [2] https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java#L1075-L1078
> > [3] https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/scan/ScanQueryEngine.java
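> > As a concrete illustration (a hypothetical example, not from the PR), a query like the following gets its sort from the bindable convention today, and would not plan under a native-only path until ScanQuery can order by arbitrary columns:
> >
> >     SELECT "segment_id", "datasource", "size"
> >     FROM sys.segments
> >     ORDER BY "size" DESC  -- ordering on a non-time column
> >     LIMIT 10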
> > > This isn't actually a blocker for adding native system table queries, but rather a blocker for replacing the bindable convention by default, so that there isn't a loss (or rather a trade) of functionality. Additionally, I think there may be some matters regarding authorization of system tables when handled by the native engine that will need to be resolved, but this can be done while adding the native implementations.
> >
> > It looks like the port of the tables from the classic ScannableTable to a DruidTable is itself straightforward. However, it seems this PR doesn't bring them across from the SQL domain to be available in any native queries. I'm not sure whether this is expected, an interim step, or whether I have misunderstood the goal.
> >
> > > I think there are various ideas and experiments underway on how to do sorting on scan queries at normal Druid datasource scale, which is sort of a big project, but in the short term we might be able to do something less ambitious that works well enough at system-table scale to allow this plan to fully proceed.
> >
> > One possible way, which I think leads in the correct direction:
> >
> > 1) We have an existing rule for a LogicalTable with a DruidTable to DruidQueryRel, which can eventually construct a DruidQuery.
> >
> > 2) The VirtualDataSource, created during SQL parsing, takes an already-constructed Iterable; so we need to have already performed the filter/sort before creating the VirtualDataSource (and DruidQuery). This means the push-down filter logic has to happen during sql/ stage setup, before handoff to the processing/ engine.
> >
> > 3) Perhaps a new VirtualDruidTable subclassing DruidTable, with a RelOptRule that can identify a LogicalXxx above a VirtualDruidTable and push it down? Then our SegmentsTable and friends can expose the correct Iterable. This should allow us to solve the perf concerns, and would allow us to present a correctly constructed VirtualDataSource. Sort from SQL _should_ be supported (I think), as the planner can push the sort etc. down to these nodes directly.
> >
> > In this approach, the majority of the work happens in sql/, prior to the Druid engine, and so Druid core doesn't actually need to know anything about these changes.
> >
> > On the other hand, whilst it keeps the pathway open, I'm not sure this does any of the actual work to make the sys.* tables available as native tables. If we are to make these into truly native tables, without a native sort, and remove their implementation from sql/, the DruidQuery in the planner would need to be configured to pass the ScanQuery sort to the processing engine _but only for sys.* tables_, and then the processing engine would need to know how to find these tables. (I haven't explored this.) As you mention, implementing native sort across multiple data sources seems like a more ambitious piece of work.
> >
> > As another idea, we could consider creating a bridge Bindable/EnumerableToDruid rule that would allow Druid to embed these tables, move them out of sql/ into processing/, expose them as Iterable/Enumerable, and make them available in queries if that is a goal. I'm not really sure that adds anything to the overall goals, though.
> >
> > > Does this approach make sense? I don't believe Gian is actively working on this at the moment, so I think if you're interested in moving along this approach and want to start laying the groundwork, I'm happy to provide guidance and help out.
> > I am interested. For my current work, I do want to keep the focus on the sys.* performance work. If there's a way to do that while laying the groundwork, or even getting all the work done, then I am 100% for that. Looking at what you want to do to convert these sys.* tables to native tables, if we have a viable solution or are comfortable with my suggestions above, I'd be happy to build it out.
> >
> > Thanks
> > Jason
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org