Jason,

> I'm new to Calcite (and Druid) so if I have some terminology
> incorrect, please point it out.

From a Calcite perspective, I can tell you that your terminology (and
ideas) seem spot on. I can’t say whether they make sense in Druid (or
are easy to achieve).

Julian

> On May 13, 2021, at 4:21 PM, Jason Koch <jk...@netflix.com.INVALID> wrote:
>
> Hi all,
>
> I'm looking to implement push-down for some operations in the
> SystemSchema class, and looking for your input on the best way to
> tackle this.
>
> With profiling, we have found some UI slowness related to large
> segment counts and task counts. Inspecting the code, it seems that
> much of the data is fully materialized and then often discarded
> [1], which makes it a good opportunity for a pushdown optimization.
> This would make for a snappier UI experience for segments, tasks
> and so on. It would also, I believe, address #6827 [2].
>
> In looking at the code I can see a couple of approaches that might be
> sensible:
>
> - Modify the tables to support the linq4j Queryable interface, and
> have all inputs provided in a single pass. This would be a
> (relatively) straightforward way of fixing this specific problem;
> however, I am not sure how extensible/reusable this is, or whether I
> have a correct understanding of Queryable.
>
> - Build up a custom RelNode structure for the sys. tables, along with
> some rules, that could perform the required Logical operations in a
> single pass on the underlying structures. With this approach, some
> of the rules would perhaps be more reusable and more in line with the
> existing Druid query architecture; however, it seems like a more
> complex solution. I think a starting point would be to convert these
> tables to `ProjectableFilterableTable` and then develop a RelOptRule
> to match a LogicalSort over these tables and push the required sort
> comparators down. On scan(), then, the sort, projection, and filter
> details would all be available to perform a single pass.
>
> - Any other suggestions or pointers?
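To make the single-pass idea in the second option concrete, here is a minimal sketch in plain Java. It is not Calcite's or Druid's API: `Segment`, `SinglePassScan` and the `scan` signature are hypothetical names made up for illustration. The assumption is that the filter, sort order and limit have already been pushed down, so the scan can work on the raw objects and only materialize (project) the rows that survive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for a segment record; not a Druid class.
final class Segment {
  final String id;
  final long startMillis;

  Segment(String id, long startMillis) {
    this.id = id;
    this.startMillis = startMillis;
  }
}

// Hypothetical helper; illustrates the technique only.
final class SinglePassScan {
  /**
   * Filters and keeps the first `limit` rows (in `order`) in one pass
   * over `source`, then applies `project` only to the rows that are kept.
   * Rows that compare as equal may come back in any relative order.
   */
  static <T, R> List<R> scan(
      Iterable<T> source,
      Predicate<T> filter,
      Comparator<T> order,
      int limit,
      Function<T, R> project) {
    List<R> out = new ArrayList<>();
    if (limit <= 0) {
      return out;
    }
    // Bounded heap: its head is the worst row kept so far, so it uses
    // the reverse of the requested order.
    PriorityQueue<T> heap = new PriorityQueue<>(order.reversed());
    for (T row : source) {
      if (!filter.test(row)) {
        continue;
      }
      heap.add(row);
      if (heap.size() > limit) {
        heap.poll(); // evict the current worst row
      }
    }
    // Sort only the survivors, then materialize only the survivors.
    List<T> kept = new ArrayList<>(heap);
    kept.sort(order);
    for (T row : kept) {
      out.add(project.apply(row));
    }
    return out;
  }
}
```

For a query like "order by date desc limit 25", the caller would pass something like `Comparator.comparingLong((Segment s) -> s.startMillis).reversed()` as the order, 25 as the limit, and the expensive row-building step as `project`. The heap never holds more than limit + 1 rows, and the projection runs at most `limit` times regardless of how many segments there are.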
>
> Other thoughts:
> - Are there any common query use cases in these schemas that you think
> we should target as a goal for opt rules? If I know in advance then I
> can use that to guide the work.
> - If complexity goes up a lot, especially for the second option, it
> might be beneficial to move the rules and configuration to a new
> package (org.apache.druid.sql.calcite.schema.sys?).
> - It seems that these queries are currently performed in the Bindable
> convention, which would be a little slower than the Enumerable
> convention. Is there any appetite to switch? I did not identify any
> negative consequences from my reading.
>
> I'm new to Calcite (and Druid) so if I have some terminology
> incorrect, please point it out.
>
> [1]
> https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java#L292-L373
> - for example, a "select * from sys.segments order by date desc limit 25"
> requires full materialization of all fields of all objects
> (.toString()) in order to sort correctly, at which point we pick the
> first 25 rows, and most of the data is not needed. Ideally we could
> perform a sort based on the underlying timestamp, and only materialize
> the results for the first 25 discovered rows.
> [2] https://github.com/apache/druid/issues/6827
>
> Thanks
> Jason
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> For additional commands, e-mail: dev-h...@druid.apache.org