What you call “statistics” Calcite calls “metadata”. Calcite has a comprehensive system for adding a new kind of metadata (such as histograms) or a new provider for metadata (that would, say, compute a value of the Selectivity metadata for YourFilter and YourJoin).
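
To make the histogram idea concrete, here is a minimal sketch of the kind of estimate a histogram-backed Selectivity provider might compute. It assumes an equi-width histogram and a uniform distribution of values within each bucket. EquiWidthHistogram and its methods are hypothetical names for illustration, not part of Calcite, and the wiring into a metadata provider (along the lines of RelMdSelectivity) is deliberately left out.

```java
import java.util.Arrays;

/**
 * Hypothetical equi-width histogram over a numeric column.
 * Not a Calcite class; a sketch under the assumptions stated above.
 */
final class EquiWidthHistogram {
  private final double min;     // inclusive lower bound of the column range
  private final double max;     // exclusive upper bound of the column range
  private final long[] counts;  // row count per bucket, all buckets equally wide
  private final long total;     // total number of rows covered

  EquiWidthHistogram(double min, double max, long[] counts) {
    if (!(max > min) || counts.length == 0) {
      throw new IllegalArgumentException("need max > min and at least one bucket");
    }
    this.min = min;
    this.max = max;
    this.counts = counts.clone();
    this.total = Arrays.stream(counts).sum();
  }

  /** Estimated fraction of rows satisfying "col < x". */
  double selectivityLessThan(double x) {
    if (total == 0 || x <= min) {
      return 0.0;
    }
    if (x >= max) {
      return 1.0;
    }
    double width = (max - min) / counts.length;
    double pos = (x - min) / width;
    // Clamp to guard against floating-point rounding near the upper bound.
    int bucket = Math.min((int) pos, counts.length - 1);
    long below = 0;
    for (int i = 0; i < bucket; i++) {
      below += counts[i];  // buckets that lie entirely below x
    }
    // Assume values are spread uniformly inside the bucket containing x.
    double partial = (pos - bucket) * counts[bucket];
    return (below + partial) / total;
  }

  /** Estimated fraction of rows satisfying "lo <= col < hi". */
  double selectivityBetween(double lo, double hi) {
    if (hi <= lo) {
      return 0.0;
    }
    return selectivityLessThan(hi) - selectivityLessThan(lo);
  }
}
```

For example, with bucket counts {10, 20, 30, 40} over the range [0, 100), selectivityLessThan(50) comes out as 0.3, since 30 of the 100 rows fall in the two buckets below 50.
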

The Table.getStatistic() method is a very simple way to inject some very simple metadata, but it does not (and is not intended to) scale to richer metadata. Take a look at BuiltInMetadata, RelMetadataQuery, and one of the built-in providers, say RelMdSelectivity.

Note that it is OK to define your own metadata types outside of BuiltInMetadata. RelMetadataTest.ColType illustrates that this is possible.

Other groups (Hive, Drill) are probably interested in a “Histogram” metadata type, and it would be great if we could all use the same definition of Histogram, but I suspect it would take several months for that discussion to converge on anything concrete. If you’re in a hurry, better to forge ahead and share what you come up with.

Julian

> On Feb 24, 2016, at 6:02 AM, Victor Giannakouris - Salalidis
> <[email protected]> wrote:
>
> Hello,
>
> I am using HepPlanner with custom table classes for the catalog (extending
> *AbstractTable*). In my implementation I override the getStatistic() method
> in which I return a Statistic definition in which I override the
> getRowCount() method.
>
> I added some rules to the planner in order to optimize join ordering. At
> this step, it moves for example the smaller tables (such as those in which
> a filter is applied) at the left (*build side*).
>
> My actual question is how (where) can I add my own statistics (concretely,
> *histograms* for selectivity estimation) in order to perform estimates for
> filters or join intermediate results.
> --
> Victor Giannakouris - Salalidis
>
> LinkedIn:
> http://gr.linkedin.com/pub/victor-giannakouris-salalidis/69/585/b23/
> Personal Page: http://gsvic.github.io
