The FreeMarker pre-processing is done using a Maven plugin and a few FreeMarker macros. Drill pioneered this approach and Phoenix was able to apply the same code. If you follow what I did for PHOENIX-1706 [1] [2] it should be straightforward. Note that PhoenixPrepare overrides the method that creates the parser factory.
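Roughly, the wiring on the embedding side looks like the sketch below; "FlinkSqlParserImpl" stands in for whatever class your FMPP/JavaCC grammar template generates (the name is hypothetical, but generated parsers expose a static FACTORY field):

  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.sql.parser.SqlParser;
  // import ...FlinkSqlParserImpl;  // hypothetical: the parser class
  //                                // generated by FMPP + JavaCC

  public class ExtendedParserExample {
    public static SqlNode parse(String sql) throws Exception {
      // Point Calcite's parser config at the generated parser's factory.
      SqlParser.Config config = SqlParser.configBuilder()
          .setParserFactory(FlinkSqlParserImpl.FACTORY)
          .build();
      return SqlParser.create(sql, config).parseStmt();
    }
  }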
Julian

[1] https://issues.apache.org/jira/browse/PHOENIX-1706
[2] https://github.com/julianhyde/phoenix/commit/1c0a989ecbd9448709e5682d78a7dab3a9332547

On Fri, Dec 11, 2015 at 9:20 AM, Stephan Ewen <[email protected]> wrote:

> Hi Julian!
>
> Thanks for the explanation. We'll definitely go with embedding Calcite for now.
>
> As long as DAGs are supported it is fine if the costs are double counted. In most cases, that probably affects only the absolute cost estimate, and the relative ranking among plans is still good.
>
> We'll look into FreeMarker for grammar pre-processing and come back when we need help.
>
> Greetings,
> Stephan
>
>
> On Thu, Dec 10, 2015 at 10:29 AM, Julian Hyde <[email protected]> wrote:
>
>> Stephan,
>>
>> I’m pleased and flattered to hear that you are considering Calcite. Flink is an excellent project and we would be excited to work with you folks.
>>
>> Your plan to translate both the Table API and SQL to a Calcite operator tree makes sense. For the Table API, you could take a look at RelBuilder: it is a concise and user-friendly way of creating operator trees.
>>
>> We haven’t done much work on DAGs to date. You can definitely represent a DAG as an operator tree — because Calcite’s optimizer uses dynamic programming, even a query as simple as ‘select * from t, t’ will create a DAG, with both sides of the join using the same table scan — but the problem is that the cost will be double-counted: Calcite will cost the plan as if each path through the plan were evaluated from scratch. We have some ideas for how to fix that, such as a Spool operator [1] that creates temporary tables that, once populated, can be used multiple times within the plan. I also think approaches based on integer linear programming [2] are promising.
>>
>> Regarding extending the SQL grammar: the Drill project put in place a mechanism that uses FreeMarker as a pre-processor for the grammar source file. A project can contribute a patch to Calcite that inserts a macro into Calcite's grammar file and defines a default value for that macro; its own parser would substitute different values in those places. Phoenix has used the same mechanism. Flink could use this same mechanism to extend Calcite SQL any way it chooses.
>>
>> Having said that, we are open to including your extensions in Calcite’s core SQL. This is especially true of streaming SQL, where we are trying to support at least the common cases (tumbling, hopping, sliding windows). In my opinion, the SQL to evaluate a hopping window on streams in Flink, Storm or Samza, or on a historical table in Drill or Phoenix, should be the same. Maybe you have novel operators that start off as Flink-specific extensions and eventually migrate into core SQL.
>>
>> One other thing I’d like to mention. I have been thinking for a while of introducing an “engine” SPI into Calcite, an engine being a system that has implementations of all core relational operators and a runtime (perhaps distributed) to execute them. Currently the only way for a system such as Flink to use Calcite is to embed it: Flink starts up first, and invokes Calcite for parsing and planning. But for some uses, you want to start Calcite first (perhaps embedded), with Calcite configured to use Flink for all operators that cannot be pushed down to their data source.
>>
>> I think now would be a good time to introduce the engine SPI, with initial implementations for Flink, Drill and perhaps Spark.
>> To be clear, the main way for Drill and (I presume) Flink to use Calcite would be by embedding, but the engine SPI would give our users more configuration flexibility. It would also allow us to better test Calcite-Flink and Calcite-Drill integration before we release.
>>
>> Julian
>>
>> [1] https://issues.apache.org/jira/browse/CALCITE-481
>> [2] http://www.vldb.org/pvldb/vol7/p217-karanasos.pdf
>>
>>
>> > On Dec 9, 2015, at 7:02 AM, Stephan Ewen <[email protected]> wrote:
>> >
>> > Hi Calcite Folks!
>> >
>> > The Apache Flink community is currently looking into how to use Calcite for optimization of both batch and streaming programs.
>> >
>> > We are looking to compile two different kinds of higher-level APIs via Calcite to Flink's APIs:
>> >   - Table API (a LINQ-style DSL)
>> >   - SQL
>> >
>> > Our current thought is to use largely the same translation paths for both, with different entry points into Calcite:
>> >   - the Table API creates a Calcite operator tree directly
>> >   - the SQL interface goes through the full stack, including parser, ...
>> >
>> > From what I have seen so far in Calcite, it looks pretty promising, with its configurable and extensible rule set, and the pluggable schema/metadata providers.
>> >
>> > A few questions remain for us, to see how feasible this is:
>> >
>> > 1) Are DAG programs supported? The Table API produces operator DAGs, rather than pure trees. Do DAGs affect/limit the space explored by the optimizer engine?
>> >
>> > 2) For streaming programs, we will probably want to add some custom syntax, specific to Flink's windows. Is it possible to also customize the SQL dialect of the parser?
>> >
>> > These answers are quite crucial for us to figure out how to best use Calcite in our designs. Thanks for helping us...
>> >
>> > Greetings,
>> > Stephan
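To make the RelBuilder suggestion and the 'select * from t, t' DAG discussed above concrete, here is a minimal sketch. It assumes the default schema contains a table "T"; the empty root schema used here is only a placeholder, so a real application would register its own schema.

  import org.apache.calcite.plan.RelOptUtil;
  import org.apache.calcite.rel.RelNode;
  import org.apache.calcite.rel.core.JoinRelType;
  import org.apache.calcite.tools.FrameworkConfig;
  import org.apache.calcite.tools.Frameworks;
  import org.apache.calcite.tools.RelBuilder;

  public class DagExample {
    public static void main(String[] args) {
      // Placeholder schema; a real application supplies a schema that
      // actually contains the table "T".
      FrameworkConfig config = Frameworks.newConfigBuilder()
          .defaultSchema(Frameworks.createRootSchema(true))
          .build();
      RelBuilder builder = RelBuilder.create(config);

      // 'select * from t, t': two scans of the same table joined with a
      // TRUE condition (a cartesian product). Both scans have the same
      // digest, so the planner's memo treats them as a single node; the
      // plan is a DAG, and, as noted above, its cost is double-counted.
      RelNode plan = builder
          .scan("T")
          .scan("T")
          .join(JoinRelType.INNER, builder.literal(true))
          .build();

      System.out.println(RelOptUtil.toString(plan));
    }
  }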
