On Tue, Jun 9, 2015 at 3:57 AM Gunnar Morling <gun...@hibernate.org> wrote:
> What I like about the Antlr4 approach is the fact that you don't need a set
> of several quite similar grammars as you'd do with the tree transformation
> approach. Also using the current version of Antlr instead of 3 appears
> attractive to me wrt. bugfixes and future development of the tool.

Understand that we would all "like" to use Antlr 4 for many reasons, myself
included. But it has to work for our needs. There are just so many open
questions (for me) as to whether that is the case.

> Based on what I understand from your discussions on the Antlr mailing
> list, I'd assume the parse tree and the external state it references to
> look roughly like so (---> indicates a reference to state built up during
> subsequent walks, maybe in some external "table", maybe stored within
> the (typed) tree nodes themselves):
>
> [QUERY]
>   [SELECT]
>     [ATTRIBUTE_REF] ---> AttributeReference("<gen:1>", "code")
>       [DOT]
>         [DOT]
>           [DOT]
>             [IDENT, "c"]
>             [IDENT, "headquarters"]
>           [IDENT, "state"]
>         [IDENT, "code"]
>   [FROM]
>     [SPACE]
>       [SPACE_ROOT] ---> InnerJoin( InnerJoin ( PersisterRef( "c",
>           "com.acme.Customer" ), TableRef ( "<gen:0>", "headquarters" ) ),
>           TableRef ( "<gen:1>", "state" ) )
>         [IDENT, "Customer"]
>         [IDENT, "c"]
>
> I.e. instead of transforming the tree itself, the state required for
> output generation would be added as "decorators" to nodes of the original
> parse tree itself. That's just the basic idea as I understand it, surely
> the specific types of the decorator elements (AttributeReference,
> InnerJoin etc.) may look different. During "query rendering" we'd have to
> inspect the decorator state of the parse tree nodes and interpret it
> accordingly.

Well, see you do something "tricky" here that is actually one of my concerns
with Antlr 4 :) You mix a parse tree and a semantic tree.
Specifically this part of your tree:

[ATTRIBUTE_REF] ---> AttributeReference("<gen:1>", "code")
  [DOT]
    [DOT]
      [DOT]
        [IDENT, "c"]
        [IDENT, "headquarters"]
      [IDENT, "state"]
    [IDENT, "code"]

The idea of "ATTRIBUTE_REF" is a semantic concept. The DOT-IDENT structure
is your parse tree. Antlr 4 does allow mixing these based on left factoring
of the rules, *but* there is an assumption there... that the branches in
such a left-factored rule can be resolved unambiguously. I am not so sure
we can do that. In simpler terms... Antlr 4 needs you to be able to apply
those semantic resolutions (attributeRef versus javaLiteralRef versus
oraclePackagedProcedure versus ...) up front.

So take the input that produces that tree:

    select c.headquarters.state.code

Syntactically that dot-ident structure could represent any number of
things. And semantically we simply do not have enough information at that
point. We *could* eliminate it being a javaLiteralRef if we made
javaLiteralRef the highest precedence branch in the left-factored rule that
produces this, but that has serious drawbacks:

1) we are checking each and every dot-ident path as a possible
   javaLiteralRef first, which means reflection (perf)
2) it is not a fool-proof approach. The problem is that javaLiteralRef
   should really have very low precedence. There are conceivably cases
   where the expression could resolve to either a javaLiteralRef or an
   attributeRef, and in those cases the resolution should be routed
   through attributeRef, not javaLiteralRef.

The ultimate problem there is that we cannot possibly know much of the
information we need for proper semantic analysis until after we have seen
the FROM clause. We got around that with older Antlr versions specifically
via tree-rewriting: we re-write the tree to "hoist" FROM before the other
clauses.
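To make the ordering concern concrete, here is a minimal sketch of "delayed processing": the FROM clause is walked first to collect aliases, and only then are the dot-ident paths from SELECT classified, with attributeRef taking precedence over javaLiteralRef. Everything in it is hypothetical and for illustration only (DotIdentResolution, FromElement, Kind, and the knownConstants set, which stands in for the reflection lookup); none of it is actual Hibernate or Antlr API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these types do not exist in Hibernate or Antlr.
final class DotIdentResolution {

    /** What a dot-ident path turns out to mean once the FROM clause is known. */
    enum Kind { ATTRIBUTE_REF, JAVA_LITERAL_REF, UNRESOLVED }

    /** One FROM-clause element, e.g. "Customer c". */
    static final class FromElement {
        final String entityName;
        final String alias;

        FromElement(String entityName, String alias) {
            this.entityName = entityName;
            this.alias = alias;
        }
    }

    /** Pass 1: walk FROM first and record alias -> entity name. */
    static Map<String, String> collectAliases(List<FromElement> fromClause) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (FromElement element : fromClause) {
            aliases.put(element.alias, element.entityName);
        }
        return aliases;
    }

    /**
     * Pass 2: classify a dot-ident path from SELECT using the aliases from pass 1.
     * attributeRef is tried first; javaLiteralRef is the low-precedence fallback,
     * so the (assumed expensive) constant lookup only runs when no alias matches.
     */
    static Kind resolve(List<String> path,
                        Map<String, String> aliases,
                        Set<String> knownConstants) {
        if (aliases.containsKey(path.get(0))) {
            return Kind.ATTRIBUTE_REF;
        }
        // knownConstants is an assumption standing in for a reflection lookup.
        if (knownConstants.contains(String.join(".", path))) {
            return Kind.JAVA_LITERAL_REF;
        }
        return Kind.UNRESOLVED;
    }

    public static void main(String[] args) {
        // select c.headquarters.state.code from Customer c
        Map<String, String> aliases =
                collectAliases(List.of(new FromElement("com.acme.Customer", "c")));
        Set<String> constants = Set.of("java.lang.Integer.MAX_VALUE");

        System.out.println(resolve(
                List.of("c", "headquarters", "state", "code"), aliases, constants));
        System.out.println(resolve(
                List.of("java", "lang", "Integer", "MAX_VALUE"), aliases, constants));
    }
}
```

The point of the sketch is only the ordering: resolve() cannot be called while the SELECT clause is being parsed, because its aliases argument does not exist until FROM has been processed.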

> So I believe the issue of alias resolution and implicit join conversion
> could be handled without tree transformations (at least conceptually, I
> could not code an actual implementation out of my head right away). But
> maybe there are other cases where tree transformations are more strictly
> needed?

Well I just illustrated above how that is actually a problem that does need
either tree transformations or at least delayed processing of the sub-tree.

Also get out of your head this idea that we can encode the semantic
resolution of dot-ident paths into the tree. We simply will not be able to
(I believe). And I think that starts to show my reservations about Antlr 4.
Basically every pass over this tree we will need to deal with
[[DOT][IDENT]] as opposed to [ATTRIBUTE_REFERENCE]

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev