Currently we use the same API to parse and compile packages as we do for the IDE. This is proving quite heavy. I propose that we add a separate API for IDE integration that is fine-grained and metadata driven. I believe Eclipse has extensive metadata capabilities, which it stores on disk -- we should try to leverage this, although the core API should remain independent so that other IDEs can use it too.

1) Split the main document up into package, imports, globals, functions and rules -- but do not parse contents.

a. Maybe we will build up the metadata for imports and globals at this stage, as I imagine that's easy.

b. Record, as metadata, the start and end line numbers, i.e. the range, for all sections.
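To make step 1 concrete, here is a minimal sketch (in modern Java, purely illustrative -- the class and field names are invented, not an existing Drools API) of a coarse first pass that records section ranges as metadata without parsing section contents:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 1: split the document into top-level sections
// by keyword, recording only start/end line ranges -- contents stay unparsed.
public class SectionIndexer {
    public static class Section {
        public final String kind;   // "package", "import", "global", "function", "rule"
        public final int startLine; // 1-based, inclusive
        public int endLine;         // 1-based, inclusive
        Section(String kind, int startLine) { this.kind = kind; this.startLine = startLine; }
    }

    static final String[] KEYWORDS = {"package", "import", "global", "function", "rule"};

    public static List<Section> index(String[] lines) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        for (int i = 0; i < lines.length; i++) {
            String trimmed = lines[i].trim();
            for (String kw : KEYWORDS) {
                if (trimmed.startsWith(kw + " ")) {
                    // previous section runs up to the line above this keyword
                    if (current != null) current.endLine = i;
                    current = new Section(kw, i + 1);
                    sections.add(current);
                    break;
                }
            }
        }
        if (current != null) current.endLine = lines.length;
        return sections;
    }
}
```

A real pass would of course be driven by the grammar rather than `startsWith`, but the shape of the metadata -- kind plus line range -- is the point.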

2) Tokenise all expressions and blocks -- do not lose line numbers though, so we will need to pad.

a. Record, as metadata, the start and end line/col numbers, i.e. the range, for all expressions and blocks.
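A sketch of the step 2 idea -- again illustrative names only; a real implementation would ride on the ANTLR token stream rather than hand-splitting on whitespace:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 2: a trivial tokeniser that keeps 1-based line/column
// ranges as metadata, so source positions survive tokenisation.
public class RangedTokenizer {
    public static class Token {
        public final String text;
        public final int line, startCol, endCol; // all 1-based, inclusive
        Token(String text, int line, int startCol, int endCol) {
            this.text = text; this.line = line;
            this.startCol = startCol; this.endCol = endCol;
        }
    }

    public static List<Token> tokenize(String[] lines) {
        List<Token> tokens = new ArrayList<>();
        for (int ln = 0; ln < lines.length; ln++) {
            String line = lines[ln];
            int i = 0;
            while (i < line.length()) {
                if (Character.isWhitespace(line.charAt(i))) { i++; continue; }
                int start = i;
                while (i < line.length() && !Character.isWhitespace(line.charAt(i))) i++;
                tokens.add(new Token(line.substring(start, i), ln + 1, start + 1, i));
            }
        }
        return tokens;
    }
}
```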

3) Parse each Rule

a. Validate Conditional Element and Field Constraint structure.

b. Validate Columns and Fields -- record this as metadata, so we know the dependent classes and fields for this rule.

c. Record Column and Field bindings as metadata -- as used by expressions.

d. Validate operators and the RHS value type, checking that it is valid for the LHS and operator.
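Step 3d might look something like the sketch below. The field types would come from the Column metadata; the class and method names are invented for illustration, not an existing Drools API:

```java
// Sketch of step 3d: check an RHS literal against the LHS field type and
// the operator. Deliberately tiny -- just enough to show the shape.
public class ConstraintValidator {
    static Class<?> boxed(Class<?> t) {
        if (t == int.class) return Integer.class;
        if (t == long.class) return Long.class;
        if (t == double.class) return Double.class;
        if (t == boolean.class) return Boolean.class;
        return t;
    }

    public static boolean isValid(Class<?> fieldType, String operator, Object rhs) {
        switch (operator) {
            case "==":
            case "!=":
                return rhs == null || boxed(fieldType).isInstance(rhs);
            case "<":
            case ">":
            case "<=":
            case ">=":
                // relational operators require a numeric field and a numeric literal
                return Number.class.isAssignableFrom(boxed(fieldType)) && rhs instanceof Number;
            default:
                return false; // unknown operator, as far as this sketch is concerned
        }
    }
}
```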

4) Determine the required declarations for each expression and record them as metadata.

a. If we can, it might be nice to also determine dependent classes from the imports and globals, beyond declarations, and record them as metadata.
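A minimal sketch of step 4, assuming the bindings are already known from step 3c -- a real version would walk the ANTLR token stream rather than using a regex:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of step 4: scan an expression for identifiers and intersect them
// with the known bindings, yielding the declarations the expression needs.
public class DeclarationScanner {
    private static final Pattern IDENT = Pattern.compile("[$A-Za-z_][$A-Za-z0-9_]*");

    public static Set<String> requiredDeclarations(String expression, Set<String> knownBindings) {
        Set<String> required = new LinkedHashSet<>();
        Matcher m = IDENT.matcher(expression);
        while (m.find()) {
            if (knownBindings.contains(m.group())) {
                required.add(m.group());
            }
        }
        return required;
    }
}
```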

5) Compile each expression and block using a helper util, record the errors as metadata, and then forget the compiled .class.
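As a sketch of the compile-and-forget helper in step 5, the JDK's `javax.tools` API can stand in for whatever compiler helper we actually wire in -- the point is only that we keep the diagnostics and discard the classes:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import javax.tools.*;

// Sketch of step 5: compile a snippet purely to harvest error messages,
// then throw the compiled classes away. javax.tools is illustrative here,
// not a statement about which compiler the real helper should use.
public class ThrowawayCompiler {
    static class StringSource extends SimpleJavaFileObject {
        final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Returns the diagnostics only; the compiled .class output is discarded. */
    public static List<Diagnostic<? extends JavaFileObject>> check(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        // send .class output to the temp dir so nothing of it is kept around
        List<String> options = Arrays.asList("-d", System.getProperty("java.io.tmpdir"));
        compiler.getTask(null, null, diagnostics, options, null,
                Arrays.asList(new StringSource(className, code))).call();
        return diagnostics.getDiagnostics();
    }
}
```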

6) Develop intelligent balanced-text recovery. When scanning in steps 1) and 2) we need to check for balanced text; if we detect incorrect balancing we mark that section as invalid and find the start of the next valid section -- nothing inside the invalid section will be parsed.

a. i.e. if we have an invalid expression, say an incorrect number of brackets on the LHS of the rule, we try to recover to the next valid area -- ideally the next valid conditional element, but that may be hard, so it could be the RHS of the rule. Start simple and coarse; intelligent fine-grained recovery can be added later.
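The coarse version of step 6 can be sketched as a bracket-balance check plus a skip-to-next-rule recovery -- again, illustrative names only:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of step 6: if a section's brackets don't balance, mark it invalid
// and skip ahead to the next line that opens a rule -- nothing inside the
// invalid section gets parsed.
public class BalanceScanner {
    public static boolean isBalanced(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{': stack.push(c); break;
                case ')': if (stack.isEmpty() || stack.pop() != '(') return false; break;
                case ']': if (stack.isEmpty() || stack.pop() != '[') return false; break;
                case '}': if (stack.isEmpty() || stack.pop() != '{') return false; break;
            }
        }
        return stack.isEmpty();
    }

    /** Index of the next line starting a rule at or after 'from', else lines.length. */
    public static int recoverTo(String[] lines, int from) {
        for (int i = from; i < lines.length; i++) {
            if (lines[i].trim().startsWith("rule ")) return i;
        }
        return lines.length;
    }
}
```

Note this deliberately ignores brackets inside string literals and comments -- a real scanner would have to track those too.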

7) Intelligent re-parsing for project-wide changes.

a. We have class dependencies in the metadata for the various sections -- and also the fields used in constraints. So we can determine errors from the metadata, without having to reparse.

b. In the case of expressions we can use the metadata to avoid re-compiling. If the changes are too dramatic then we can recompile the expression/consequence. We have the ranges in metadata for the dependent sections, so we can rescan to pick up the expression without having to parse the entire document.

c. When editing a document we only recompile an expression/consequence if the user edits it -- again, we should know that we are in an expression; since we know where it starts, we scan from the start of the expression/block to the end and compile, avoiding a reparse of the entire document or even the rule. If bindings are changed we can also determine the dependent expressions and recompile them to get errors.
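The dependency side of the step 7 metadata can be sketched as a simple inverted index (class names and rule names are illustrative):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of step 7: each rule records the classes it depends on, so a
// change to one class yields the exact set of rules to re-check, without
// reparsing anything else.
public class DependencyIndex {
    private final Map<String, Set<String>> classToRules = new HashMap<>();

    public void record(String ruleName, Collection<String> usedClasses) {
        for (String cls : usedClasses) {
            classToRules.computeIfAbsent(cls, k -> new HashSet<>()).add(ruleName);
        }
    }

    public Set<String> rulesAffectedBy(String changedClass) {
        return classToRules.getOrDefault(changedClass, Collections.emptySet());
    }
}
```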

I'm sure there is a lot more complexity to this, and stages I have missed. But it should be enough to show you the direction I want to take. We are extensively using metadata to minimise re-parsing and re-compiling, and also using metadata to localise the areas that we do need to re-parse after changes. The key to this is always knowing exactly what we are editing. We may have to extend the Descr, or create a new structure, to handle metadata-driven parsing. None of this replaces the existing rule parser/Descr/PackageBuilder implementations, which are still required for compiling and deploying real rule bases -- although hopefully we can leverage parts from both to avoid duplication.

It may be worth taking my previous email and this one and putting them into JIRA.
Mark

------------------------------------------------------------------------

*From:* Mark Proctor
*Sent:* 10 May 2006 14:46
*To:* '[EMAIL PROTECTED]'; 'Kris Verlaenen'; Michael Neale
*Subject:* RE: going great with antlr 3

I've been thinking more about the parser and I think there is a lot more we can do with regards to iterative building and intelligent compiling. Initially we decided to push as much as possible onto the rule builder -- I now believe the reverse is true. In fact I don't think the Eclipse parser should use the Rule Builder API at all -- instead it should build its own metadata. This might mean we need two parsers -- one for building the Descr for PackageBuilder, and another for IDE integration.

- The ANTLR grammar knows the full Descr structure, and we can ensure correct AST trees (except the contents of expressions and blocks) using ANTLR -- without having to build the Descr structure.

- The parser maintains its own list of valid import entries; it can reuse the PackageBuilder TypeResolver here.

- We can identify when we are dealing with Columns and their fields, giving the parser the ability to know valid classes and fields -- as mentioned previously.

- We could probably even have a helper class to identify valid operators for fields.

- Cache the line starts of important parts -- package, rule, attributes, LHS and RHS -- so we don't have to parse the entire rule, just from the current parseable section to the end.

- Cache metadata for rules -- I think this should probably just be the classes they use, to help when recompiling entire Eclipse projects, so we only re-parse dependent packages/rules.

The above should allow us to deal with the bulk of user DRL editing without having to create a complete package Descr or do any compiling. If we introduce functions only for code blocks, that too can be handled with the ANTLR grammar. Beyond this we still need to handle expressions and blocks -- for any language. For the IDE I don't believe using PackageBuilder is efficient -- we don't need the entire compiled structure; each expression and block is fully independent and only needs its required declarations. If we implement the above we should know which expressions or blocks we are currently editing. Further, the ANTLR grammar and the helper TypeResolver can resolve bindings and cache them in the metadata. Finally, we can still use ANTLR -- this should be pluggable, to support other languages -- to examine an expression and determine its required declarations, cached in metadata. We can use this to compile the expression and extract error messages/validity. This compilation is a one-off, via a helper utility, just to feed the error messages back to the parser -- once it's compiled it can be forgotten.

This also needs to be combined with a way to make expressions and blocks foolproof -- we might need a pre-processor, where we tokenise expressions and blocks, if ANTLR cannot be made to handle this.

Mark
