pjmore opened a new pull request #1066:
URL: https://github.com/apache/arrow-datafusion/pull/1066


   Related to #440
   
   Looking for some feedback on a Tokomak-based optimizer. I've added support 
for nearly all expressions as well as a number of simplification and 
constant-folding rules. This is very much a work in progress: a number of the 
matching and conversion functions are pretty ugly, and the expression parsing 
is pretty bad at the moment. I mostly wanted to get input on a couple of things.
   
   A bunch of these optimizations require the ability to execute expressions 
on literal values. I think the best way to allow external evaluation without 
implementing everything twice is to split the core evaluation logic into an 
associated function, for those physical expressions that it makes sense to 
execute on literal values.
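
   A minimal sketch of that split, using simplified stand-in types rather than 
DataFusion's real ones. `Scalar`, `AddExpr`, `evaluate_scalars`, 
`evaluate_column`, and `fold_add` are all hypothetical names chosen for 
illustration; the point is only that the execution path and the optimizer call 
the same associated function.

```rust
// Hypothetical, simplified stand-ins; these are not DataFusion's actual types.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
    Boolean(Option<bool>),
}

/// Stand-in for a physical expression such as an addition.
struct AddExpr;

impl AddExpr {
    /// Core evaluation logic as an associated function: no `self` and no
    /// record batch, only scalar inputs, so it can be called from anywhere.
    fn evaluate_scalars(lhs: &Scalar, rhs: &Scalar) -> Result<Scalar, String> {
        match (lhs, rhs) {
            (Scalar::Int64(Some(a)), Scalar::Int64(Some(b))) => a
                .checked_add(*b)
                .map(|v| Scalar::Int64(Some(v)))
                .ok_or_else(|| "integer overflow".to_string()),
            // If either integer operand is NULL, the result is NULL.
            (Scalar::Int64(_), Scalar::Int64(_)) => Ok(Scalar::Int64(None)),
            _ => Err(format!("unsupported operand types: {:?} + {:?}", lhs, rhs)),
        }
    }

    /// Execution path: apply the shared logic to every row of two columns.
    fn evaluate_column(lhs: &[Scalar], rhs: &[Scalar]) -> Result<Vec<Scalar>, String> {
        lhs.iter()
            .zip(rhs)
            .map(|(l, r)| Self::evaluate_scalars(l, r))
            .collect()
    }
}

/// Optimizer path: fold `literal + literal` with the same shared logic.
/// Returns `None` (leave the expression untouched) if evaluation fails.
fn fold_add(lhs: &Scalar, rhs: &Scalar) -> Option<Scalar> {
    AddExpr::evaluate_scalars(lhs, rhs).ok()
}
```

   In this sketch a failed evaluation (overflow, unsupported types) simply 
skips the fold, so the error still surfaces at execution time instead of 
during optimization.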
   
   The main risk I see with implementing it this way is that some physical 
expressions, like BinaryExpr, insert new type casting expressions into the tree 
when they are constructed, and it would be easy to miss updating the 
corresponding code in the optimizer.
   
   I also added function volatility categories copied from Postgres's 
definition, which has three categories: immutable, stable, and volatile. I'm 
not sure DataFusion needs a stable category, though, as that is for functions 
which will return the same values for the same arguments within a single 
statement.
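
   As a rough illustration of how the three categories could gate constant 
folding, here is a small sketch. The `Volatility` enum, its method names, and 
the `classify` helper are hypothetical and not taken from the PR; the example 
classifications follow how Postgres labels `sqrt`, `now`, and `random`.

```rust
/// The three volatility categories named above (illustrative definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    /// Same arguments always give the same result.
    Immutable,
    /// Same arguments give the same result only within a bounded scope
    /// (in Postgres, a single statement).
    Stable,
    /// The result may change on every call.
    Volatile,
}

impl Volatility {
    /// Whether a call with all-literal arguments may be replaced by its
    /// result while the plan is being optimized.
    fn foldable_at_plan_time(self) -> bool {
        matches!(self, Volatility::Immutable)
    }

    /// Whether a call with all-literal arguments may be evaluated once per
    /// query execution instead of once per row.
    fn evaluable_once_per_query(self) -> bool {
        !matches!(self, Volatility::Volatile)
    }
}

/// Hypothetical lookup for a few well-known functions. Unknown functions
/// are treated as volatile, which is the conservative choice.
fn classify(function_name: &str) -> Volatility {
    match function_name {
        "sqrt" | "abs" => Volatility::Immutable,
        "now" => Volatility::Stable,
        _ => Volatility::Volatile,
    }
}
```

   Under this sketch, dropping the stable category would mean treating 
functions like `now` as either immutable (folded too eagerly) or volatile 
(never evaluated once up front), which is the trade-off behind the question 
above.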
   
   Work that needs to be done before I'd consider this ready for 
merging:
   
   - Clean up functions in utils
   - Determine how configurable the optimizer should be. Should users be able 
to register their own rewrite rules, just select which optimization 
categories to run, or neither?
   - Clean up how expressions are parsed. Currently there is some pretty hacky 
code so that the udf in the expression 
   ``` (call udf[pow] (list 1 2))```
      is parsed as a call to a udf and not a call expression with a string as 
the second argument. While this isn't hugely important if the optimizer doesn't 
allow user-defined rewrite rules, it can make writing patterns involving strings 
and udfs brittle.
   - More tests
   - Determine if some classes like TokomakScalar and TokomakDatatype can be 
removed in favor of modifying their arrow/datafusion equivalents.
   - Change the datatypes involving time to avoid using brackets. I haven't 
tested this, but I'm fairly certain that brackets would break how s-expressions 
are parsed by egg.
   - Add support for expressions calling window functions.
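
   On the expression parsing item above, one possible direction is to 
recognize the `udf[...]` wrapper explicitly when classifying the function 
token, instead of special-casing it later. The sketch below is an assumption 
about how that could look, not the PR's parser; `FuncToken` and 
`classify_token` are hypothetical names, and it deliberately does not use egg.

```rust
/// Hypothetical classification of the token in the function position of a
/// `call` s-expression such as `(call udf[pow] (list 1 2))`.
#[derive(Debug, PartialEq)]
enum FuncToken<'a> {
    /// `udf[name]`: a reference to a user-defined function called `name`.
    Udf(&'a str),
    /// Anything else, kept as a plain symbol or string.
    Plain(&'a str),
}

/// Treat a token as a udf reference only if it has the exact shape
/// `udf[<non-empty name>]`; otherwise leave it as a plain token.
fn classify_token(tok: &str) -> FuncToken<'_> {
    match tok
        .strip_prefix("udf[")
        .and_then(|rest| rest.strip_suffix(']'))
    {
        Some(name) if !name.is_empty() => FuncToken::Udf(name),
        _ => FuncToken::Plain(tok),
    }
}
```

   With this shape a pattern that mentions the plain string `pow` and one that 
mentions `udf[pow]` can no longer be confused, since they map to different 
variants.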
   


