pjmore opened a new pull request #1066:
URL: https://github.com/apache/arrow-datafusion/pull/1066
Related to #440
Looking for some feedback on a Tokomak-based optimizer. I've added support
for nearly all expressions, as well as a number of simplification and
constant folding rules. This is very much a work in progress: a number of the
matching and conversion functions are pretty ugly, and the expression parsing is
pretty bad at the moment. I mostly wanted to get input on a couple of things.
A bunch of these optimizations require the ability to execute expressions
on literal values. I think the best way to allow external evaluation without
implementing everything twice is to split the core evaluation logic out into an
associated function on each physical expression that it makes sense to
execute on literals.
The main risk I see with implementing it this way is that some physical
expressions, like BinaryExpr, insert new type casting expressions into the tree
when they are constructed, and it would be easy to miss updating the
corresponding code in the optimizer.
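A minimal sketch of the split described above, not the code in this PR. `Scalar`, `Op`, `BinaryLiteralExpr`, and `evaluate_scalars` are hypothetical stand-ins for DataFusion's real types. The core logic lives in an associated function that the execution path and the optimizer can both call. Mixed-type inputs are rejected rather than cast, which is exactly the spot where the optimizer could drift from BinaryExpr's coercion behavior.

```rust
/// Hypothetical stand-in for a scalar literal value.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int64(i64),
    Float64(f64),
    Null,
}

/// Hypothetical subset of binary operators.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Plus,
    Multiply,
}

/// Hypothetical physical expression over two literal operands.
pub struct BinaryLiteralExpr {
    pub left: Scalar,
    pub op: Op,
    pub right: Scalar,
}

impl BinaryLiteralExpr {
    /// Core evaluation logic as an associated function (no `self`), so an
    /// optimizer can fold constants without building a physical expression.
    /// Returns `None` when the inputs cannot be folded.
    pub fn evaluate_scalars(left: &Scalar, op: Op, right: &Scalar) -> Option<Scalar> {
        match (left, right) {
            // NULL propagates through both operators.
            (Scalar::Null, _) | (_, Scalar::Null) => Some(Scalar::Null),
            // Integer overflow yields `None` instead of panicking.
            (Scalar::Int64(l), Scalar::Int64(r)) => match op {
                Op::Plus => l.checked_add(*r).map(Scalar::Int64),
                Op::Multiply => l.checked_mul(*r).map(Scalar::Int64),
            },
            (Scalar::Float64(l), Scalar::Float64(r)) => Some(Scalar::Float64(match op {
                Op::Plus => l + r,
                Op::Multiply => l * r,
            })),
            // Mixed types would need a cast first; this sketch declines to fold.
            _ => None,
        }
    }

    /// The execution path delegates to the same shared logic.
    pub fn evaluate(&self) -> Option<Scalar> {
        Self::evaluate_scalars(&self.left, self.op, &self.right)
    }
}
```

With this shape, a constant folding rule would call `BinaryLiteralExpr::evaluate_scalars` directly and leave the expression untouched whenever it returns `None`.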
I also added function volatility categories that are copied from
Postgres's definition, which has three categories: immutable, stable, and
volatile. I'm not sure if DataFusion needs a stable category though, as that is
for functions which will return the same values for the same arguments within a
single statement.
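To illustrate how the three categories could drive constant folding, here is a rough sketch. The `Volatility` enum and both helper names are assumptions for illustration, not the definitions added in this PR. It conservatively treats only immutable functions as safe to evaluate at plan time.

```rust
/// Hypothetical volatility categories, ordered from most to least foldable.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Volatility {
    /// Same output for the same arguments, always.
    Immutable,
    /// Same output for the same arguments within a limited scope only.
    Stable,
    /// Output may change on every call (e.g. a random number generator).
    Volatile,
}

impl Volatility {
    /// Conservative rule: only immutable calls over literal arguments are
    /// replaced by their result while planning.
    pub fn foldable_at_plan_time(self) -> bool {
        matches!(self, Volatility::Immutable)
    }
}

/// An expression is only as foldable as its least foldable function, so the
/// combined category is the maximum under the derived ordering.
pub fn combine(a: Volatility, b: Volatility) -> Volatility {
    a.max(b)
}
```

If the stable category turns out to be unnecessary, dropping the variant would leave the folding rule itself unchanged.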
Work that needs to be done before I'd consider this ready to merge:
- Clean up functions in utils
- Determine how configurable the optimizer should be. Should users be able
to register their own rewrite rules, only select which optimization
categories to run, or neither?
- Clean up how expressions are parsed. Currently there is some pretty hacky
code so that the udf in the expression
``` (call udf[pow] (list 1 2))```
is parsed as a call to a udf and not as a call expression with a string as
the second argument. While this isn't hugely important if the optimizer doesn't
allow user-defined rewrite rules, it can make writing patterns involving strings
and udfs brittle.
- More tests
- Determine if some classes like TokomakScalar and TokomakDatatype can be
removed in favor of modifying their arrow/datafusion equivalents.
- Change the datatypes involving time to avoid using brackets. I haven't
tested this, but I'm fairly certain that brackets would break how S-expressions
are parsed by egg.
- Add support for expressions calling windowed functions.
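One possible direction for the parsing cleanup item, sketched under assumptions: classify each token explicitly so that `udf[pow]` is recognized as a udf reference and anything else falls through as a plain symbol. `Token` and `classify_token` are hypothetical names and are not part of this PR or of egg.

```rust
/// Hypothetical classification of a single S-expression token.
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    /// A udf reference written as `udf[name]`; holds the bare name.
    Udf(&'a str),
    /// Any other token, kept verbatim.
    Symbol(&'a str),
}

/// Recognizes the `udf[name]` form; anything else, including an empty
/// `udf[]`, is returned unchanged as a plain symbol.
pub fn classify_token(tok: &str) -> Token<'_> {
    match tok
        .strip_prefix("udf[")
        .and_then(|rest| rest.strip_suffix(']'))
    {
        Some(name) if !name.is_empty() => Token::Udf(name),
        _ => Token::Symbol(tok),
    }
}
```

Keeping the distinction in one function means a pattern that mentions a string literal `pow` cannot be mistaken for a call to the udf of the same name.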