izveigor commented on issue #7580: URL: https://github.com/apache/arrow-datafusion/issues/7580#issuecomment-1725264116
Hello, @crepererum! I'll try to explain as best I can.

### About the categories:

There is a finite set of data types (which we denote by <b>DataType</b>). Every possible variant of `Signature` can be described with four categories, which come in two pairs:

`definite` (def) and `undefined` (undef);

<b>Definition:</b> a definite data type means that only one specific data type from the `DataType` set is suitable. (Example: `Exact(DataType::Int8)` accepts `(DataType::Int8)`, but not `(DataType::UInt8)`.)

<b>Definition:</b> an undefined data type means that any data type from the `DataType` set is suitable. (Example: `Any()` accepts `(DataType::Int8)`, and also `(DataType::UInt8)`.)

`equal` (eq) and `unequal` (uneq);

<b>Definition of Equality:</b> the category of equality means that all elements are equal to each other. (Example: `VariadicEqual()` can accept `(DataType::Int8, DataType::Int8)` or `(DataType::UInt8, DataType::UInt8, DataType::UInt8)`, but not `(DataType::Int8, DataType::UInt8)`.)

<b>Definition of Inequality:</b> the category of inequality means that the elements can be arbitrary. (Example: `VariadicAny()` can accept `(DataType::Int8, DataType::Int8)`, and also `(DataType::Int8, DataType::UInt8)`.)

Now, combine the categories:
- Definite-Equality (every argument must be the same specific data type, so there can be only one such type);
- Definite-Inequality (the arguments must match a specific set of data types);
- Undefined-Equality (the arguments can be of any data type, as long as they are all equal);
- Undefined-Inequality (the arguments can be any data types from <b>DataType</b>);

### About Kleene algebra:

I chose Kleene algebra (which is used for regular expressions). The algorithm for a `Signature` is in effect a boolean function: it either accepts the input data set or it does not. For our case a regular language is sufficient, i.e. a language that a deterministic finite automaton (DFA) can recognize.
So, each type of signature represents a separate DFA.

### About meta algebra:

There are situations in which a signature cannot be checked by a single DFA and needs several. So we build a regular meta-language in the same way. For example, if we use two DFAs and either of them returns a positive answer, then the input data set suits us (the `OneOf` case). For full compatibility, it is worth adding two new signature types (`Equal` and `Concat`).

`Concat` deserves a separate mention. `Concat` can only combine signatures without a Kleene star, i.e. those of fixed size (only `Equal`, `Any`, `Uniform` and `Exact`). `Concat` takes the input data set, splits it according to the size of each signature, and applies the corresponding DFA to each part.

English is not the author's native language, so there may be some difficulties in understanding. I hope you understood me and my idea seemed reasonable to you :)
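
A minimal Rust sketch of the two levels described in this comment may help: the basic signatures as accept/reject predicates, and `OneOf` and `Concat` as combinators over them. Everything here is an assumption made for illustration: `Ty` is a three-variant stand-in for `DataType`, and `Sig`, `width`, `accepts` and `all_equal` are hypothetical names, not DataFusion's actual API. `Uniform` is left out for brevity, and the variadic forms are allowed to accept an empty argument list, which is a simplification.

```rust
// Hypothetical stand-in for the DataType set (illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty { Int8, UInt8, Utf8 }

// Hypothetical signature type; not DataFusion's real enum.
#[derive(Debug)]
enum Sig {
    /// Definite: argument i must be exactly types[i].
    Exact(Vec<Ty>),
    /// Undefined, unequal: exactly n arguments of arbitrary types.
    Any(usize),
    /// Undefined, equal: exactly n arguments that all share one type.
    Equal(usize),
    /// Kleene-star forms: any number of arguments.
    VariadicAny,
    VariadicEqual,
    /// Meta level: accept if at least one alternative accepts.
    OneOf(Vec<Sig>),
    /// Meta level: split the input by each part's fixed size.
    Concat(Vec<Sig>),
}

/// True when every element equals its neighbour (vacuously true for 0 or 1).
fn all_equal(args: &[Ty]) -> bool {
    args.windows(2).all(|w| w[0] == w[1])
}

impl Sig {
    /// Fixed number of arguments a signature consumes.
    /// None for Kleene-star and meta forms, which Concat therefore rejects.
    fn width(&self) -> Option<usize> {
        match self {
            Sig::Exact(types) => Some(types.len()),
            Sig::Any(n) | Sig::Equal(n) => Some(*n),
            _ => None,
        }
    }

    /// The boolean function: does this signature accept these argument types?
    fn accepts(&self, args: &[Ty]) -> bool {
        match self {
            Sig::Exact(types) => args == types.as_slice(),
            Sig::Any(n) => args.len() == *n,
            Sig::Equal(n) => args.len() == *n && all_equal(args),
            Sig::VariadicAny => true,
            Sig::VariadicEqual => all_equal(args),
            Sig::OneOf(alternatives) => alternatives.iter().any(|s| s.accepts(args)),
            Sig::Concat(parts) => {
                let mut rest = args;
                for part in parts {
                    // Only fixed-size parts can be split off the front.
                    let w = match part.width() {
                        Some(w) => w,
                        None => return false,
                    };
                    if rest.len() < w {
                        return false;
                    }
                    let (head, tail) = rest.split_at(w);
                    if !part.accepts(head) {
                        return false;
                    }
                    rest = tail;
                }
                // Every argument must have been consumed by some part.
                rest.is_empty()
            }
        }
    }
}
```

Under these assumptions, `Sig::Concat(vec![Sig::Exact(vec![Ty::Utf8]), Sig::Equal(2)])` accepts `(Utf8, Int8, Int8)` but rejects `(Utf8, Int8, UInt8)`, because the second part requires two equal types. A `Concat` containing `VariadicAny` is rejected outright, since a Kleene-star part has no fixed size to split on.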
