izveigor commented on issue #7580:
URL: 
https://github.com/apache/arrow-datafusion/issues/7580#issuecomment-1725264116

   Hello, @crepererum!
   I'll try to explain as best I can.
   
   ### About the categories:
   There is a finite set of data types (which we denote by <b>DataType</b>).
   Every possible type (option) of `Signature` can be described using 4 
categories, which form two pairs:
   <br/>
   `definite` (def) and `undefined` (undef);
   <br/>
   <br/><b>Definition:</b> a definite data type means that only one specific 
data type from the `DataType` set is suitable.
   (Example: `Exact(DataType::Int8)` accepts `(DataType::Int8)`, but not 
`(DataType::UInt8)`.)
   <br/><br/><b>Definition:</b> an undefined data type means that any data type 
from the `DataType` set is suitable.
   (Example: `Any()` accepts `(DataType::Int8)` and also `(DataType::UInt8)`.)
   <br/><br/>
   `equal` (eq) and `unequal` (uneq);
   <br/><br/><b>Definition of Equality:</b> the category of equality means that 
all elements are equal to each other.
   (Example: `VariadicEqual()` can accept `(DataType::Int8, DataType::Int8)` or 
`(DataType::UInt8, DataType::UInt8, DataType::UInt8)`, but not 
`(DataType::Int8, DataType::UInt8)`).
   <br/><br/><b>Definition of Inequality:</b> the category of inequality means 
that the elements can be arbitrary: they may match, but they do not have to.
   (Example: `VariadicAny()` can accept `(DataType::Int8, DataType::Int8)`, and 
also `(DataType::Int8, DataType::UInt8)`.)
   <br/><br/>Now, combine categories:
   - Definite-Equality (all elements must be the same specific data type, so 
exactly one type is allowed);
   - Definite-Inequality (the elements must match a specific set of data types);
   - Undefined-Equality (the elements can be of any data type, as long as they 
are all equal);
   - Undefined-Inequality (the elements can be any data types from 
<b>DataType</b>).
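
   The four combinations can be sketched as a small Rust enum with an acceptance check. This is only an illustration: `DataType` here is a three-variant stand-in rather than the real Arrow type, the names `Category` and `accepts` are hypothetical, and the "specific set of data types" is assumed to be an ordered list matched position by position.

```rust
// Hypothetical stand-in for the real `DataType`; only a few variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,
    UInt8,
    Utf8,
}

// The four combined categories (illustrative names, not a DataFusion API).
#[derive(Debug, Clone, PartialEq)]
enum Category {
    // Every element must be this one specific type.
    DefiniteEqual(DataType),
    // Elements must match this specific list of types (assumed: position by position).
    DefiniteUnequal(Vec<DataType>),
    // Any type is fine, as long as all elements share it.
    UndefinedEqual,
    // Any types at all.
    UndefinedUnequal,
}

// Returns true when `args` falls into the given category.
// Assumption: an empty input is accepted by the "equal" categories.
fn accepts(category: &Category, args: &[DataType]) -> bool {
    match category {
        Category::DefiniteEqual(t) => args.iter().all(|a| a == t),
        Category::DefiniteUnequal(expected) => args == expected.as_slice(),
        Category::UndefinedEqual => args.windows(2).all(|w| w[0] == w[1]),
        Category::UndefinedUnequal => true,
    }
}
```

   With this sketch, `accepts(&Category::UndefinedEqual, &[DataType::Int8, DataType::UInt8])` is false, while the same input is accepted by `Category::UndefinedUnequal`.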
   
   ### About Kleene algebra:
   I chose Kleene algebra (the algebra behind regular expressions). What we 
are building is an algorithm for `Signature`, i.e. a boolean function that 
either accepts an input data set or rejects it.
   For our case a regular language is sufficient (the kind of language that a 
deterministic finite automaton (DFA) can accept).
   
   So, each type of signature represents a separate DFA.
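
   As a rough sketch of what such a DFA could look like, here is one for `VariadicEqual`. The state names, the two-variant `DataType` stand-in, and the decision to reject an empty input are all assumptions made for illustration. The state set stays finite because the set of data types is finite.

```rust
// Hypothetical stand-in for the real `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,
    UInt8,
}

// DFA states: nothing read yet, one type seen so far, or a mismatch (dead state).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Start,
    Seen(DataType),
    Dead,
}

// Transition function: consume one input symbol (a data type).
fn step(state: State, input: DataType) -> State {
    match state {
        State::Start => State::Seen(input),
        State::Seen(t) if t == input => State::Seen(t),
        _ => State::Dead,
    }
}

// Run the automaton over the whole input; `Seen(_)` is the only accepting state.
// Assumption: at least one element is required.
fn variadic_equal_accepts(args: &[DataType]) -> bool {
    let end = args.iter().fold(State::Start, |s, &a| step(s, a));
    matches!(end, State::Seen(_))
}
```

   Here `(DataType::Int8, DataType::Int8)` ends in an accepting state, while `(DataType::Int8, DataType::UInt8)` ends in `Dead`.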
   ### About meta algebra:
   There are situations in which a signature cannot be checked by a single DFA 
and needs several. So we build a regular meta-language of the same kind on top 
of the DFAs.
   For example, if we want to use two DFAs and either of them returns a 
positive answer, then the input data set suits us (the `OneOf` case).
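
   The `OneOf` case can be sketched as a recursive check over alternatives. The `Sig` enum below is a hypothetical, cut-down model written for this example (it is not DataFusion's own type), and each leaf check stands in for running that signature's DFA.

```rust
// Hypothetical stand-in for the real `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,
    UInt8,
    Utf8,
}

// Cut-down, illustrative model of signature types.
enum Sig {
    // Exactly this list of types, position by position.
    Exact(Vec<DataType>),
    // One or more elements, all of the same type.
    VariadicEqual,
    // Meta level: accept if any alternative accepts.
    OneOf(Vec<Sig>),
}

fn accepts(sig: &Sig, args: &[DataType]) -> bool {
    match sig {
        Sig::Exact(expected) => args == expected.as_slice(),
        Sig::VariadicEqual => {
            !args.is_empty() && args.windows(2).all(|w| w[0] == w[1])
        }
        // Union of the underlying languages: one positive answer is enough.
        Sig::OneOf(alternatives) => alternatives.iter().any(|s| accepts(s, args)),
    }
}
```

   For instance, `Sig::OneOf(vec![Sig::Exact(vec![DataType::Int8, DataType::Utf8]), Sig::VariadicEqual])` accepts both `(Int8, Utf8)` and `(UInt8, UInt8)`, but not `(Int8, UInt8)`.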
   
   For full compatibility, it is worth adding two new signature types (`Equal` 
and `Concat`).
   <br/>
   `Concat` deserves a separate mention. `Concat` can only be built from 
signature types without a Kleene star, i.e. those with a fixed number of 
elements (only `Equal`, `Any`, `Uniform` and `Exact`).
   `Concat` takes the input data set, splits it according to the size of each 
signature type, and runs the corresponding DFA on each part.
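
   A minimal sketch of that splitting step follows. The shapes of the proposed `Equal` and `Concat` types are not specified above, so the `Part` enum, its fixed sizes, and the `concat_accepts` function are assumptions made purely for illustration.

```rust
// Hypothetical stand-in for the real `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,
    UInt8,
    Utf8,
}

// Fixed-size signature parts (no Kleene star). Shapes are assumed.
enum Part {
    Exact(Vec<DataType>),          // exactly these types, in order
    Any(usize),                    // n elements of any types
    Equal(usize),                  // n elements sharing one type
    Uniform(usize, Vec<DataType>), // n elements sharing one type from a list
}

impl Part {
    // Number of input elements this part consumes.
    fn size(&self) -> usize {
        match self {
            Part::Exact(types) => types.len(),
            Part::Any(n) | Part::Equal(n) | Part::Uniform(n, _) => *n,
        }
    }

    // Stand-in for running this part's own DFA on its slice of the input.
    fn accepts(&self, args: &[DataType]) -> bool {
        if args.len() != self.size() {
            return false;
        }
        let all_equal = args.windows(2).all(|w| w[0] == w[1]);
        match self {
            Part::Exact(types) => args == types.as_slice(),
            Part::Any(_) => true,
            Part::Equal(_) => all_equal,
            Part::Uniform(_, allowed) => {
                all_equal && args.iter().all(|a| allowed.contains(a))
            }
        }
    }
}

// Split the input by each part's size and check every slice with its part.
fn concat_accepts(parts: &[Part], args: &[DataType]) -> bool {
    let total: usize = parts.iter().map(Part::size).sum();
    if total != args.len() {
        return false;
    }
    let mut rest = args;
    for part in parts {
        let (head, tail) = rest.split_at(part.size());
        if !part.accepts(head) {
            return false;
        }
        rest = tail;
    }
    true
}
```

   Because every part has a known size, the split points are fixed in advance and no backtracking is needed, which is presumably why parts with a Kleene star are excluded.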
   
   English is not the author’s native language, so there may be some 
difficulties in understanding.
   I hope you understood me and that my idea seems reasonable to you :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
