cccs-RyanS opened a new issue #17852:
URL: https://github.com/apache/superset/issues/17852


   # [SIP-79] Proposal for Business Types
   
   ## Motivation
   
   Our users are leveraging Superset to explore and visualize network security 
logs. These logs often contain IP, CIDR range, port, hostname, and/or protocol 
fields, among others. While such fields might be stored as simple INTEGERS or 
VARCHARS in the backend database, these fields also have a “business” type 
which, if leveraged, can provide additional context and improve the user 
experience.
   
   **User-Friendly Display**
   
   A field’s business type can influence how that field is interacted with by 
the users. Consider, for example, an IP field. For performance reasons, we 
store IP values as INTEGERs in our DW e.g. the IP “127.0.0.1” would be stored 
as its integer value 2130706433. However, the string representation is 
substantially more human-readable and, as such, we wish to display IP fields in 
dot-decimal notation in visualizations and filters. This numeric format is 
fairly standard (link)and having this display functionality would be valuable 
in superset in case the underlying DB does not support this conversion 
functionality out of the box. As another example, a Port field stored as an 
integer (e.g., 443) could be displayed as a protocol name where such mapping is 
known (e.g., HTTPS).
   
   
   **Applicable Operators for filters**
   
   Certain operators (e.g., `LIKE`, `=`, `<`, `IS NULL`, `IN`, etc.) may be 
valid for a database type, but may not be applicable to a business type. For 
example, an organization might decide that the `LIKE` operator is not relevant 
to Port fields. Ideally, we want to be able to configure a list of applicable 
operators that will appear for a field of a certain business type.
   
   **Input Validation**
   
   Business types can be associated with (configurable) input validation and 
type conversion rules. For example, an input validator for a Port field could 
validate that the filter string is a parsable number between 0 and 65536, or 
that it contains known port names such as HTTPS, SSH, FTP.
   
   **Human Readable input values**
   
   As with displaying the human readable format of a value to the user (i.e the 
dot notation of an IP) users should be able to enter the more readable form of 
a value and have it be applied properly to a query. This would mean a user 
could create a filter such as `{col: "ip_col", op: "IN", val: [“1.1.1.1”, 
“1.1.1.2”]}` and have the query be executed as `ip_col IN [16842752, 
16842753]`. The functionality for a user to simply enter `{col: "ip_col", op: 
"IN", val: [16842752, 16842753]}` should also be preserved.
   
   **Behavior specified in the config**
   
   The code to specify the behavior of each business type should be defined 
outside of the superset codebase (i.e the config). This way those who wish to 
implement a business type may do so freely without having to make any code 
changes to superset.
   
   
   **Proof of concept**
   
   Figure 1: a video displaying a proof of concept for this proposal. a version 
of the modified filters popover is shown. The user inputs both the human 
readable and numeric values for ports and the conversion is shown. Also it is 
shown the user cannot save an invalid value, `abc` in this case. The user then 
saves and submits a valid filter and runs the query. the query is then 
displayed.
   
   
https://user-images.githubusercontent.com/71385290/147113657-f8d72243-2852-43c9-8f64-604aff734fda.mp4
   
   ## Proposed Changes
   
   We propose that the concept of a “business type” be introduced to dataset 
columns in Superset and that this business type be leveraged to allow display 
formats, applicable operators, and validation rules to be added to fields of a 
particular business type. Business types would be a purely optional attribute 
which could be specified on columns; if a business type is not specified for a 
particular column, there would be no changes to how a column is displayed, what 
operators are available, or what validation is performed on filter inputs.
   
   **Adding the business type to dataset**
   
   Introduce the concept of a business type on dataset columns
   - Add a business_type attribute to the BaseColumn model. This column only 
needs to be a nullable VARCHAR.
   - Add a “Business Type” field to the “Edit Datasource” modal (Figure 1) 
where dataset creators/editors can set the business type for a column. This 
should be restricted to applicable business types
   
   Figure 2: Edit Datasource modal with Business Type field
   
   
![unnamed](https://user-images.githubusercontent.com/71385290/147107085-c8d7a903-7bfd-45e9-b4e0-590107895a0f.png)
   
   
   **Allow applicable operator sets to be defined for each business type**
   
   The Explorer UI has an ad-hoc filter which lets the user enter a condition 
on a column, a value and an operator (>, =, LIKE, IS NULL, etc).
   
   Currently all operators are available to the user. We propose returning a 
list of applicable operators in the response payload. The list of applicable 
operators should be configurable for each custom business type. For maximum 
flexibility, the applicable operators should be tied to the business type.
   
   
   Figure 3: Current list of operators (left) vs customized list of operators 
(right)
   
   Before           |  After
   :-------------------------:|:-------------------------:
   
![all_ops](https://user-images.githubusercontent.com/71385290/147107285-aabed8ad-bdca-4495-8b00-781c466f12af.png)
 |  
![unnamed](https://user-images.githubusercontent.com/71385290/147107253-713f6007-2624-4cf8-a22d-90f52debda89.png)
   
   
   
   
   **Allow input validation and value conversion on business types**
   
   Currently any input string can be entered into an ad-hoc filter. We propose 
exposing a server REST endpoint to validate input strings and also return the 
"normalized" representation. The normalized representation should be accessible 
to the user so they can verify the conversion. This could be displayed via a 
tooltip in order to minimize UI changes to the control.
   
   Example inputs
   - Time range: "Last day" -> "2021-03-16T21:57:33.000Z - 
2021-03-17T21:57:33.000Z"
   - Numeric IP: "192.168.0.1" -> 3232235521
   - Numeric IP: 3232235521 -> 3232235521
   - Port: "HTTPS" -> "443"
   - Port: "8080" -> "8080"
   - Numeric CIDR: "1.1.1.1/16" -> {16842752 - 16908287}
   
   An input validator for a PORT could validate that the string is a parsable 
number between 0 and 65536 or that it contains known port names such as HTTPS, 
SSH, FTP.  An input validator for a DOMAIN could validate that the string is 
parsable as a domain and contains only alphanumeric characters, periods (.), 
and dashes (-).
   
   It should be possible to add additional input validators via a configuration 
like config.py.
   
   Figure 3: Error message for a filter value that fails validation for a field 
of type Port
   
   
![unnamed](https://user-images.githubusercontent.com/71385290/147107446-80a4d09f-fed2-4f5e-841b-48da72fa44ca.png)
   
   **Allow custom behavior for each operator on a business type**
   
   Ad-hoc filters currently work as would be expected on the backend: a filter 
is translated directly into an SQL statement that matches the filter, so the 
filter `{col: "ip_col", op: "IN", val: "1.1.1.1/16"}` is translated to the 
statement `ip_col IN '1.1.1.1/16'`. Based on the idea of business types, we 
want the value `'1.1.1.1/16'` to first be converted to a decimal range 
(16842752 - 16908287) as covered above, but just doing the conversion is not 
enough for this range type as we would get something like this in the best 
case: `ip_col IN [16842752, 16842753, ..., 16908287]`. We would more likely 
want this:  `ip_col >= 16842752 AND ip_col <= 16908287`. In order to achieve 
this, we propose making it possible to specify custom behavior for each defined 
operator associated with a business type. This would allow a lot of flexibility 
for each type.
   
   **Allow for custom display values**
   
   Columns with business types should (optionally) be displayed in their human 
readable format (i.e dot decimal notation for IPs even though they are stored 
as integers). In order to accomplish this, we should generate a list of 
user-friendly values based on the target column and append it to the result set 
or replace the original column with these new values. This could be 
accomplished in the `filter` in `superset/views/core.py`. We should optimally 
clean up some technical debt in the backend code by porting it to the v1 API, 
and while doing so we can introduce the additional functionality needed for 
this feature (two birds with one stone).
   
   
   ### New or Changed Public Interfaces
   
   **Changes to the config**
   
   In order to implement business logic for business types we propose adding a 
collection of BusinessType objects. These will define the available business 
types and their functionalities. The proposed class will be as follows are as 
follows:  
   
   description: 
   - a description of the usage of the business type
   
   verbose_name:
   - name used for displaying purposes, e.g. “IP and CIDR (v4)”
   
   Applicatable_types
   - a list of `GenericDataType` types that this business type applies to. 
Using the earlier example of IPs stored as ints, that business type would only 
function on integer value columns 
   
   type_translation:
   - a function that accepts unparsed values and return: a status indicating 
whether the provided value was valid, the parsed value(s) based on business 
type, valid operators, and formatted value(s)
   
   filter_translation:
   - a function that accepts a SqlaColumn, FilterOperator, and an unparsed 
value (or values). The function calls the business_type_translations function 
(for appropriate business type), and returns a valid [SqlaExpression].
   - The purpose of these functions is to take in a valid filter operator and a 
column and return a valid sqla expression that can be used to form the query. 
   
   **Model changes**
   
   BaseColumn model
   - Add business type field (Nullable VARCHAR)
   
   Change SqlaTable's get_sql_query function inside the filter loop so that if 
a business type is present on a column, the operator logic is overridden by a 
business type’s operator logic (via callback defined in 
business_type_filter_translations). This callback’s response will then be 
appended to the where_and list to produce the query.
   
   **Addition of the v1/business_types/ endpoints**
   
   In order to use any of this logic in the front end we will need to define a 
set of API endpoints for business types. Under this section we would define the 
following endpoints:
   
   [GET] v1/business_type/convert
   - Calls the appropriate business_type_translation function as defined in the 
config for the given business type.
   - Parameters:
    - business type
    - a set of value(s) to be validated/converted 
   - Returns:
    - The output of the function retrieved from business_type_translation
   
   [GET] v1/business_type/types
   - Get the list of implemented business types and their descriptions. This 
list would be defined by the intersection of the set of keys for the 
business_type_translations and business_type_filter_translations dictionaries. 
This would be used in order to populate lists to select business types for a 
given column. 
   - Parameters:
    - None
   - Returns:
    - A list of implemented business types and their descriptions 
   
   **UI changes**
   
   The AdhocFilterEditPopoverSimpleTabContent will need to be modified. If a 
business type is present on a selected column, we use the 
`v1/business_type/convert` endpoint to retrieve: an input’s valid/invalid 
status, the list of operators to display, and the currently selected 
comparator’s converted value. The `valid_operators` list will need to override 
the current operators list defined by the `isOperatorRevelant` function. A 
tooltip component will also need to be added to the component to display the 
returned converted value. No behavioral changes should be present if a column 
without a business type is selected.
   
   
   ## Migration Plan and Compatibility
   
   A database migration is required.  This will only effect the BaseColumn 
model. We will need to add a varchar field to the columns table.  
   
   in order to maintain compatibility we will insure the business type column 
in nullable as well as hiding changes behind the ENABLE_BUSINESS_TYPES feature 
flag. 
   
   ## Rejected Alternatives
   
   no alternatives have been strongly considered at this time 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to