cccs-RyanS opened a new issue #17852:
URL: https://github.com/apache/superset/issues/17852
# [SIP-79] Proposal for Business Types
## Motivation
Our users are leveraging Superset to explore and visualize network security
logs. These logs often contain IP, CIDR range, port, hostname, and/or protocol
fields, among others. While such fields might be stored as simple INTEGERS or
VARCHARS in the backend database, these fields also have a “business” type
which, if leveraged, can provide additional context and improve the user
experience.
**User-Friendly Display**
A field’s business type can influence how that field is interacted with by
the users. Consider, for example, an IP field. For performance reasons, we
store IP values as INTEGERs in our DW e.g. the IP “127.0.0.1” would be stored
as its integer value 2130706433. However, the string representation is
substantially more human-readable and, as such, we wish to display IP fields in
dot-decimal notation in visualizations and filters. This numeric format is
fairly standard (link)and having this display functionality would be valuable
in superset in case the underlying DB does not support this conversion
functionality out of the box. As another example, a Port field stored as an
integer (e.g., 443) could be displayed as a protocol name where such mapping is
known (e.g., HTTPS).
**Applicable Operators for filters**
Certain operators (e.g., `LIKE`, `=`, `<`, `IS NULL`, `IN`, etc.) may be
valid for a database type, but may not be applicable to a business type. For
example, an organization might decide that the `LIKE` operator is not relevant
to Port fields. Ideally, we want to be able to configure a list of applicable
operators that will appear for a field of a certain business type.
**Input Validation**
Business types can be associated with (configurable) input validation and
type conversion rules. For example, an input validator for a Port field could
validate that the filter string is a parsable number between 0 and 65536, or
that it contains known port names such as HTTPS, SSH, FTP.
**Human Readable input values**
As with displaying the human readable format of a value to the user (i.e the
dot notation of an IP) users should be able to enter the more readable form of
a value and have it be applied properly to a query. This would mean a user
could create a filter such as `{col: "ip_col", op: "IN", val: [“1.1.1.1”,
“1.1.1.2”]}` and have the query be executed as `ip_col IN [16842752,
16842753]`. The functionality for a user to simply enter `{col: "ip_col", op:
"IN", val: [16842752, 16842753]}` should also be preserved.
**Behavior specified in the config**
The code to specify the behavior of each business type should be defined
outside of the superset codebase (i.e the config). This way those who wish to
implement a business type may do so freely without having to make any code
changes to superset.
**Proof of concept**
Figure 1: a video displaying a proof of concept for this proposal. a version
of the modified filters popover is shown. The user inputs both the human
readable and numeric values for ports and the conversion is shown. Also it is
shown the user cannot save an invalid value, `abc` in this case. The user then
saves and submits a valid filter and runs the query. the query is then
displayed.
https://user-images.githubusercontent.com/71385290/147113657-f8d72243-2852-43c9-8f64-604aff734fda.mp4
## Proposed Changes
We propose that the concept of a “business type” be introduced to dataset
columns in Superset and that this business type be leveraged to allow display
formats, applicable operators, and validation rules to be added to fields of a
particular business type. Business types would be a purely optional attribute
which could be specified on columns; if a business type is not specified for a
particular column, there would be no changes to how a column is displayed, what
operators are available, or what validation is performed on filter inputs.
**Adding the business type to dataset**
Introduce the concept of a business type on dataset columns
- Add a business_type attribute to the BaseColumn model. This column only
needs to be a nullable VARCHAR.
- Add a “Business Type” field to the “Edit Datasource” modal (Figure 1)
where dataset creators/editors can set the business type for a column. This
should be restricted to applicable business types
Figure 2: Edit Datasource modal with Business Type field

**Allow applicable operator sets to be defined for each business type**
The Explorer UI has an ad-hoc filter which lets the user enter a condition
on a column, a value and an operator (>, =, LIKE, IS NULL, etc).
Currently all operators are available to the user. We propose returning a
list of applicable operators in the response payload. The list of applicable
operators should be configurable for each custom business type. For maximum
flexibility, the applicable operators should be tied to the business type.
Figure 3: Current list of operators (left) vs customized list of operators
(right)
Before | After
:-------------------------:|:-------------------------:

|

**Allow input validation and value conversion on business types**
Currently any input string can be entered into an ad-hoc filter. We propose
exposing a server REST endpoint to validate input strings and also return the
"normalized" representation. The normalized representation should be accessible
to the user so they can verify the conversion. This could be displayed via a
tooltip in order to minimize UI changes to the control.
Example inputs
- Time range: "Last day" -> "2021-03-16T21:57:33.000Z -
2021-03-17T21:57:33.000Z"
- Numeric IP: "192.168.0.1" -> 3232235521
- Numeric IP: 3232235521 -> 3232235521
- Port: "HTTPS" -> "443"
- Port: "8080" -> "8080"
- Numeric CIDR: "1.1.1.1/16" -> {16842752 - 16908287}
An input validator for a PORT could validate that the string is a parsable
number between 0 and 65536 or that it contains known port names such as HTTPS,
SSH, FTP. An input validator for a DOMAIN could validate that the string is
parsable as a domain and contains only alphanumeric characters, periods (.),
and dashes (-).
It should be possible to add additional input validators via a configuration
like config.py.
Figure 3: Error message for a filter value that fails validation for a field
of type Port

**Allow custom behavior for each operator on a business type**
Ad-hoc filters currently work as would be expected on the backend: a filter
is translated directly into an SQL statement that matches the filter, so the
filter `{col: "ip_col", op: "IN", val: "1.1.1.1/16"}` is translated to the
statement `ip_col IN '1.1.1.1/16'`. Based on the idea of business types, we
want the value `'1.1.1.1/16'` to first be converted to a decimal range
(16842752 - 16908287) as covered above, but just doing the conversion is not
enough for this range type as we would get something like this in the best
case: `ip_col IN [16842752, 16842753, ..., 16908287]`. We would more likely
want this: `ip_col >= 16842752 AND ip_col <= 16908287`. In order to achieve
this, we propose making it possible to specify custom behavior for each defined
operator associated with a business type. This would allow a lot of flexibility
for each type.
**Allow for custom display values**
Columns with business types should (optionally) be displayed in their human
readable format (i.e dot decimal notation for IPs even though they are stored
as integers). In order to accomplish this, we should generate a list of
user-friendly values based on the target column and append it to the result set
or replace the original column with these new values. This could be
accomplished in the `filter` in `superset/views/core.py`. We should optimally
clean up some technical debt in the backend code by porting it to the v1 API,
and while doing so we can introduce the additional functionality needed for
this feature (two birds with one stone).
### New or Changed Public Interfaces
**Changes to the config**
In order to implement business logic for business types we propose adding a
collection of BusinessType objects. These will define the available business
types and their functionalities. The proposed class will be as follows are as
follows:
description:
- a description of the usage of the business type
verbose_name:
- name used for displaying purposes, e.g. “IP and CIDR (v4)”
Applicatable_types
- a list of `GenericDataType` types that this business type applies to.
Using the earlier example of IPs stored as ints, that business type would only
function on integer value columns
type_translation:
- a function that accepts unparsed values and return: a status indicating
whether the provided value was valid, the parsed value(s) based on business
type, valid operators, and formatted value(s)
filter_translation:
- a function that accepts a SqlaColumn, FilterOperator, and an unparsed
value (or values). The function calls the business_type_translations function
(for appropriate business type), and returns a valid [SqlaExpression].
- The purpose of these functions is to take in a valid filter operator and a
column and return a valid sqla expression that can be used to form the query.
**Model changes**
BaseColumn model
- Add business type field (Nullable VARCHAR)
Change SqlaTable's get_sql_query function inside the filter loop so that if
a business type is present on a column, the operator logic is overridden by a
business type’s operator logic (via callback defined in
business_type_filter_translations). This callback’s response will then be
appended to the where_and list to produce the query.
**Addition of the v1/business_types/ endpoints**
In order to use any of this logic in the front end we will need to define a
set of API endpoints for business types. Under this section we would define the
following endpoints:
[GET] v1/business_type/convert
- Calls the appropriate business_type_translation function as defined in the
config for the given business type.
- Parameters:
- business type
- a set of value(s) to be validated/converted
- Returns:
- The output of the function retrieved from business_type_translation
[GET] v1/business_type/types
- Get the list of implemented business types and their descriptions. This
list would be defined by the intersection of the set of keys for the
business_type_translations and business_type_filter_translations dictionaries.
This would be used in order to populate lists to select business types for a
given column.
- Parameters:
- None
- Returns:
- A list of implemented business types and their descriptions
**UI changes**
The AdhocFilterEditPopoverSimpleTabContent will need to be modified. If a
business type is present on a selected column, we use the
`v1/business_type/convert` endpoint to retrieve: an input’s valid/invalid
status, the list of operators to display, and the currently selected
comparator’s converted value. The `valid_operators` list will need to override
the current operators list defined by the `isOperatorRevelant` function. A
tooltip component will also need to be added to the component to display the
returned converted value. No behavioral changes should be present if a column
without a business type is selected.
## Migration Plan and Compatibility
A database migration is required. This will only effect the BaseColumn
model. We will need to add a varchar field to the columns table.
in order to maintain compatibility we will insure the business type column
in nullable as well as hiding changes behind the ENABLE_BUSINESS_TYPES feature
flag.
## Rejected Alternatives
no alternatives have been strongly considered at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]