cccs-jc commented on issue #13551:
URL: https://github.com/apache/superset/issues/13551#issuecomment-797725837


   ### Use Case
   
   We use Superset as a Security information and event management (SIEM). 
Network security logs are frequently used in our analysis.
   Users search security logs using IP values (e.g.: 192.168.0.7) or for entire 
networks using CIDR notation (e.g.: 192.168.0.0/24). The CIDR is a well know 
networking concept expressing a range of IPs 192.168.0.0 to 192.168.0.255 in 
this case.
   
   **Users search for an IP on a particular column.**
   
   
https://user-images.githubusercontent.com/56140112/110991620-dd5ef480-8342-11eb-88bc-cca8aaf01047.mp4
   
   **Users search for a range of IPs on a particular column.**
   
   
   
https://user-images.githubusercontent.com/56140112/110991763-0bdccf80-8343-11eb-9526-8d968554afa1.mp4
   
   
   **Users search for IP or range of IPs on either the source or destination 
columns.**
   
   
   
https://user-images.githubusercontent.com/56140112/110991799-14350a80-8343-11eb-89e2-913fab7b8fe8.mp4
   
   
   **For performance reasons our IP values are physically stored as INTEGER in 
RDBMS. This is what a typical network IP flow table might look like**
   
   
![table](https://user-images.githubusercontent.com/56140112/110991881-2f077f00-8343-11eb-9d58-f9258f2c31ac.png)
   
   
   
   However, users think of IPs in terms dot notation or CIDR notation, not in 
terms of numbers. We thus provide a conversion mechanism to hide this 
implementation detail from our users. Results are thus presented to the user in 
tables like this one.
   
   
   
![result-table](https://user-images.githubusercontent.com/56140112/110991943-40508b80-8343-11eb-8461-36a7e71c4fde.png)
   
   
   
   
   
   ### How we achieve this with Superset
   
   #### Rendering
   
   In superset we’ve modelled the flow table by typing the IP_SRC and IP_DST 
columns as IPV4. We use this typing information when rendering the values on 
the client side (in our custom visualization). Our rendering also generates a 
hyperlink to a page giving further information about the given IP.
   
   #### Query Mutator
   We leverage superset's SQL_QUERY_MUTATOR to convert the IP strings provided 
by the user into the corresponding number value for the RDBMS. 
   A query criterion of **SRC_IP = ‘2.2.2.2’** is mutated into **SRC_IP = 
33686018** 
   To query both SRC_IP or DST_IP we have introduced a “filter column” named 
IP. This column does not exist in the RDBMS. It’s only used as a filter (like 
Looker’s bind_filters feature 
https://docs.looker.com/reference/view-params/explore_source).
   A query criterion of **IP IN '2.2.2.2'** is mutated to **(  ("SRC_IP" = 
33686018) OR  ("DST_IP" = 33686018) )**
   
   
   #### Superset Model
   We use the column data type to render IPV4 columns in dot notation.
   
   
![result-table](https://user-images.githubusercontent.com/56140112/110992013-59593c80-8343-11eb-80dc-cc3df2ba7c96.png)
   
   We mark the IP (filter column) as not a dimension. Because the IP column is 
not a dimension it is not available option in the “COLUMNS” box of the Chart 
explorer. This is good because it’s not actually a column it’s just a filter.
   
![pick-columns-to-display](https://user-images.githubusercontent.com/56140112/110992059-69711c00-8343-11eb-9b10-9a44e5e5b4f8.png)
   
   However, the IP filter column is available in the ad-hoc filter. Which is 
great.
   
   
![use-ipfilter-in-adhoc-filter](https://user-images.githubusercontent.com/56140112/110992099-755cde00-8343-11eb-9161-ac52aeb18e73.png)
   
   
   The SQL_QUERY_MUTATOR not only converts IP strings to numbers but can also 
handle more complex scenarios like querying on either the SRC_IP or DST_IP 
columns.
   For example, given the configuration above, superset would produce this query
   **FROM FLOW WHERE IP IN '2.2.2.2'**
   Which our SQL_QUERY_MUTATOR transforms into
   **FROM FLOW WHERE (  ("SRC_IP" = 33686018) OR  ("DST_IP" = 33686018) )**
   Before it is sent to the RDBMS.
   
   On the client side we have a custom visualization which renders columns of 
type IPV4 in dot notation and generates hyperlinks.
   
   
![results-rendered-as-dot-notation](https://user-images.githubusercontent.com/56140112/110992137-84dc2700-8343-11eb-9a64-ac4ee4383c29.png)
   
   
   #### IP Filter integration in a Dashboard
   The IP filter column works the same inside Dashboards. Since IP is a column 
it can be used in a filter box or in the new native filter support. The IP 
filter column can also handle CIDR notation (IP range query).
   
   
   
![ip-filter-works-in-dashboard](https://user-images.githubusercontent.com/56140112/110992224-a0dfc880-8343-11eb-945a-a4bacd92c4ad.png)
   
   
   **FROM FLOW
   WHERE IP IN '2.2.2.0/24'**
   Is mutated into a range query on either SRC_IP or DST_IP columns
   **FROM FLOW
   WHERE
     (  ("SRC_IP" >= 33686016 AND "SRC_IP" <= 33686271) OR
        ("DST_IP" >= 33686016 AND "DST_IP" <= 33686271)  )**
   
   
   
   ### Gaps and improvements
   Our current implementation is working well but the user experience needs 
some improvement.
   
   #### Inconsistent SQL operators
   
   The IP filter column is really a filtering function. It accepts one or more 
IPs and generates a query which applies a filter on either SRC_IP or DST_IP 
columns. Only the “IN” and “=” operator really make sense for the IP filter 
column. However, there is no means to control which operators to show the user.
   
   
   
![only-valid-operators](https://user-images.githubusercontent.com/56140112/110992295-b9e87980-8343-11eb-9411-41a9cbabb0b1.png)
   
   
   The same argument can be made for port number. String operators “LIKE”, “IS 
NOT NULL”, “IS NULL” are not applicable for the port number column.
   
   
   
![port-number-column](https://user-images.githubusercontent.com/56140112/110992348-ca98ef80-8343-11eb-9cfd-6187cc446044.png)
   
   
   **It would be desirable for the ad-hoc filter UI and for the dashboard 
filters to only present to the user the applicable operators. The dataset model 
could be extended to support a list of applicable operators per column.** 
   
   
   #### No input validation
   
   We would like to be able to validate the values entered for the IP filter. 
It should be of the form 0.0.0.0 or 0.0.0.0/32.
   
   
   
   
   
   
   
![invalid-input](https://user-images.githubusercontent.com/56140112/110992394-dd132900-8343-11eb-8efe-044eb7948a8a.png)
   
   
   
   There are validators in superset to validate numerical and empty values. It 
would be great if we could create new ones for IPv4, IPv6 etc. Examples in 
other business lines might be credit card number, social insurance number, 
validating wildcard search expressions etc. 
   The same argument applies to the port number column. It should only be 
possible to enter a number value and in this specific case a value between 0 
and 65536 (2^32).
   The dataset model could be extended to support a list of applicable input 
validators for a given column.
   
   
   
   **The ad-hoc filter UI and the dashboard filters could use this information 
and apply the appropriate validation based on the chosen column.**
   
   
   
   
   
   
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to