cccs-jc commented on issue #13551: URL: https://github.com/apache/superset/issues/13551#issuecomment-797725837
### Use Case We use Superset as a Security information and event management (SIEM). Network security logs are frequently used in our analysis. Users search security logs using IP values (e.g.: 192.168.0.7) or for entire networks using CIDR notation (e.g.: 192.168.0.0/24). The CIDR is a well know networking concept expressing a range of IPs 192.168.0.0 to 192.168.0.255 in this case. **Users search for an IP on a particular column.** https://user-images.githubusercontent.com/56140112/110991620-dd5ef480-8342-11eb-88bc-cca8aaf01047.mp4 **Users search for a range of IPs on a particular column.** https://user-images.githubusercontent.com/56140112/110991763-0bdccf80-8343-11eb-9526-8d968554afa1.mp4 **Users search for IP or range of IPs on either the source or destination columns.** https://user-images.githubusercontent.com/56140112/110991799-14350a80-8343-11eb-89e2-913fab7b8fe8.mp4 **For performance reasons our IP values are physically stored as INTEGER in RDBMS. This is what a typical network IP flow table might look like**  However, users think of IPs in terms dot notation or CIDR notation, not in terms of numbers. We thus provide a conversion mechanism to hide this implementation detail from our users. Results are thus presented to the user in tables like this one.  ### How we achieve this with Superset #### Rendering In superset we’ve modelled the flow table by typing the IP_SRC and IP_DST columns as IPV4. We use this typing information when rendering the values on the client side (in our custom visualization). Our rendering also generates a hyperlink to a page giving further information about the given IP. #### Query Mutator We leverage superset's SQL_QUERY_MUTATOR to convert the IP strings provided by the user into the corresponding number value for the RDBMS. A query criterion of **SRC_IP = ‘2.2.2.2’** is mutated into **SRC_IP = 33686018** To query both SRC_IP or DST_IP we have introduced a “filter column” named IP. This column does not exist in the RDBMS. It’s only used as a filter (like Looker’s bind_filters feature https://docs.looker.com/reference/view-params/explore_source). A query criterion of **IP IN '2.2.2.2'** is mutated to **( ("SRC_IP" = 33686018) OR ("DST_IP" = 33686018) )** #### Superset Model We use the column data type to render IPV4 columns in dot notation.  We mark the IP (filter column) as not a dimension. Because the IP column is not a dimension it is not available option in the “COLUMNS” box of the Chart explorer. This is good because it’s not actually a column it’s just a filter.  However, the IP filter column is available in the ad-hoc filter. Which is great.  The SQL_QUERY_MUTATOR not only converts IP strings to numbers but can also handle more complex scenarios like querying on either the SRC_IP or DST_IP columns. For example, given the configuration above, superset would produce this query **FROM FLOW WHERE IP IN '2.2.2.2'** Which our SQL_QUERY_MUTATOR transforms into **FROM FLOW WHERE ( ("SRC_IP" = 33686018) OR ("DST_IP" = 33686018) )** Before it is sent to the RDBMS. On the client side we have a custom visualization which renders columns of type IPV4 in dot notation and generates hyperlinks.  #### IP Filter integration in a Dashboard The IP filter column works the same inside Dashboards. Since IP is a column it can be used in a filter box or in the new native filter support. The IP filter column can also handle CIDR notation (IP range query).  **FROM FLOW WHERE IP IN '2.2.2.0/24'** Is mutated into a range query on either SRC_IP or DST_IP columns **FROM FLOW WHERE ( ("SRC_IP" >= 33686016 AND "SRC_IP" <= 33686271) OR ("DST_IP" >= 33686016 AND "DST_IP" <= 33686271) )** ### Gaps and improvements Our current implementation is working well but the user experience needs some improvement. #### Inconsistent SQL operators The IP filter column is really a filtering function. It accepts one or more IPs and generates a query which applies a filter on either SRC_IP or DST_IP columns. Only the “IN” and “=” operator really make sense for the IP filter column. However, there is no means to control which operators to show the user.  The same argument can be made for port number. String operators “LIKE”, “IS NOT NULL”, “IS NULL” are not applicable for the port number column.  **It would be desirable for the ad-hoc filter UI and for the dashboard filters to only present to the user the applicable operators. The dataset model could be extended to support a list of applicable operators per column.** #### No input validation We would like to be able to validate the values entered for the IP filter. It should be of the form 0.0.0.0 or 0.0.0.0/32.  There are validators in superset to validate numerical and empty values. It would be great if we could create new ones for IPv4, IPv6 etc. Examples in other business lines might be credit card number, social insurance number, validating wildcard search expressions etc. The same argument applies to the port number column. It should only be possible to enter a number value and in this specific case a value between 0 and 65536 (2^32). The dataset model could be extended to support a list of applicable input validators for a given column. **The ad-hoc filter UI and the dashboard filters could use this information and apply the appropriate validation based on the chosen column.** ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
