phsm opened a new pull request, #10594: URL: https://github.com/apache/cloudstack/pull/10594
### Description This PR changes the behavior of Security Groups to disable connection tracking when it is not needed. The idea is that the VM that have "allow all" rule can have as many connections as they want without straining the host system. This change may be benefitial for VPS hosters, where the VM behavior is not under control of the servers administrator. <!--- Describe your changes in DETAIL - And how has behaviour functionally changed. --> The list of changes: Introduced two new ipsets: `cs_notrack` for IPv4 and `cs_notrack6` for IPv6 that contain the VM IP addresses that do not need to be tracked. When a security group contains a rule allowing all protocols from 0.0.0.0/0 (IPv4) or ::/0 (IPv6), then all the IPv4 and/or IPv6 addresses of the VM are added to these ipsets. The following rules are added into iptables table `raw` chain `PREROUTING`: ``` iptables -t raw -A PREROUTING -m set --match-set cs_notrack dst -j NOTRACK iptables -t raw -A PREROUTING -m set --match-set cs_notrack src -j NOTRACK ip6tables -t raw -A PREROUTING -m set --match-set cs_notrack6 dst -j NOTRACK ip6tables -t raw -A PREROUTING -m set --match-set cs_notrack6 src -j NOTRACK ``` The iptables matchers `-m state --state NEW` are removed as they are not needed for several reasons: - they block the allowed traffic if the connection is not tracked - the rest of the matcher is explicit enough to allow the traffic that was specified in the security group - the conntrack look up calls can be very expensive on high packet per second rate when the connection tracking table has tens millions of records The `-m state --state ESTABLISHED,RELATED` rules are only placed at the end of the VM `-def` chain, as the last resort rule before the final decision to drop the packet. The goal is to try explicit matchers as much as possible. The behavior of the `-VM` chain that contains user-defined rules was modified: - do not return traffic in the rules, the only possible rule action is ACCEPT. If a packet doesn't match any rules, then it returns back to the `-def` chain, where it is checked to belong to an existing connection, otherwise dropped. - the above mentioned `-m state NEW` are removed. Since the VM `-def` chain is populated with rules for each NIC, and there is no place in inject the final unconditional `-j DROP` in the code, I had to resort to blocking traffic matching each VM network interface in the end of each set of interface-specific rules A minor refactoring is done: - The function `split_ips_by_family()` now takes one or more arguments that can be either a `;`-separated string or any other type that can be parsed by Python `ipaddress.ip_address()` method. The function splits `;`-separated strings when it encounters them, removes the empty elements and '0' literals (they indicate an empty IP address list for some reason). As the result, it returns a tuple containing a list of IPv4 addresses, and IPv6 addresses. Therefore, the function is backwards compatible to the previous behavior. - Some lines of code that were doing the same functionality as the updated `split_ips_by_family()`, are removed. - The function `add_to_ipset()` uses `-!` flag that silently ignores addition of a new element if it already exists in the ipset, or its removal if it doesn't exist in the ipset. It will still crash if the requested ipset does not exist. This change makes `ipset add` calls indempotent. <!-- For new features, provide link to FS, dev ML discussion etc. --> <!-- In case of bug fix, the expected and actual behaviours, steps to reproduce. --> <!-- When "Fixes: #<id>" is specified, the issue/PR will automatically be closed when this PR gets merged --> <!-- For addressing multiple issues/PRs, use multiple "Fixes: #<id>" --> <!-- Fixes: # --> <!--- ******************************************************************************* --> <!--- NOTE: AUTOMATION USES THE DESCRIPTIONS TO SET LABELS AND PRODUCE DOCUMENTATION. --> <!--- PLEASE PUT AN 'X' in only **ONE** box --> <!--- ******************************************************************************* --> ### Types of changes - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] New feature (non-breaking change which adds functionality) - [ ] Bug fix (non-breaking change which fixes an issue) - [X] Enhancement (improves an existing feature and functionality) - [ ] Cleanup (Code refactoring and cleanup, that may add test cases) - [ ] build/CI - [ ] test (unit or integration test code) ### Feature/Enhancement Scale or Bug Severity #### Feature/Enhancement Scale - [ ] Major - [X] Minor #### Bug Severity - [ ] BLOCKER - [ ] Critical - [ ] Major - [ ] Minor - [ ] Trivial ### Screenshots (if appropriate): ### How Has This Been Tested? Tested: - starting a VM - changing the rules on the fly: add/remove "allow all" rule, add more specific rules such as allow a specific TCP port range. - migrating the VM to a host with these changes deployed - migrating the VM from the host with these changes deployed to a host with "vanilla" security groups script, and back - both ingress and egress security groups behavior is tested. <!-- Please describe in detail how you tested your changes. --> <!-- Include details of your testing environment, and the tests you ran to --> #### How did you try to break this feature and the system with this change? - Test with only egress traffic allowed (no ingress rules) - Test with only ingress traffic allowed (only one egress rule allowing traffic to a non-existing IP address, that makes every other egress traffic dropped) - Test with more-specific rules, e.g. allow specific ports, or allow only IPv6 traffic <!-- see how your change affects other areas of the code, etc. --> <!-- Please read the [CONTRIBUTING](https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md) document --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
