phsm opened a new pull request, #10594:
URL: https://github.com/apache/cloudstack/pull/10594

   ### Description
   
   This PR changes the behavior of Security Groups to disable connection 
tracking when it is not needed.
   The idea is that the VM that have "allow all" rule can have as many 
connections as they want without straining the host system. This change may be 
benefitial for VPS hosters, where the VM behavior is not under control of the 
servers administrator.
   <!--- Describe your changes in DETAIL - And how has behaviour functionally 
changed. -->
   
   The list of changes:
   Introduced two new ipsets: `cs_notrack` for IPv4 and `cs_notrack6` for IPv6 
that contain the VM IP addresses that do not need to be tracked.
   When a security group contains a rule allowing all protocols from 0.0.0.0/0 
(IPv4) or ::/0 (IPv6), then all the IPv4 and/or IPv6 addresses of the VM are 
added to these ipsets.
   
   The following rules are added into iptables table `raw` chain `PREROUTING`:
   ```
   iptables -t raw -A PREROUTING -m set --match-set cs_notrack dst -j NOTRACK
   iptables -t raw -A PREROUTING -m set --match-set cs_notrack src -j NOTRACK
   ip6tables -t raw -A PREROUTING -m set --match-set cs_notrack6 dst -j NOTRACK
   ip6tables -t raw -A PREROUTING -m set --match-set cs_notrack6 src -j NOTRACK
   ```
   
   The iptables matchers `-m state --state NEW` are removed as they are not 
needed for several reasons:
   - they block the allowed traffic if the connection is not tracked
   - the rest of the matcher is explicit enough to allow the traffic that was 
specified in the security group
   - the conntrack look up calls can be very expensive on high packet per 
second rate when the connection tracking table has tens millions of records
   
   The `-m state --state ESTABLISHED,RELATED` rules are only placed at the end 
of the VM `-def` chain, as the last resort rule before the final decision to 
drop the packet. The goal is to try explicit matchers as much as possible.
   
   The behavior of the `-VM` chain that contains user-defined rules was 
modified:
   - do not return traffic in the rules, the only possible rule action is 
ACCEPT. If a packet doesn't match any rules, then it returns back to the `-def` 
chain, where it is checked to belong to an existing connection, otherwise 
dropped.
   - the above mentioned `-m state NEW` are removed.
   
   Since the VM `-def` chain is populated with rules for each NIC, and there is 
no place in inject the final unconditional `-j DROP` in the code, I had to 
resort to blocking traffic matching each VM network interface in the end of 
each set of interface-specific rules
   
   A minor refactoring is done:
   - The function `split_ips_by_family()` now takes one or more arguments that 
can be either a `;`-separated string or any other type that can be parsed by 
Python `ipaddress.ip_address()` method.
   The function splits `;`-separated strings when it encounters them, removes 
the empty elements and '0' literals (they indicate an empty IP address list for 
some reason).
   As the result, it returns a tuple containing a list of IPv4 addresses, and 
IPv6 addresses. Therefore, the function is backwards compatible to the previous 
behavior.
   - Some lines of code that were doing the same functionality as the updated 
`split_ips_by_family()`, are removed.
   - The function `add_to_ipset()` uses `-!` flag that silently ignores 
addition of a new element if it already exists in the ipset, or its removal if 
it doesn't exist in the ipset. It will still crash if the requested ipset does 
not exist. This change makes `ipset add` calls indempotent.
   
   
   
   <!-- For new features, provide link to FS, dev ML discussion etc. -->
   <!-- In case of bug fix, the expected and actual behaviours, steps to 
reproduce. -->
   
   <!-- When "Fixes: #<id>" is specified, the issue/PR will automatically be 
closed when this PR gets merged -->
   <!-- For addressing multiple issues/PRs, use multiple "Fixes: #<id>" -->
   <!-- Fixes: # -->
   
   <!--- 
******************************************************************************* 
-->
   <!--- NOTE: AUTOMATION USES THE DESCRIPTIONS TO SET LABELS AND PRODUCE 
DOCUMENTATION. -->
   <!--- PLEASE PUT AN 'X' in only **ONE** box -->
   <!--- 
******************************************************************************* 
-->
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [X] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] build/CI
   - [ ] test (unit or integration test code)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [X] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   ### Screenshots (if appropriate):
   
   ### How Has This Been Tested?
   Tested:
   - starting a VM
   - changing the rules on the fly: add/remove "allow all" rule, add more 
specific rules such as allow a specific TCP port range.
   - migrating the VM to a host with these changes deployed
   - migrating the VM from the host with these changes deployed to a host with 
"vanilla" security groups script, and back
   - both ingress and egress security groups behavior is tested.
   
   <!-- Please describe in detail how you tested your changes. -->
   <!-- Include details of your testing environment, and the tests you ran to 
-->
   
   #### How did you try to break this feature and the system with this change?
   - Test with only egress traffic allowed (no ingress rules)
   - Test with only ingress traffic allowed (only one egress rule allowing 
traffic to a non-existing IP address, that makes every other egress traffic 
dropped)
   - Test with more-specific rules, e.g. allow specific ports, or allow only 
IPv6 traffic
   
   <!-- see how your change affects other areas of the code, etc. -->
   
   <!-- Please read the 
[CONTRIBUTING](https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md) 
document -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to