JinwooHwang commented on PR #7941:
URL: https://github.com/apache/geode/pull/7941#issuecomment-3617666432

   Hi @sboorlagadda,
   
   Thank you so much for your thorough review and for raising these thoughtful 
architectural concerns. I appreciate the time and care you’ve taken to consider 
the design in depth. I would like to address each of your points with 
supporting technical evidence and references to industry best practices.
   
   ## Why These Must Be Separate Systems
   
   The architectural concern lists four issues:
   - Configuration complexity (two separate whitelist systems)
   - Operational burden (maintaining dual security policies)
   - User confusion (different config for sessions vs regions)
   - Potential security gaps (inconsistent policies)
   
   These aren't problems - they're the expected and necessary characteristics 
of layered security architecture. Each concern reflects proper separation of 
security boundaries, not architectural weakness.
   
   This separation is intentional. The two layers serve different purposes and 
protect distinct attack surfaces with different threat models.
   
   **Geode's infrastructure filter protects:**
   - Cluster communication (peer-to-peer messages)
   - Internal region operations
   - Distributed system metadata
   - Cache consistency protocols
   - Threats: Malicious cluster members, corrupted internal data, version 
incompatibilities
   
   **Session filter must protect:**
   - HTTP request boundary (untrusted user input)
   - Application-level objects from web clients
   - Session hijacking attempts
   - Deserialization gadget chains in user data
   - Threats: External attackers via HTTP, compromised client applications, RCE 
exploits
   
   These are fundamentally different security boundaries. 
   
   **Real-world analogy:**
   Your firewall filters network packets. Your application still validates HTTP 
input. Both are active simultaneously, both filter "data entering the system," 
but at different trust boundaries with different policies. You wouldn't merge 
them just because they both filter.
   
   Similarly:
   - Network layer: Firewall blocks malicious IPs, port scans, DDoS
   - Application layer: WAF blocks SQL injection, XSS, malformed requests
   
   Both active. Different purposes. Different policies.
   
   ## Industry Validation: Defense-in-Depth
   
   This approach isn’t something I invented; it’s a well-established industry 
practice:
   
   
   **Spring Framework (OWASP layered security):**
   
   - Web layer: OncePerRequestFilter, CSRF protection, HTTP request filters, 
secure headers
   Handles HTTP-level security (authentication, session management, CSRF, input 
validation)
   Operates independently from service and data layers.
   
   - Service layer: @PreAuthorize, @PostAuthorize, @Secured annotations, 
method-level security
   Enforces authorization and business rules at the service/business-logic 
level.
   Provides a separate control boundary independent of web filters.
   
   - Data layer: Safe data-access patterns (prepared statements, parameter 
binding in JDBC/JPA)
   Protects against SQL injection when used correctly.
   
   Each layer has independent security controls. Web-layer filters are not 
merged with service or database security, enabling defense-in-depth and 
reducing the impact of a single-layer compromise.
   
   **AWS multi-layer architecture:**
   - ALB/WAF: HTTP-level filtering
   - Lambda/EC2: IAM roles, network security groups
   - RDS/DynamoDB: Database-specific encryption, access controls
   
   **Apache Kafka (CWE-502 context-specific filters):**
   - Internal Protocol: Kafka's binary protocol, only Kafka internal classes
   - User Data: Configurable deserializers (StringDeserializer, 
ByteArrayDeserializer, or custom).
   - Separation: User data serialization should be isolated from internal 
messages.
   
   **Apache Cassandra:**
   - Internal cluster communication: Custom binary protocol, only Cassandra 
classes
   - User data (CQL queries): User-defined types (UDTs) and application objects 
can be queried and stored.
   
   **Hazelcast (direct competitor of our Apache Geode):**
   ```java
   Config config = new Config();
   
   // Internal cluster objects
   config.getSerializationConfig()
       .addDataSerializableFactory(100, new MyInternalFactory());
   
   // Application/user objects
   config.getSerializationConfig()
       .addPortableFactory(200, new MyApplicationFactory())
       .setAllowUnsafe(false);
   ```
   Hazelcast explicitly separates internal vs application serialization.
   
   **Microsoft SDL Trust Boundary Analysis (accurate, generalized):**
   - Trust Boundary 1: External (HTTP clients) → apply safe deserialization and 
input validation; use a whitelist of classes for deserialization.
   - Trust Boundary 2: Internal (cluster members) → validate serialized objects 
exchanged between nodes; apply appropriate deserialization safeguards.
   - Principle: Different trust boundaries have different threat models, 
requiring independent controls.
   
   **Real-world CVE fixes validate this:**
   - CVE-2015-7501 (JBoss): Fixed by safe deserialization and class 
whitelisting in HTTP/EJB entry points.
   - CVE-2016-0638 (WebLogic): Fixed by restricting allowed classes for T3 
protocol and HTTP deserialization.
   - CVE-2017-5645 (Log4j): Fixed by patching JMS deserialization logic and 
enforcing allowed class lists.
   Principle: These CVEs demonstrate the importance of context-aware, 
layer-specific deserialization safeguards, rather than generic “layered 
filters.”
   
   Industry consensus: Multi-layered, context-specific filters are the standard.
   
   ## Addressing the Specific Concerns
   
   **"Configuration complexity (two separate whitelist systems)"**
   
   This is defense-in-depth, not complexity. Each boundary needs its own policy:
   
   - Geode allowlist: 485 classes for internal operations (ResourceManager, 
StatisticDescriptor, DistributionMessage, etc.)
   - Session allowlist: 80 classes for user data (String, Integer, ArrayList, 
HashMap, etc.)
   
   **"Operational burden (maintaining dual security policies)"**
   
   The policies serve different purposes and must be maintained separately:
   
   Geode's 485-class allowlist includes:
   - org.apache.geode.internal.cache.tier.sockets.command.*
   - org.apache.geode.distributed.internal.*
   - org.apache.geode.management.internal.*
   - Classes needed for cluster operations
   
   Session's 80-class allowlist includes:
   - java.lang.String, Integer, Boolean
   - java.util.ArrayList, HashMap, HashSet
   - java.util.Date, Calendar
   - Classes safe for user-facing web applications
   
   These lists have zero overlap in purpose. 
   
   **"User confusion (different config for sessions vs regions)"**
   
   This is expected and correct. Different architectural layers have different 
configuration:
   
   **Infrastructure layer (managed by cluster admins):**
   - Geode cluster setup
   - Region replication policies
   - `validate-serializable-objects` configuration
   - Internal class allowlist (485 Geode classes)
   
   **Application layer (managed by application developers):**
   - Session timeout settings
   - Custom session class registration via 
`SafeDeserializationFilter.allowAdditionalClasses()`
   - Application-specific allowlist (80 JDK classes + custom classes)
   
   Having different configuration at different layers isn't confusion - it's 
separation of concerns. 
   
   **"Potential security gaps (inconsistent policies)"**
   
   The "gap" is actually the security boundary. Inconsistent policies are 
correct here - they enforce isolation:
   
   - HTTP boundary: Strict 80-class allowlist (only safe user data types)
   - Cluster boundary: 485-class allowlist (internal operations need more 
classes)
   
   The security gap exists when boundaries don't have appropriate policies. 
PR-7941 adds the missing policy at the HTTP boundary.
   
   ## Discussion Points Answered
   
   **1. Should we extend existing Geode serialization infrastructure?**
   
   I don't think we should, because they protect different things:
   
   - Geode infrastructure: Protects cluster integrity (peer communication, 
internal operations)
   - Session filter: Protects against external attackers via HTTP
   
   The fix must be at the HTTP boundary with an HTTP-appropriate policy.
   
   **2. How do we provide unified configuration for users?**
   
   I don't believe we do, because unification breaks isolation. The 
configurations serve different purposes and are typically managed by different 
roles:
   
   **3. What's the migration path for existing session deployments?**
   
   Zero migration for standard use cases:
   - PR-7941 includes 80 common JDK classes in the hardcoded allowlist
   - Covers String, Integer, Collections, Date, etc.
   
   Existing sessions stored in regions continue working. The filter only 
applies during deserialization, and legitimate classes pass through once added 
to the allowlist.
   
   **4. How do we handle the different threat models?**
   
   By maintaining appropriate policies at appropriate boundaries:
   
   **Web application threat model:**
   - Attacker: External, untrusted HTTP clients
   - Attack vector: Crafted session attributes with gadget chains
   - Required defense: Strict allowlist of safe user-facing classes (80 classes)
   - Boundary: HTTP session setAttribute/getAttribute
   
   **Distributed cache threat model:**
   - Attacker: Compromised cluster member, corrupted internal data
   - Attack vector: Malicious region data, corrupted metadata
   - Required defense: Allowlist of internal operation classes (485 classes)
   - Boundary: Cluster communication, region operations
   
   The correct approach: Different threats require different policies at 
different boundaries. This is standard security architecture.
   
   ## The Complete Architecture: Two Filters, Two Boundaries
   
   PR-7941 implements defense-in-depth with filters at both security boundaries:
   
   ```
   ┌──────────────────────────────────────────────────────────────────┐
   │  External Attacker (HTTP Client)                                 │
   └────────────────┬─────────────────────────────────────────────────┘
                    │
                    ▼
            ┌──────────────────────────────┐
            │  SafeDeserializationFilter   │  ← NEW: PR-7941
            │      (80 JDK classes)        │     HTTP Boundary Protection
            └──────────────┬───────────────┘
                           │ ✓ String, Integer, ArrayList allowed
                           │ ✗ ResourceManager, CacheConfig BLOCKED
                           ▼
            ┌──────────────────────────────┐
            │      Session Object          │
            │      (validated data)        │
            └──────────────┬───────────────┘
                           │
                           ▼
            ┌──────────────────────────────┐
            │       Geode Region           │
            │     (serialized bytes)       │
            └──────────────┬───────────────┘
                           │
            ┌──────────────▼───────────────┐
            │    Region Replication        │
            │    Internal Operations       │
            │   Cluster Communication      │
            └──────────────┬───────────────┘
                           │
                           ▼
            ┌──────────────────────────────┐
            │  validate-serializable-      │  ← EXISTING: Geode Infrastructure
            │         objects              │     Cluster Boundary Protection
            │     (485 Geode classes)      │
            └──────────────┬───────────────┘
                           │ ✓ BucketAdvisor, DistributionMessage allowed
                           │ ✗ Untrusted application classes BLOCKED
                           ▼
            ┌──────────────────────────────┐
            │      Cluster Member          │
            │        (peer node)           │
            └──────────────────────────────┘
   ```
   
   **Two independent security layers designed for different boundaries:**
   
   1. **SafeDeserializationFilter** (HTTP → Session) - **NEW, Always Active:**
      - Protects against external attackers sending malicious session attributes
      - Prevents gadget chain RCE attacks via HTTP (CVSS 9.8 vulnerability)
      - Blocks internal Geode classes (ResourceManager, CacheConfig) from HTTP 
input
      - Allows only safe user-facing types (String, Integer, Collections)
   
   2. **validate-serializable-objects** (Cluster Operations) - **Existing, 
Opt-in (disabled by default):**
      - When enabled, protects against corrupted internal data during 
replication
      - Prevents malicious cluster members from injecting bad classes
      - Blocks arbitrary application classes from internal operations
      - Allows only Geode infrastructure classes needed for cluster integrity
   
   **Why separate filters at separate boundaries:**
   
   - **Session data CAN flow through both filters** - but at different points 
in its lifecycle:
     - Filter 1 (SafeDeserializationFilter) applies when user calls 
`session.setAttribute()` (HTTP boundary) - **always active**
     - Filter 2 (validate-serializable-objects) applies when Geode replicates 
session region to other nodes (cluster boundary) - **only if enabled by admin**
     
   - **Different attack vectors require different defenses:**
     - HTTP attacker tries to inject malicious classes via session → Filter 1 
blocks (new protection)
     - Compromised cluster member tries to inject malicious classes via 
replication → Filter 2 blocks (if enabled)
   
   - **Critical insight: PR-7941 adds the FIRST layer of defense:**
     - **Before PR-7941:** Zero protection at HTTP boundary (CVSS 9.8 RCE 
vulnerability)
     - **After PR-7941:** HTTP boundary protected with strict 80-class allowlist
     - **Geode's cluster filter:** Exists but disabled by default - users can 
enable for additional defense-in-depth
   
   **This is the same pattern every distributed system uses:**
   
   - **Spring Security:** FilterChainProxy with multiple filters for different 
URL patterns and layers
   - **AWS:** Security groups (network) + IAM policies (identity) + S3 bucket 
policies (data)
   - **Kubernetes:** NetworkPolicies (network) + RBAC (API) + 
PodSecurityPolicies (runtime)
   - **Database systems:** Connection authentication + SQL authorization + 
row-level security
   The architectural principle: **Match security controls to trust boundaries, 
not to minimize filter count.**
   
   **Current state and impact of PR-7941:**
   
   - **Before PR-7941:** 
     - HTTP boundary (most exposed): No protection 
     - Cluster boundary: Optional protection (disabled by default)
     
   - **After PR-7941:**
     - HTTP boundary: **Always protected** with SafeDeserializationFilter
     - Cluster boundary: Optional protection (unchanged - users can enable if 
needed)
   
   This PR closes the critical vulnerability by adding mandatory protection at 
the HTTP boundary where the  attack occurs. The cluster-level filter remains 
available as an optional additional layer for users who want defense-in-depth 
against internal threats.
   Before this PR, Geode had zero protection at the HTTP boundary - the most 
exposed attack surface. This fix adds the missing layer where the vulnerability 
exists.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to