[ 
https://issues.apache.org/jira/browse/UNOMI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18008990#comment-18008990
 ] 

Serge Huber commented on UNOMI-897:
-----------------------------------

This fix has been committed, reviewed and merged in 
https://github.com/apache/unomi/pull/720
We now need to up-port it to Unomi 3 which has a different implementation of 
the service due to multi-tenancy.

> Two separate issues in Groovy Actions implementation - Data contamination 
> race condition and severe performance problems
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: UNOMI-897
>                 URL: https://issues.apache.org/jira/browse/UNOMI-897
>             Project: Apache Unomi
>          Issue Type: Bug
>          Components: unomi(-core)
>    Affects Versions: unomi-2.4.0, unomi-2.5.0, unomi-2.6.0, unomi-2.6.1
>         Environment: - Apache Unomi 2.0.0+
> - Groovy actions extension enabled
> - Multi-threaded environments
> - Production deployments with concurrent users
> - High-concurrency scenarios (1000+ concurrent threads)
>            Reporter: Serge Huber
>            Priority: Major
>             Fix For: unomi-3.0.0, unomi-2.7.0
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This ticket documents *two separate but related issues* in the 
> {code}GroovyActionDispatcher{code} that affect Unomi deployments using Groovy 
> actions:
> # *ISSUE #1: Data Contamination Race Condition* - Security vulnerability 
> where user data can leak between different users
> # *ISSUE #2: Performance and Memory Problems* - Scalability issues that 
> prevent handling concurrent users effectively
> *Business Impact*: These issues can lead to data integrity problems, customer 
> trust issues, system reliability problems, and operational costs due to poor 
> performance.
> *Note*: These two issues are reported together because they both stem from 
> the same architectural problems in the {code}GroovyActionDispatcher{code} and 
> will require coordinated fixes in the same code area. Addressing one issue 
> without considering the other could lead to incomplete solutions or introduce 
> new problems.
> ----
> h2. ISSUE #1: Data Contamination Race Condition
> h3. 🚨 Security Vulnerability - User Data Leakage
> *What's happening*: The system has a bug that can cause user data from one 
> customer to appear in another customer's profile or event data. This is a 
> *serious privacy and security problem*.
> *Why it matters*: 
> * Customer A's personal information could be visible to Customer B
> * This can lead to loss of customer trust and data integrity issues
> * Audit logs become unreliable
> * System behavior becomes unpredictable
> h4. Root Cause (Technical)
> The implementation uses a *shared GroovyShell instance* across all threads in 
> {code}GroovyActionDispatcher{code}, which means:
> * When multiple users are processed simultaneously, their data can get mixed 
> up
> * Thread A sets variables for User A's action/event
> * Thread B overwrites the same shared shell with User B's action/event data
> * Both threads now see User B's data, causing contamination
> h4. Race Condition Scenario (Simplified)
> # *User A* logs in and their profile is being processed
> # *User B* logs in at the same time and their profile is also being processed
> # Due to the shared system component, User A's profile gets updated with User 
> B's information
> # *Result*: User A now sees User B's data in their profile
> h3. 🔒 Security Implications
> h4. Data Integrity Violation
> * *User profile data leaks* between different user profiles
> * *Event data corruption* where event IDs get mixed up
> * *Unauthorized data disclosure* - data from one user visible to another
> * *Audit trail corruption* - incorrect data in logs
> * *Data integrity compromise* - unreliable system behavior
> h4. Example Data Contamination
> {code}
> User A's profile: { "profileId": "user-a", "actionId": "action-a", "eventId": 
> "event-a" }
> User B's profile: { "profileId": "user-b", "actionId": "action-b", "eventId": 
> "event-b" }
> After race condition:
> User A's profile: { "profileId": "user-b", "actionId": "action-b", "eventId": 
> "event-b" } // CONTAMINATED!
> User B's profile: { "profileId": "user-b", "actionId": "action-b", "eventId": 
> "event-b" }
> {code}
> h3. 📋 How to Reproduce the Data Contamination Issue
> *⚠️ Important Note*: Race conditions are inherently difficult to reproduce 
> synthetically because they depend on precise timing between threads. The 
> following steps may not consistently reproduce the issue in controlled test 
> environments, but the underlying problem exists and can manifest in 
> production under the right conditions.
> h4. Setup Environment
> {code}
> # Deploy Apache Unomi with Groovy actions extension enabled
> # Ensure multi-threaded environment with concurrent user access
> {code}
> h4. Create Race Condition Sensitive Groovy Action
> {code}
> def execute(action, event) {
>     // Read from shared variables multiple times to increase race condition 
> probability
>     def profile1 = event.getProfile()
>     def action1 = action
>     def event1 = event
>     
>     // Simulate some processing that takes time
>     Thread.sleep(2)
>     
>     // Read again - this might now be different due to race condition!
>     def profile2 = event.getProfile()
>     def action2 = action
>     def event2 = event
>     
>     def profileId1 = profile1.getProfileId()
>     def profileId2 = profile2.getProfileId()
>     def actionId1 = action1.getActionId()
>     def actionId2 = action2.getActionId()
>     def eventId1 = event1.getEventId()
>     def eventId2 = event2.getEventId()
>     
>     // Return data for verification - this will show contamination if race 
> condition occurs
>     return [
>         profileId: profileId1,
>         profileId2: profileId2,
>         actionId: actionId1,
>         actionId2: actionId2,
>         eventId: eventId1,
>         eventId2: eventId2,
>         visitCount: newValue,
>         timestamp: System.currentTimeMillis(),
>         threadId: Thread.currentThread().getId()
>     ]
> }
> {code}
> h4. Generate Concurrent Load
> {code}
> # Create a load test that simulates 1000+ concurrent users
> # Each user should trigger the Groovy action multiple times
> # Use tools like Apache JMeter, Gatling, or custom load testing scripts
> # Ensure high concurrency to increase race condition probability
> {code}
> h4. Observe Data Contamination
> * Check for profileId, actionId, and eventId mismatches in action results
> * Observe that data from one user appears in another user's result
> * Verify audit logs show incorrect data associations
> * Look for inconsistent data patterns that indicate cross-contamination
> *Note*: The race condition may not be consistently reproducible in test 
> environments due to timing dependencies. However, the architectural flaw 
> exists and can cause data contamination in production environments with real 
> user load patterns.
> h3. 🔍 Technical Evidence for Data Contamination
> h4. Code Location
> * *File*: 
> {code}src/main/java/org/apache/unomi/groovy/actions/GroovyActionDispatcher.java{code}
> * *Lines*: 67-72 (race condition in execute method)
> * *Method*: {code}execute(Action action, Event event, String actionName){code}
> * *Shared Shell*: Retrieved via 
> {code}groovyActionsService.getGroovyShell(){code}
> h4. Detection Challenges
> * *Timing-dependent*: Race conditions require specific thread interleaving 
> patterns
> * *Environment-specific*: May not manifest in development or test environments
> * *Load-dependent*: More likely under high concurrent load conditions
> * *Intermittent*: Can appear sporadically in production without consistent 
> reproduction
> h4. Service Implementation Details
> * *Shared GroovyShell*: Retrieved from GroovyActionsService as singleton 
> instance
> * *Variable Setting*: action and event variables set in shared shell binding
> * *Script Parsing*: Script parsed from shared shell instance
> * *Concurrent Access*: Multiple threads access same shell simultaneously
> * *Variable Overwriting*: Thread A's variables overwritten by Thread B's 
> variables
> h4. Race Condition Characteristics
> * *Timing-dependent*: May not be consistently reproducible in all environments
> * *Concurrency-sensitive*: More likely with higher thread counts
> * *Data-sensitive*: Affects profileId, actionId, and eventId
> * *Processing-sensitive*: More likely with longer script execution times
> * *Difficult to reproduce synthetically*: Race conditions are inherently 
> timing-dependent and may not manifest in controlled test environments
> ----
> h2. ISSUE #2: Performance and Memory Problems
> h3. 📊 Severe Performance Issues
> *What's happening*: The system performs very poorly when multiple users are 
> active at the same time, using excessive memory and processing very slowly. 
> Scripts are recompiled on every execution instead of being cached.
> *Why it matters*:
> * The system can only handle 957 operations per second (very slow for 
> production)
> * Memory usage grows to over 5GB, causing system instability
> * Performance gets worse as more users connect simultaneously
> * High garbage collection activity slows down the entire system
> * Script compilation overhead adds significant latency to every action 
> execution
> h4. Performance Problems
> # *Poor Throughput*: Only 957 executions/second (unacceptable for production)
> # *Thread Contention*: Shared GroovyShell causes blocking
> # *Memory Leaks*: 5,151MB peak memory usage
> # *High GC Pressure*: 58 garbage collections, 2,294ms GC time
> # *Poor Scalability*: Performance degrades with concurrency
> # *No Script Caching*: Scripts are recompiled on every execution instead of 
> being cached
> h3. 📊 Performance Benchmark Results
> h4. Thread Safety Test Results (1,000 threads, 50 iterations each)
> {code}
> Test Configuration:
> - Concurrent Threads: 1,000
> - Executions per Thread: 50
> - Total Executions: 50,000
> - Test Duration: 52+ seconds
> Results:
> - Throughput: 957 executions/second
> - Memory Peak: 5,155 MB
> - Memory Increase: 5,151 MB
> - GC Time: 2,294 ms
> - GC Collections: 58
> - Thread Safety: FAILED (timing-dependent)
> {code}
> h4. Memory Usage Analysis
> {code}
> Initial Memory State:
>    Used: 4 MB, Free: 64 MB, Total: 68 MB, Max: 16384 MB
> Peak Memory State:
>    Used: 5155 MB, Free: 1860 MB, Total: 7015 MB, Max: 16384 MB
> Memory Growth: 5,151 MB (128,775% increase)
> {code}
> h4. GC Activity Analysis
> {code}
> GC Statistics for Original Implementation:
>    - GC Time: 2,294 ms
>    - GC Count: 58 collections
>    - Average GC Time: 39 ms per collection
>    - GC Pressure: High (affects overall system performance)
> {code}
> h3. 🏗️ Architecture Problems
> h4. Current Architecture Issues
> {code}
> ┌─────────────────────────────────────────────────────────────┐
> │                    Current Architecture                     │
> ├─────────────────────────────────────────────────────────────┤
> │ GroovyActionDispatcher                                      │
> │ ├── Shared GroovyShell (SINGLETON) ← RACE CONDITION!        │
> │ ├── No Script Caching ← PERFORMANCE ISSUE!                  │
> │ ├── Script Recompilation on Every Execution                 │
> │ ├── Variable Overwriting in Shared Binding                  │
> │ └── Thread-Unsafe Execution                                 │
> └─────────────────────────────────────────────────────────────┘
> {code}
> h4. Performance Issues Identified
> * ❌ *Shared mutable state* across threads
> * ❌ *Variable overwriting* in shared GroovyShell binding
> * ❌ *No thread isolation* for script execution
> * ❌ *Memory inefficiency* from shared instances
> * ❌ *GC pressure* from object creation patterns
> * ❌ *No script compilation caching* - scripts recompiled on every execution
> ----
> h2. 🎯 Impact Assessment
> h3. Business Impact
> * *Data Integrity Issues*: Unreliable system behavior
> * *Customer Trust*: Loss of confidence in data security
> * *System Reliability*: Unpredictable data processing
> * *Operational Costs*: High memory usage, poor performance
> h3. Technical Impact
> * *Scalability*: Cannot handle concurrent users effectively
> * *Performance*: 957 executions/second (unacceptable)
> * *Memory*: 5,155MB peak usage leads to OOM errors
> * *Reliability*: Data corruption and inconsistencies
> * *GC Pressure*: 58 collections impact overall system performance
> h2. 📝 Additional Information
> h3. Affected Components
> * {code}GroovyActionDispatcher{code}
> * {code}GroovyActionsService{code} (provides shared GroovyShell)
> * All Groovy action executions
> * Profile and event data processing
> h3. Monitoring Recommendations
> * Monitor for profile data inconsistencies
> * Track memory usage patterns
> * Monitor GC frequency and duration
> * Alert on data contamination patterns
> * Monitor script execution times
> h3. Workarounds (Temporary)
> * Reduce concurrent event processing
> * Monitor for data contamination
> * Implement data validation checks
> * Consider disabling Groovy actions in production
> * Increase memory allocation to handle GC pressure



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to