Prajwal-banakar commented on issue #2659:
URL: https://github.com/apache/fluss/issues/2659#issuecomment-3889349561

   Hi @wuchong,
   
   I plan to demonstrate these features through a 'Real-Time User Identity & 
Activity Tracker' use case. This scenario reflects a common business need: 
mapping high-cardinality business keys (like emails) to compact system IDs 
while maintaining real-time metrics.
   
   The plan for the tutorial includes:
   
   Identity Mapping (Auto-Increment): Creating a 'Dictionary Table' where 
incoming email strings from a raw stream are automatically mapped to a unique 
BIGINT UID. This highlights the feature’s role as a global ID generator that 
replaces manual, complex ID management.
   
   Stateless Aggregation (Merge Engine): Creating a 'User Profile' table that 
uses the aggregation merge engine to track total_spend (sum) and last_active 
(max) directly in the storage layer. This demonstrates how Fluss allows Flink 
jobs to remain lightweight and nearly stateless.
   
   End-to-End Pipeline: Showing a single Flink SQL query using the 
lookup.insert-if-not-exists hint. This ties the pieces together: one 
lookup join resolves the ID, creates it if it is missing, and feeds the 
downstream aggregation.
   
   Production Reliability: A brief section on Undo Recovery, explaining how 
Fluss ensures exactly-once accuracy for these aggregates during Flink failovers.
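   
   To make the intended semantics concrete, here is a rough model in plain 
Python. It is only a sketch and does not use the Fluss or Flink APIs: every 
class, function, and field layout below is hypothetical, and amounts are kept 
as integer cents purely for simplicity. It mimics the auto-increment 
dictionary, the insert-if-not-exists lookup, and the sum/max merge; Undo 
Recovery is not modeled.

```python
from itertools import count


class DictionaryTable:
    """Toy stand-in for a dictionary table with an auto-increment UID."""

    def __init__(self):
        self._uid_by_email = {}
        self._next_uid = count(1)  # hypothetical: IDs start at 1

    def lookup_or_insert(self, email):
        # Mimics the insert-if-not-exists behaviour: return the existing
        # UID, or allocate the next one when the key is missing.
        if email not in self._uid_by_email:
            self._uid_by_email[email] = next(self._next_uid)
        return self._uid_by_email[email]


class UserProfileTable:
    """Toy stand-in for a table that merges rows with sum and max."""

    def __init__(self):
        self.rows = {}

    def upsert(self, uid, spend_cents, event_time):
        row = self.rows.setdefault(
            uid, {"total_spend": 0, "last_active": event_time}
        )
        row["total_spend"] += spend_cents                            # sum
        row["last_active"] = max(row["last_active"], event_time)    # max


def run_pipeline(events):
    """Process (email, spend_cents, event_time) tuples end to end."""
    dictionary = DictionaryTable()
    profiles = UserProfileTable()
    for email, spend_cents, event_time in events:
        uid = dictionary.lookup_or_insert(email)  # the lookup join step
        profiles.upsert(uid, spend_cents, event_time)
    return dictionary, profiles


if __name__ == "__main__":
    sample = [
        ("a@example.com", 1000, 1),
        ("b@example.com", 500, 2),
        ("a@example.com", 250, 5),
    ]
    _, profiles = run_pipeline(sample)
    for uid, row in sorted(profiles.rows.items()):
        print(uid, row)
```

   The caller keeps no running totals of its own: the aggregation state lives 
entirely in the profile table, which is the property the tutorial would 
highlight for the Flink job.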
   
   Does this approach align with the project goals for the docs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
