Prajwal-banakar opened a new pull request, #2669:
URL: https://github.com/apache/fluss/pull/2669

   
   ### Purpose
   
   Linked issue: close #2659
   
   This change adds a new quickstart tutorial, "Real-Time User Profile", to the Apache Fluss documentation. The tutorial walks through a realistic business scenario that combines the Auto-Increment Column and Aggregation Merge Engine features. In particular, it addresses the need for guidance on mapping high-cardinality string identifiers (such as email addresses) to compact integers for efficient real-time analytics.
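The dictionary idea above can be modeled in a few lines. What follows is a minimal in-memory sketch, not the Fluss implementation. Each new string identifier receives the next integer from a counter, and a repeated identifier keeps the ID it already has. The class and method names (`UidDictionary`, `lookupOrAssign`) are hypothetical. The sketch also hands out consecutive IDs from a single counter; that detail should not be read as a guarantee about how the real auto-increment column allocates values.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of an email -> INT dictionary.
 * It mimics what an auto-increment column on a primary-key table provides;
 * it is NOT the Fluss API.
 */
final class UidDictionary {
    // Each high-cardinality string key maps to one compact integer ID.
    private final Map<String, Integer> uidByEmail = new HashMap<>();
    // Next ID to hand out; this sketch simply starts at 1 and counts up.
    private int nextUid = 1;

    /** Returns the existing UID for the email, or assigns the next free one. */
    int lookupOrAssign(String email) {
        return uidByEmail.computeIfAbsent(email, k -> nextUid++);
    }

    /** Number of distinct identifiers seen so far. */
    int size() {
        return uidByEmail.size();
    }

    public static void main(String[] args) {
        UidDictionary dict = new UidDictionary();
        int alice = dict.lookupOrAssign("alice@example.com");
        int bob = dict.lookupOrAssign("bob@example.com");
        int aliceAgain = dict.lookupOrAssign("alice@example.com");
        // prints "1 2 1 (2 distinct)"
        System.out.println(alice + " " + bob + " " + aliceAgain + " (" + dict.size() + " distinct)");
    }
}
```

Looking up the same email twice returns the same ID. Downstream operators can then work with a 4-byte integer instead of a variable-length string.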
   
   ### Brief change log
   
   This pull request adds a new tutorial at `website/docs/quickstartUuser-Profile.md`. Key changes:
   
   - Realistic Use Case: a scenario built around identity mapping (email to UID) and real-time metric aggregation (total clicks and unique visitors).
   - Feature Integration: shows how FIP-16 (Auto-Increment) handles dictionary management while FIP-21 (Aggregation Merge Engine) performs storage-level pre-aggregation.
   - Technical Optimization: follows the maintainer's recommendation to use `INT` for the generated `uid` column, which maximizes storage efficiency and performance for RoaringBitmap (`rbm64`) operations.
   - Reliability Section: documents Undo Recovery, explaining how Fluss keeps aggregation results exactly-once during Flink failovers.
   - Visual Guidance: includes an architecture diagram of the data flow from raw event ingestion to the final pre-aggregated profile storage.
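The change log pairs a dictionary (Auto-Increment) with pre-aggregation (Aggregation Merge Engine). The sketch below models only the merge step and assumes each event has already been translated to an integer `uid`. Every upsert is folded into the stored row column by column: clicks are summed, and the uid is OR-ed into a visitor bitmap. `java.util.BitSet` stands in for a RoaringBitmap. The names `ProfileAggregator`, `upsert`, and `page-1` are hypothetical and are not Fluss APIs. A plain bitmap indexed by uid stays small only when IDs are small, dense, non-negative integers, which is the intuition behind generating `uid` as `INT`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of storage-level pre-aggregation (not the Fluss API):
 * each upsert is merged into the stored row column by column, "sum" for
 * clicks and bitmap union for visitors. BitSet stands in for a RoaringBitmap.
 */
final class ProfileAggregator {

    /** Aggregated row for one key. */
    static final class AggRow {
        long totalClicks;                     // merged with sum
        final BitSet visitors = new BitSet(); // merged with bitmap OR

        int uniqueVisitors() {
            return visitors.cardinality();
        }
    }

    private final Map<String, AggRow> rows = new HashMap<>();

    /** Merges one partial row (clicks, uid) into the stored aggregate for key; uid must be >= 0. */
    void upsert(String key, long clicks, int uid) {
        AggRow row = rows.computeIfAbsent(key, k -> new AggRow());
        row.totalClicks += clicks; // sum aggregation
        row.visitors.set(uid);     // union with the single-element bitmap {uid}
    }

    AggRow get(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        ProfileAggregator agg = new ProfileAggregator();
        agg.upsert("page-1", 3, 1);
        agg.upsert("page-1", 2, 2);
        agg.upsert("page-1", 5, 1); // uid 1 again: clicks add up, visitors do not
        AggRow row = agg.get("page-1");
        // prints "10 clicks, 2 unique visitors"
        System.out.println(row.totalClicks + " clicks, " + row.uniqueVisitors() + " unique visitors");
    }
}
```

In this naive model, replaying an event adds its clicks a second time, while setting a bitmap bit twice changes nothing. Because the sum is not idempotent, a failover that re-sends events needs a recovery mechanism, such as the Undo Recovery covered in the tutorial, to keep aggregates exactly-once.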
   
   ### Tests
   
   - Documentation Build: verified that the documentation builds correctly in the local Docusaurus environment and that the new page is linked in the sidebar.
   - SQL Verification: manually checked the Flink SQL syntax against the Apache Fluss 0.9 connector specifications.
   
   <img width="1920" height="1080" alt="image" src="https://github.com/user-attachments/assets/2a4da632-ca1f-48cc-9161-0a68f940936d" />
   
   
   ### API and Format
   
   This change is documentation-only and does not affect the Java API or the underlying storage format.
   
   ### Documentation
   
   Yes. This change adds a new quickstart tutorial that guides users in adopting Fluss's advanced streaming storage capabilities.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
