Prajwal-banakar opened a new pull request, #3319:
URL: https://github.com/apache/fluss/pull/3319

   
   
   ### Purpose
   
   Linked issue: close #3289
   
   This PR adds the foundational infrastructure for the FIP-37 RoaringBitmap SQL 
functions. It provides the serialization utilities, a custom Flink type 
serializer, and a base aggregate function class that the bitmap SQL functions 
(`rb_build_agg`, `rb_or_agg`, `rb_and_agg`, etc.) will use in subsequent PRs.
   
   ### Brief change log
   
   Added the following infrastructure files in `fluss-flink/fluss-flink-common`:
   
   - **BitmapUtils.java**: Utility methods for serializing and deserializing 
`RoaringBitmap` with a ByteBuffer-based approach. The output matches the 
server-side `RoaringBitmapUtils.serializeRoaringBitmap32` format used by 
`FieldRoaringBitmap32Agg`, which keeps the two wire-compatible
   - **RoaringBitmapSerializer.java**: Custom Flink `TypeSerializer` for 
`RoaringBitmap` accumulators to ensure correct checkpoint/savepoint behavior. 
Without this, Flink falls back to Kryo, which is sensitive to internal class 
layout changes across RoaringBitmap library versions
   - **RoaringBitmapTypeInfo.java**: `TypeInformation` wrapper that provides 
the custom serializer to Flink's type system
   - **AbstractRbAggFunction.java**: Base class for bitmap aggregate UDFs with 
`@FunctionHint(accumulator = @DataTypeHint(value = "RAW", bridgedTo = 
RoaringBitmap.class))` annotation. This tells Flink's Table planner to skip 
reflection-based POJO field extraction on RoaringBitmap and use the custom 
`TypeInformation` instead
   - **BitmapUtilsTest.java**: Unit tests covering null handling, empty bitmap, 
known values round-trip, large cardinality (100K elements), and server 
serialization compatibility
   - **pom.xml**: Added `RoaringBitmap` dependency (version 1.3.0 from root pom)
   
   The aggregate functions (`rb_build_agg`, `rb_or_agg`, `rb_and_agg`) and 
catalog registration will follow in subsequent PRs linked to this issue.
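
The null handling and byte round-trip that these utilities are described as providing can be illustrated without the RoaringBitmap dependency. The following is a minimal sketch under stated assumptions, not the code in this PR: `java.util.BitSet` stands in for `RoaringBitmap`, and the class and method names (`BitmapBytesSketch`, `toBytes`, `fromBytes`, `write`, `read`, `roundTrip`) are hypothetical. With the real library, `toBytes` would presumably size a `ByteBuffer` from `serializedSizeInBytes()` and fill it with `serialize(ByteBuffer)`. The length-prefixed framing is likewise only an assumption about how a custom serializer might lay out the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Illustrative only: BitSet stands in for RoaringBitmap, and every name here is hypothetical.
final class BitmapBytesSketch {

    // Null-safe conversion: a null bitmap maps to null bytes.
    static byte[] toBytes(BitSet bitmap) {
        return bitmap == null ? null : bitmap.toByteArray();
    }

    // Inverse of toBytes: null bytes map back to a null bitmap.
    static BitSet fromBytes(byte[] bytes) {
        return bytes == null ? null : BitSet.valueOf(bytes);
    }

    // Length-prefixed framing, so a reader knows how many payload bytes follow.
    // Assumes a non-null bitmap, as an accumulator would be.
    static void write(BitSet bitmap, DataOutputStream out) throws IOException {
        byte[] payload = toBytes(bitmap);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads one length-prefixed payload and rebuilds the bitmap from it.
    static BitSet read(DataInputStream in) throws IOException {
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return fromBytes(payload);
    }

    // Writes the bitmap to an in-memory stream and reads it back.
    static BitSet roundTrip(BitSet bitmap) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            write(bitmap, out);
            out.flush();
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The length prefix lets payloads of different sizes sit back to back in one stream, which is why the sketch writes it before the bytes.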
   
   ### Tests
   
   Unit tests added and passing:
   - `BitmapUtilsTest.testNullInputToBytes()` - null handling on serialization
   - `BitmapUtilsTest.testNullInputFromBytes()` - null handling on deserialization
   - `BitmapUtilsTest.testEmptyBitmapRoundTrip()` - empty bitmap serialization
   - `BitmapUtilsTest.testKnownValuesRoundTrip()` - correctness with known 
values
   - `BitmapUtilsTest.testLargeCardinality()` - round-trip with 100K elements
   - `BitmapUtilsTest.testFormatCompatibleWithServerSerialization()` - wire 
compatibility
   
   All tests pass: `Tests run: 6, Failures: 0, Errors: 0, Skipped: 0`
   
   Verified with:
   - `./mvnw spotless:apply -pl fluss-flink/fluss-flink-common` - BUILD SUCCESS
   - `./mvnw test -pl fluss-flink/fluss-flink-common -Dtest=BitmapUtilsTest` - 
BUILD SUCCESS
   - `./mvnw clean install -pl fluss-flink/fluss-flink-common -DskipTests` - 
BUILD SUCCESS
   - `./mvnw clean package -DskipTests` (full project build) - BUILD SUCCESS
   - Checkstyle: 0 violations
   
   ### API and Format
   
   This change does not affect any public API or storage format. It adds 
internal infrastructure utilities that will be used by future bitmap SQL 
functions.
   
   ### Documentation
   
   This change does not introduce new user-facing features yet. The bitmap SQL 
functions (`rb_build_agg`, `rb_or_agg`, `rb_and_agg`) and their documentation 
will be added in follow-up PRs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
