[jira] [Comment Edited] (SPARK-54179) Add Native Support for Apache Tuple Sketches

Christopher Boumalhab (Jira) Thu, 12 Feb 2026 07:17:41 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-54179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058166#comment-18058166
 ]


Christopher Boumalhab edited comment on SPARK-54179 at 2/12/26 3:16 PM:
------------------------------------------------------------------------

[~chengpan] This makes sense. Maybe we can propose a Java 17-compatible build 
that does not use the FFM API ([https://openjdk.org/jeps/412]) as a solution 
to this, or just remove the Java version checks if those are the only 
blocker. Please let me know if there is anything I can do on my end. I'm open 
to creating a ticket in that project if needed.


was (Author: JIRAUSER309943):
[~chengpan] This makes sense, maybe we can propose a non-FFM 
(https://openjdk.org/jeps/412) Java-17 compiled compatibility build as a 
solution to this. Please let me know if there is anything I can do on my end. 
Open to creating a ticket in that project if needed.

> Add Native Support for Apache Tuple Sketches
> --------------------------------------------
>
>                 Key: SPARK-54179
>                 URL: https://issues.apache.org/jira/browse/SPARK-54179
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Christopher Boumalhab
>            Assignee: Christopher Boumalhab
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>
> Implement support for tuple sketches in Apache Spark to enable approximate 
> set cardinality, frequency, and similarity computations over multiple 
> dimensions efficiently. The feature should:
>  * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
>  * Provide functions for creating, updating, and querying tuple sketches.
>  * Support common sketch operations such as union, intersection, and 
> cardinality estimation.
>  * Ensure compatibility with Spark SQL and allow usage within DataFrame 
> transformations and aggregations.
>  * Include unit and integration tests validating accuracy and performance.
>  * Provide documentation and examples for developers.
> *Acceptance Criteria:*
> 1. Sketches support aggregation and merging operations.
> 2. Queries return approximate cardinalities or other statistics with expected 
> error bounds.
> 3. Performance benchmarks show scalability for large datasets.
> 4. Documentation includes API usage examples.
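
For readers unfamiliar with the operations listed in the description, the following is a toy illustration in plain Python of what a tuple sketch does: it keeps the k smallest key hashes, attaches a numeric summary to each retained key, and derives estimates from the sampling threshold. Everything here (ToyTupleSketch, hash_key, union, intersection, the summing of summaries) is a hypothetical name or choice made for illustration. It is not the Apache DataSketches API and not the Spark SQL functions proposed in this ticket.

```python
import hashlib

MAX_HASH = 2 ** 64  # hashes are treated as integers in [0, MAX_HASH)


def hash_key(key: str) -> int:
    """Map a key to a stable 64-bit integer (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ToyTupleSketch:
    """Hypothetical toy: keeps the k smallest key hashes, each with a summary."""

    def __init__(self, k: int = 1024):
        self.k = k
        self.theta = MAX_HASH  # only hashes below theta are retained
        self.entries = {}      # hash -> summed summary value

    def update(self, key: str, value: float = 1.0) -> None:
        h = hash_key(key)
        if h >= self.theta:
            return  # key falls outside the sampled region
        self.entries[h] = self.entries.get(h, 0.0) + value
        self._trim()

    def _trim(self) -> None:
        if len(self.entries) <= self.k:
            return
        ordered = sorted(self.entries)
        self.theta = ordered[self.k]  # (k+1)-th smallest hash becomes the threshold
        for h in ordered[self.k:]:
            del self.entries[h]

    def estimate(self) -> float:
        """Approximate number of distinct keys."""
        return len(self.entries) * MAX_HASH / self.theta

    def summary_sum(self) -> float:
        """Approximate sum of the summary values over all distinct keys."""
        return sum(self.entries.values()) * MAX_HASH / self.theta


def union(a: ToyTupleSketch, b: ToyTupleSketch) -> ToyTupleSketch:
    """Merge two sketches; summaries of keys present in both are added."""
    out = ToyTupleSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    for src in (a, b):
        for h, v in src.entries.items():
            if h < out.theta:
                out.entries[h] = out.entries.get(h, 0.0) + v
    out._trim()
    return out


def intersection(a: ToyTupleSketch, b: ToyTupleSketch) -> ToyTupleSketch:
    """Keep only keys retained by both sketches; summaries are added."""
    out = ToyTupleSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    for h, v in a.entries.items():
        if h < out.theta and h in b.entries:
            out.entries[h] = v + b.entries[h]
    return out
```

While a sketch holds fewer than k distinct keys, theta stays at its maximum and the estimates are exact. Once it starts sampling, the relative error of the distinct-count estimate is on the order of 1/sqrt(k). Summing summaries on merge is only one possible policy; a real implementation would typically let the caller choose how summaries are combined.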



