[
https://issues.apache.org/jira/browse/SPARK-54179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058166#comment-18058166
]
Christopher Boumalhab edited comment on SPARK-54179 at 2/12/26 3:16 PM:
------------------------------------------------------------------------
[~chengpan] This makes sense. Maybe we can propose a non-FFM
([https://openjdk.org/jeps/412]) compatibility build compiled for Java 17 as a
solution, or just remove the Java version checks if they are the only
blocker. Please let me know if there is anything I can do on my end. I am open
to creating a ticket in that project if needed.
was (Author: JIRAUSER309943):
[~chengpan] This makes sense, maybe we can propose a non-FFM
(https://openjdk.org/jeps/412) Java-17 compiled compatibility build as a
solution to this. Please let me know if there is anything I can do on my end.
Open to creating a ticket in that project if needed.
> Add Native Support for Apache Tuple Sketches
> --------------------------------------------
>
> Key: SPARK-54179
> URL: https://issues.apache.org/jira/browse/SPARK-54179
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Christopher Boumalhab
> Assignee: Christopher Boumalhab
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0
>
>
> Implement support for tuple sketches in Apache Spark to enable efficient
> approximate set-cardinality, frequency, and similarity computations over
> multiple dimensions. The feature should:
> * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
> * Provide functions for creating, updating, and querying tuple sketches.
> * Support common sketch operations such as union, intersection, and
> cardinality estimation.
> * Ensure compatibility with Spark SQL and allow usage within DataFrame
> transformations and aggregations.
> * Include unit and integration tests validating accuracy and performance.
> * Provide documentation and examples for developers.
> *Acceptance Criteria:*
> 1. Sketches support aggregation and merging operations.
> 2. Queries return approximate cardinalities or other statistics with expected
> error bounds.
> 3. Performance benchmarks show scalability for large datasets.
> 4. Documentation includes API usage examples.
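
For readers unfamiliar with the operations listed in the description above
(update, union, intersection, cardinality estimation), here is a minimal
toy sketch in Python. It is an illustration only, not the Apache DataSketches
implementation and not the API proposed for Spark. The class name
ToyTupleSketch, its methods, the SHA-256-based hash, and the choice to
combine summaries by summing are all assumptions made for this example.

```python
import hashlib

_HASH_SPACE = float(2 ** 64)


def _hash_fraction(key: str) -> float:
    """Map a key to a pseudo-uniform fraction in [0, 1] via SHA-256 (toy choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / _HASH_SPACE


class ToyTupleSketch:
    """Hypothetical k-minimum-values sketch that keeps a numeric summary per key."""

    def __init__(self, k: int = 1024):
        self.k = k          # maximum number of retained entries
        self.theta = 1.0    # sampling threshold; 1.0 means "exact mode"
        self.entries = {}   # hash fraction -> summary (here: a running sum)

    def update(self, key: str, value: float = 1.0) -> None:
        h = _hash_fraction(key)
        if h >= self.theta:
            return  # key falls outside the sampled region
        self.entries[h] = self.entries.get(h, 0.0) + value
        self._trim()

    def _trim(self) -> None:
        # Keep the k smallest hashes; theta becomes the (k+1)-th smallest.
        if len(self.entries) > self.k:
            ordered = sorted(self.entries)
            self.theta = ordered[self.k]
            for h in ordered[self.k:]:
                del self.entries[h]

    def estimate(self) -> float:
        """Estimated number of distinct keys: retained entries scaled by 1/theta."""
        return len(self.entries) / self.theta

    def union(self, other: "ToyTupleSketch") -> "ToyTupleSketch":
        result = ToyTupleSketch(min(self.k, other.k))
        result.theta = min(self.theta, other.theta)
        for sketch in (self, other):
            for h, summary in sketch.entries.items():
                if h < result.theta:
                    # Summaries for the same key are merged by summing.
                    result.entries[h] = result.entries.get(h, 0.0) + summary
        result._trim()
        return result

    def intersection(self, other: "ToyTupleSketch") -> "ToyTupleSketch":
        result = ToyTupleSketch(min(self.k, other.k))
        result.theta = min(self.theta, other.theta)
        for h, summary in self.entries.items():
            if h < result.theta and h in other.entries:
                result.entries[h] = summary + other.entries[h]
        return result
```

With fewer than k distinct keys the sketch stays in exact mode, so
a.union(b).estimate() and a.intersection(b).estimate() return exact counts.
Beyond k keys the estimate becomes approximate, with relative error shrinking
roughly as 1/sqrt(k), which is the kind of error bound the acceptance
criteria refer to.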