[ https://issues.apache.org/jira/browse/SPARK-54179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039722#comment-18039722 ]
Daniel Tenedorio commented on SPARK-54179:
------------------------------------------
Nice
> Add Native Support for Apache Tuple Sketches
> --------------------------------------------
>
> Key: SPARK-54179
> URL: https://issues.apache.org/jira/browse/SPARK-54179
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Christopher Boumalhab
> Priority: Major
> Labels: pull-request-available
>
> Implement support for tuple sketches in Apache Spark to enable efficient
> approximate set cardinality, frequency, and similarity computations over
> multiple dimensions. The feature should:
> * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
> * Provide functions for creating, updating, and querying tuple sketches.
> * Support common sketch operations such as union, intersection, and
> cardinality estimation.
> * Ensure compatibility with Spark SQL and allow usage within DataFrame
> transformations and aggregations.
> * Include unit and integration tests validating accuracy and performance.
> * Provide documentation and examples for developers.
>
> *Acceptance Criteria:*
> 1. Sketches support aggregation and merging operations.
> 2. Queries return approximate cardinalities or other statistics with expected
> error bounds.
> 3. Performance benchmarks show scalability for large datasets.
> 4. Documentation includes API usage examples.
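
For readers unfamiliar with the operations the description lists (update, union, intersection, cardinality estimation), here is a minimal, self-contained Python sketch of the idea. It is a toy written for illustration only: the class ToyTupleSketch, its method names, the default k, and the choice of a running sum as the per-key summary are all assumptions made here. It is not the API proposed in SPARK-54179 and not the Apache DataSketches implementation.

```python
import hashlib

HASH_SPACE = 2 ** 64  # hashes are treated as integers in [0, 2**64)


def hash_key(key: str) -> int:
    """Map a key to a stable 64-bit integer (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ToyTupleSketch:
    """Hypothetical toy: keeps at most k hashed keys, each with a summed value."""

    def __init__(self, k: int = 1024):
        self.k = k
        self.theta = HASH_SPACE  # only hashes strictly below theta are retained
        self.entries = {}        # hash -> summary (here: a running sum)

    def update(self, key: str, value: float = 1.0) -> None:
        h = hash_key(key)
        if h < self.theta:
            self.entries[h] = self.entries.get(h, 0.0) + value
            self._trim()

    def _trim(self) -> None:
        # Over capacity: lower theta to the (k+1)-th smallest hash and keep
        # only the k entries below it. Sorting on every call is simple, not fast.
        if len(self.entries) > self.k:
            self.theta = sorted(self.entries)[self.k]
            self.entries = {h: s for h, s in self.entries.items() if h < self.theta}

    def estimate(self) -> float:
        # Retained count scaled by the sampled fraction of the hash space.
        return len(self.entries) * HASH_SPACE / self.theta

    def union(self, other: "ToyTupleSketch") -> "ToyTupleSketch":
        out = ToyTupleSketch(min(self.k, other.k))
        out.theta = min(self.theta, other.theta)
        for src in (self, other):
            for h, s in src.entries.items():
                if h < out.theta:
                    out.entries[h] = out.entries.get(h, 0.0) + s
        out._trim()
        return out

    def intersect(self, other: "ToyTupleSketch") -> "ToyTupleSketch":
        out = ToyTupleSketch(min(self.k, other.k))
        out.theta = min(self.theta, other.theta)
        for h, s in self.entries.items():
            if h < out.theta and h in other.entries:
                out.entries[h] = s + other.entries[h]
        return out


# Two small overlapping key sets; both fit under k, so the results are exact.
a = ToyTupleSketch()
b = ToyTupleSketch()
for i in range(100):
    a.update(f"user{i}", 1.0)
for i in range(50, 150):
    b.update(f"user{i}", 2.0)

print(a.union(b).estimate())      # 150.0
print(a.intersect(b).estimate())  # 50.0
```

While a sketch holds fewer than k keys it is exact, which is why the example prints whole numbers. Once more than k distinct keys arrive, theta drops below the full hash range and estimate() becomes approximate, with relative error on the order of 1/sqrt(k).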