[ 
https://issues.apache.org/jira/browse/SPARK-54179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039722#comment-18039722
 ] 

Daniel Tenedorio commented on SPARK-54179:
------------------------------------------

Nice

> Add Native Support for Apache Tuple Sketches
> --------------------------------------------
>
>                 Key: SPARK-54179
>                 URL: https://issues.apache.org/jira/browse/SPARK-54179
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Christopher Boumalhab
>            Priority: Major
>              Labels: pull-request-available
>
> Implement support for tuple sketches in Apache Spark to enable approximate 
> set cardinality, frequency, and similarity computations over multiple 
> dimensions efficiently. The feature should:
>  * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
>  * Provide functions for creating, updating, and querying tuple sketches.
>  * Support common sketch operations such as union, intersection, and 
> cardinality estimation.
>  * Ensure compatibility with Spark SQL and allow usage within DataFrame 
> transformations and aggregations.
>  * Include unit and integration tests validating accuracy and performance.
>  * Provide documentation and examples for developers.
> *Acceptance Criteria:*
> 1. Sketches support aggregation and merging operations.
> 2. Queries return approximate cardinalities or other statistics with expected 
> error bounds.
> 3. Performance benchmarks show scalability for large datasets.
> 4. Documentation includes API usage examples.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

