[ 
https://issues.apache.org/jira/browse/GSOC-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny McCormick updated GSOC-259:
---------------------------------
    Issue Type: New Feature  (was: Task)

> [GSOC][Beam] Build out Beam Use Cases
> -------------------------------------
>
>                 Key: GSOC-259
>                 URL: https://issues.apache.org/jira/browse/GSOC-259
>             Project: Comdev GSOC
>          Issue Type: New Feature
>            Reporter: Danny McCormick
>            Priority: Major
>              Labels: beam, gsoc, gsoc2024
>
> Apache Beam is a unified model for defining both batch and streaming 
> data-parallel processing pipelines, as well as a set of language-specific 
> SDKs for constructing pipelines and Runners for executing them on distributed 
> processing backends. In addition to lower-level primitives, Beam has 
> also introduced several higher-level transforms for machine learning and 
> some general data processing use cases. This project focuses on identifying 
> and implementing real-world use cases that use these transforms.
> Objectives:
> 1. Add real-world use cases demonstrating Beam's MLTransform for 
> preprocessing data and generating embeddings.
> 2. Add real-world use cases demonstrating Beam's Enrichment transform for 
> enriching existing data with data from a slowly changing source.
> 3. (Stretch) Implement one or more additional "enrichment handlers" for 
> interacting with currently unsupported sources.
> Useful links:
> Apache Beam repo - [https://github.com/apache/beam]
> MLTransform docs - 
> [https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/]
> Enrichment code - 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py]
> Enrichment docs (should be published soon) - 
> [https://github.com/apache/beam/pull/30187]
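
The Enrichment objective above (joining pipeline elements with data from a
slowly changing source) can be illustrated with a small sketch in plain
Python. It does not use Beam's Enrichment transform or any Beam API. The
CachedLookup class, the enrich function, and the sample product and order
data are hypothetical names introduced only for illustration, and the
time-to-live cache is one assumed way of limiting repeated calls to a source
that rarely changes.

```python
import time
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Tuple

Record = Dict[str, Any]


class CachedLookup:
    """Hypothetical stand-in for an enrichment handler (not a Beam class).

    Wraps a key-based fetch function and caches each result for a fixed
    time-to-live, on the assumption that the source changes slowly.
    """

    def __init__(
        self,
        fetch: Callable[[Any], Optional[Record]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, fetched value); misses (None) are cached too.
        self._cache: Dict[Any, Tuple[float, Optional[Record]]] = {}
        self.fetch_count = 0

    def get(self, key: Any) -> Optional[Record]:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        # Cache miss or expired entry: query the source again.
        self.fetch_count += 1
        value = self._fetch(key)
        self._cache[key] = (now + self._ttl, value)
        return value


def enrich(
    records: Iterable[Record], lookup: CachedLookup, key_field: str
) -> Iterator[Record]:
    """Merge each record with the fields looked up for its key, if any."""
    for record in records:
        extra = lookup.get(record[key_field]) or {}
        yield {**record, **extra}


# Illustrative data only: a dict plays the role of the external source.
products = {1: {"name": "pen"}, 2: {"name": "ink"}}
orders = [
    {"product_id": 1, "qty": 3},
    {"product_id": 1, "qty": 1},
    {"product_id": 9, "qty": 2},  # no matching product in the source
]

lookup = CachedLookup(products.get)
enriched = list(enrich(orders, lookup, "product_id"))
```

In this sketch the two orders for product 1 trigger a single fetch, and the
order with no matching product passes through unchanged. A real enrichment
handler would additionally have to deal with batching, retries, and
source-specific clients, none of which are shown here.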



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: gsoc-unsubscr...@community.apache.org
For additional commands, e-mail: gsoc-h...@community.apache.org
