Danny McCormick created GSOC-259:
------------------------------------

             Summary: [GSOC][Beam] Build out Beam Use Cases
                 Key: GSOC-259
                 URL: https://issues.apache.org/jira/browse/GSOC-259
             Project: Comdev GSOC
          Issue Type: Task
            Reporter: Danny McCormick


Apache Beam is a unified model for defining both batch and streaming 
data-parallel processing pipelines, as well as a set of language-specific SDKs 
for constructing pipelines and Runners for executing them on distributed 
processing backends. On top of providing lower-level primitives, Beam has also 
introduced several higher-level transforms for machine learning and general 
data processing. This project focuses on identifying and implementing 
real-world use cases that use these transforms.

Objectives:
1. Add real-world use cases demonstrating Beam's MLTransform for preprocessing 
data and generating embeddings.
2. Add real-world use cases demonstrating Beam's Enrichment transform for 
enriching existing data with data from a slowly changing source.
3. (Stretch) Implement one or more additional "enrichment handlers" for 
interacting with currently unsupported sources.
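To make objective 3 concrete, here is a minimal, Beam-free sketch of what an enrichment handler does: look up extra fields for each element in a slowly changing key-value source and join them onto the element. It assumes the contract described for Beam's `EnrichmentSourceHandler` (a `__call__` that returns a `(request, response)` pair, which a join function then merges), but uses plain dicts in place of `beam.Row`. `DictLookupHandler`, `merge_rows`, and `enrich` are hypothetical names for illustration only, not Beam APIs. A real handler would subclass the class in the Enrichment code linked below and run inside the `Enrichment` transform.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple

Row = Dict[str, Any]  # plain dict standing in for beam.Row (assumption)


class DictLookupHandler:
    """Hypothetical enrichment handler backed by an in-memory mapping.

    Mirrors the (request, response) return shape assumed for Beam's
    EnrichmentSourceHandler.__call__, without importing Beam.
    """

    def __init__(self, source: Mapping[Any, Row], key_field: str):
        self._source = source        # the slowly changing lookup source
        self._key_field = key_field  # element field used as the lookup key

    def get_cache_key(self, request: Row) -> str:
        # Requests with the same key share one cached lookup.
        return str(request[self._key_field])

    def __call__(self, request: Row) -> Tuple[Row, Row]:
        # Unknown keys yield an empty response rather than an error.
        response = dict(self._source.get(request[self._key_field], {}))
        return request, response


def merge_rows(left: Row, right: Row) -> Row:
    """Join an element with its looked-up fields.

    On a name clash the element's own value is kept (a policy choice
    made for this sketch).
    """
    merged = dict(left)
    for name, value in right.items():
        merged.setdefault(name, value)
    return merged


def enrich(elements: Iterable[Row], handler: DictLookupHandler,
           join_fn=merge_rows) -> List[Row]:
    """Enrich each element, querying the source at most once per key."""
    cache: Dict[str, Row] = {}
    enriched = []
    for element in elements:
        key = handler.get_cache_key(element)
        if key not in cache:
            _, cache[key] = handler(element)
        enriched.append(join_fn(element, cache[key]))
    return enriched


if __name__ == "__main__":
    products = {
        "p1": {"name": "Mug", "category": "kitchen"},
        "p2": {"name": "Lamp", "category": "lighting"},
    }
    orders = [
        {"order_id": 1, "product_id": "p1"},
        {"order_id": 2, "product_id": "p2"},
        {"order_id": 3, "product_id": "p9"},  # no match in the source
    ]
    handler = DictLookupHandler(products, key_field="product_id")
    for row in enrich(orders, handler):
        print(row)
```

Caching per key reflects the "slowly changing source" assumption. A real handler for a live source would also need to decide how long cached lookups may be reused before they go stale.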

Useful links:
Apache Beam repo - [https://github.com/apache/beam]
MLTransform docs - 
[https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/]
Enrichment code - 
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py]
Enrichment docs (should be published soon) - 
[https://github.com/apache/beam/pull/30187]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)