[ https://issues.apache.org/jira/browse/GSOC-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Solodovnik updated GSOC-259:
----------------------------------
    Labels: Beam gsoc gsoc2024  (was: beam gsoc gsoc2024)

> [GSOC][Beam] Build out Beam Use Cases
> -------------------------------------
>
>                 Key: GSOC-259
>                 URL: https://issues.apache.org/jira/browse/GSOC-259
>             Project: Comdev GSOC
>          Issue Type: New Feature
>            Reporter: Danny McCormick
>            Priority: Major
>              Labels: Beam, gsoc, gsoc2024
>
> Apache Beam is a unified model for defining both batch and streaming
> data-parallel processing pipelines, as well as a set of language-specific
> SDKs for constructing pipelines and Runners for executing them on distributed
> processing backends. On top of providing lower-level primitives, Beam has
> also introduced several higher-level transforms used for machine learning and
> some general data processing use cases. This project focuses on identifying
> and implementing real-world use cases that use these transforms.
>
> Objectives:
> 1. Add real-world use cases demonstrating Beam's MLTransform for
> preprocessing data and generating embeddings.
> 2. Add real-world use cases demonstrating Beam's Enrichment transform for
> enriching existing data with data from a slowly changing source.
> 3. (Stretch) Implement 1 or more additional "enrichment handlers" for
> interacting with currently unsupported sources.
>
> Useful links:
> Apache Beam repo - [https://github.com/apache/beam]
> MLTransform docs -
> [https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml]
> Enrichment code -
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py]
> Enrichment docs (should be published soon) -
> [https://github.com/apache/beam/pull/30187]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: gsoc-unsubscr...@community.apache.org
For additional commands, e-mail: gsoc-h...@community.apache.org