Hi Beam community, I’m drafting a GSoC 2026 proposal based on the existing Beam idea "A learning path to using accelerators with Beam" (mentor: Pablo Estrada). I’d love to share my implementation draft and ask for any technical feedback from the community.
Draft doc: https://docs.google.com/document/d/1XvgS9k3ErjdrXdID-aCpDG28g4ylFEb4/edit

Summary (aligned with the idea): A progressive set of examples that builds from a local CPU baseline to Dataflow GPU speedups, then accelerator-backed training (GPU/TPU), and finally parallel training orchestration (e.g., hyperparameter sweeps), plus a short guide and lightweight CI/smoke tests to keep the examples fresh.

A few specific questions to get the discussion started:

Q1: Does the staged progression (CPU -> GPU -> TPU training -> parallel sweeps) feel like the right "learning path" for Beam ML users?

Q2: Any concerns with using --worker_accelerator for provisioning while using resource_hints to annotate the transforms that benefit from accelerators?

Q3: Is the proposed "continuous freshness" approach (nightly mocked runs + periodic Dataflow smoke runs) reasonable for examples in the Beam repo?

Thanks in advance for any thoughts or pointers to existing patterns/docs I should align with.

Best regards,
Elia
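To make Q2 concrete, here is a minimal sketch of the split I have in mind: worker_accelerator provisions GPUs at the job level through the Dataflow service options, while resource hints annotate individual transforms. The project ID, region, script name, and GPU type below are placeholders, not values from the draft:

```shell
# Job-level provisioning (Dataflow service option, Python SDK):
# worker_accelerator requests GPU workers; type/count values are illustrative.
python gpu_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --dataflow_service_options="worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"
```

On the pipeline side, the accelerator-benefiting step would then carry a matching hint, e.g. `beam.Map(run_inference).with_resource_hints(accelerator="type:nvidia-tesla-t4;count:1;install-nvidia-driver")`, so only that transform is scheduled onto the GPU workers. Happy to be corrected if the community recommends a different pairing of these two mechanisms.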
