[I] [Tracking Umbrella] Prism Runner areas for contribution. [beam]

via GitHub Wed, 06 Dec 2023 16:38:29 -0800


lostluck opened a new issue, #29650:
URL: https://github.com/apache/beam/issues/29650


   ### What needs to happen?
   
   This issue is to track and refer to other issues/prs for various prism 
features. This issue shouldn't generally be commented on, but have this top 
entry edited as needed, referring to granular issues for individual features 
and support.
   
   Ultimately, this will track support in the [Beam Capability 
Matrix](https://beam.apache.org/documentation/runners/capability-matrix/), and 
keep the [Prism 
README](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/README.md)
 up to date.
   
   Incomplete items are unchecked with either (NeedsIssue), or a link to their 
primary tracking issue. Complete items should be checked, and have links to 
their completing PR or closed primary tracking issue.
   
   Items marked (NeedsIssue) should only have an issue filed once the work has 
started; typically that means there's a meaningful design proposal and an 
understanding of what the closing criteria are. These can be "X set of existing 
SDK tests now pass", or that a given capability is possible (e.g. UI related 
features).
   
   # Prism Areas for Contribution
   
   ## Beam Core Priorities 
   
   These are features whose absence prevents prism use and adoption.
   
   In progress by @lostluck 
   
   - [ ] State Handling (#28543)
   - [ ] Timer Handling (NeedsIssue)
   - [ ] TestStream (NeedsIssue) 
   - [ ] Triggers (NeedsIssue)
   
   ## Other Beam Core
   
   - [ ] Multi-Chunk Iterable protocol (#27762)
   - [ ] State Backed Iterables (for both State and GBKs) (NeedsIssue)
   
   ## Non-Go Blockers 
   
   Notable issues found when trying to run the non-Go SDKs (Java, Python, or 
others). Tracked in #28187; more granular issues should be referred to here.
   
   - [ ] Go SDK Cross Language PostCommit Suite (*find issue/pr*)
   - [ ] Prism Java Validates Runner Suite (NeedsIssue)
     - [ ] Executing targets exist
     - [ ] No tests filtered.
   - [ ] Python Validates Runner Suite (NeedsIssue)
     - [ ] Executing targets exist
     - [ ] No tests filtered.
   - [ ] Properly respect and handle SDK & Runner Capabilities. (NeedsIssue)
   
   
   ## Persistence & Reliability Features
   
   Prism currently stores everything in memory. This includes all element data, 
in-progress bundle data, pipeline info, artifacts, etc. This is fast, but not 
the best use of memory when using prism long term as a standalone runner.
   
   - [ ] Per Pipeline data should be moved to a local file cache. (NeedsIssue)
     * They aren’t stored in memory when not needed. Eg. Artifacts shouldn’t 
live in memory once necessary environments are spun up.
   - [ ] Pipeline Restarts (NeedsIssue)
     - [ ] Optimized stages need to be stored, so no complex mapping needs to 
occur for any persisted state. (NeedsIssue)
     - [ ] Per stage pending elements and state need to be stored so bundles 
can be re-computed on restarts. (NeedsIssue) 
        * It should be possible for a pipeline to be aborted, and prism torn 
down, and for a pipeline to be restarted from where it left off, with new 
worker processes.
   - [ ] Bundle Retries (NeedsIssue)
     * Prism currently doesn’t retry failed bundles. A bundle failure fails the 
pipeline.
     * Adding a sensible retry policy would improve bundle reliability.
     * Affects how elements are divided into bundles, and scheduled.
     * E.g. A failed bundle could be split into smaller and smaller bundles, 
until the failing elements are isolated. Such a strategy would also enable 
implementation of error tolerance policies, for example.
   - [ ] Improve Bundle Splitting (NeedsIssue)
     * Prism currently schedules all available pending elements into a single 
bundle.
     * Instead it could use some heuristic to determine how to split pending 
elements into new bundles to improve worker level parallelism before Channel or 
Sub Element Splitting occurs.
   - [ ] Programmatic Cancel, and Drain (NeedsIssue)
     * Use the FnAPI to allow a pipeline to Cancel, or Drain.
     * Hook up this ability in the UI.
     * Drain in particular would be useful as it could then allow user side 
Drain code to be tested and validated.
   - [ ] Pipeline Update (NeedsIssue)
     * Similar to Cancel + Drain in combination with Pipeline restarts. Allow a 
pipeline to be updated mid-execution.
    
   ## Performance features
   
   These are non-user-facing Beam features that Dataflow implements. For Prism 
to serve the purpose of validating pipelines locally before production runner 
execution, these are required, to reduce worker side execution differences.
   
   - [ ] Side Input + State Cache (NeedsIssue)
   - [ ] Elements on ProcessBundleRequest (NeedsIssue)
   - [ ] Elements on ProcessBundleResponse (NeedsIssue)
   - [ ] Autosharded keys? (NeedsIssue)
   - [ ] Map Side Input Keys (NeedsIssue)
   
   ## Stand Alone UI Based Features
   
   These are features that are best tied to the ability to understand a job in 
the UI. 
   
   - [ ] Data Sampling + plumbing to UI (NeedsIssue)
   - [ ] Worker Status support + plumbing to UI (NeedsIssue)
   - [ ] Runner side PubSub Transform (due to being a Beam built in) 
(NeedsIssue)
   - [ ] Display of Optimized stages in UI (NeedsIssue)
   - [ ] Display of Graph structure in UI (NeedsIssue)
     - [ ] Interactivity with same. (NeedsIssue)
   - [ ] Display of Job Logs in UI (NeedsIssue)
     * ...and storage thereof in local cache. (NeedsIssue)
    
   ## Other features
   
   The following are known issues/desires without a specific categorization at 
present.
   
   - [ ] Custom WindowFns (NeedsIssue)
   - [ ] Prism Per Job Configurability (NeedsIssue)
     * Being able to toggle or set specific configurations using 
PipelineOptions or similar. 
     * AKA the described Variants approach. (NeedsIssue)
   
   # Completed Work
   
   This section should be structured similarly to the [Beam Capability 
Matrix](https://beam.apache.org/documentation/runners/capability-matrix/) for 
ease of transition to populating it there.
   
   * Environment Execution
     - [x] LOOPBACK/External
     - [x] Docker
   * Basics
     - [x] DoFns
     - [x] GBKs
     - [x] Windows
     - [x] Side Inputs
       - [x] Map
       - [x] Iterator 
   * Scaling
     - [x] Splittable DoFn support
     - [x] ProcessContinuation support
   * Performance
     - [x] Fusion 
   
   ### Issue Priority
   
   Priority: 2 (default / most normal work should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [X] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
