I've Implemented some changes to make VoidCoder portable, please take a look when you get some time. PR: https://github.com/apache/beam/pull/38937
On Wed, Jun 10, 2026 at 10:13 PM Robert Bradshaw <[email protected]> wrote: > If it was made into a portable coder that would expand the set of > transforms that would be usable outside of Java though. > > On Wed, Jun 10, 2026 at 8:55 AM Ganesh Sivakumar < > [email protected]> wrote: > >> Thanks, I got it, It's a sdk specific coder. >> >> On Tue, 9 Jun, 2026, 9:57 pm Robert Bradshaw via dev, < >> [email protected]> wrote: >> >>> On Tue, Jun 9, 2026 at 6:36 AM Ganesh Sivakumar < >>> [email protected]> wrote: >>> >>>> I had worked on coders implementation and started testing java >>>> pipelines on the runner. Noticed something interesting, Typically Pipeline >>>> object contain map of coder id and the Coder object like: >>>> ``` >>>> "ByteArrayCoder": Coder { >>>> spec: Some( >>>> FunctionSpec { >>>> urn: "beam:coder:bytes:v1", >>>> payload: [], >>>> }, >>>> ), >>>> component_coder_ids: [], >>>> }, >>>> `` >>>> Which makes perfect sense for the runner to use the coder_id of >>>> pcollection to get the right coder implementation based on urn and >>>> encode/decode when that pcollection arrives. But my pipeline had a dofn >>>> that gets the string input and prints it, it's the end of pipeline and dofn >>>> returns void. DoFn<String,Void>. Coder for this looked like : >>>> >>>> ``` >>>> "VoidCoder": Coder { >>>> spec: Some( >>>> FunctionSpec { >>>> urn: "beam:coders:javasdk:0.1", >>>> payload: [ >>>> ``` >>>> I understand VoidCoder is for handling empty values, basically do >>>> nothing when we receive the coder? But why doesn't it have its own urn >>>> like others, what's the purpose of beam:coders:javasdk:0.1. >>>> >>> >>> This is because the VoidCoder was never "made portable" in the sense of >>> becoming a transparent Coder with a well-defined spec suitable for >>> traversing cross-language boundaries. "beam:coders:javasdk:0.1" means "a >>> java specific coder, defined by its java serialized bytes" and is what's >>> used user-defined coders. >>> >>> It would certainly make sense to update VoidCoder to a true >>> cross-language coder. >>> >>> Since it's a dofn,if worker executes it and output pcol is void, worker >>>> sends no elements to runner in data channel? >>>> >>> >>> As an aside, IIRC, the void coder might send a single byte (e.g. so one >>> could send a list of exactly N voids) that carry no semantic meaning. Worth >>> checking the implementation. >>> >>> >>>> On Wed, May 20, 2026 at 5:26 PM Ganesh Sivakumar < >>>> [email protected]> wrote: >>>> >>>>> Thanks, This is a really valuable reference. I'm working on coders >>>>> Rust implementation, will reach out if there are any questions. >>>>> >>>>> On Tue, May 19, 2026 at 7:11 PM Robert Burke <[email protected]> >>>>> wrote: >>>>> >>>>>> Just a quick reply >>>>>> >>>>>> You're right that you use the Beam coders to interpret the bytes. You >>>>>> just need an implementation of them in rust (and in every language >>>>>> building >>>>>> Beam components). >>>>>> >>>>>> For the Prism runner (written in Go, default for Python and Go) we >>>>>> use the Go SDK coder implementations, because they were already present. >>>>>> But not every thing makes sense to use directly from the SDK within a >>>>>> runner context. >>>>>> >>>>>> >>>>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/coders.go >>>>>> >>>>>> For example, for timers, it made more sense to reimplement certain >>>>>> portions within the engine portion of prism, than to route towards SDK >>>>>> constructs. >>>>>> >>>>>> >>>>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/engine/timers.go#L135 >>>>>> >>>>>> >>>>>> I wrote up the flow that Prism uses for managing bundles here. There >>>>>> are flowcharts that provide the broad strokes, and I hope they are useful >>>>>> to someone building their own runner. >>>>>> >>>>>> >>>>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/README.md >>>>>> >>>>>> >>>>>> Finally, while the Beam Pipeline Protos and runner APIs dictate how >>>>>> things communicate between runners and SDKs. The rest is up to you. >>>>>> I have a languishing "hobby" Go SDK that uses a different approach to >>>>>> coders to make them easier to deal with and manipulate, vs just the >>>>>> bytestream approach. >>>>>> >>>>>> https://github.com/lostluck/beam-go/blob/main/coders/coders.go >>>>>> >>>>>> It mostly wraps a byte buffer, and then allows callers to pop or push >>>>>> values to it. But this ends up playing very well for the garbage >>>>>> collector >>>>>> in Go. >>>>>> >>>>>> Rust has different constraints and problems to solve, so don't feel >>>>>> constrained by how the other languages do it. >>>>>> >>>>>> Hope this helps, and let me know if you have questions. >>>>>> Robert Burke >>>>>> >>>>>> On Tue, May 19, 2026, 5:29 AM Ganesh Sivakumar < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Hey Everyone, >>>>>>> >>>>>>> I am working on a new Rust based portable Beam Runner and I'm at >>>>>>> pipeline execution phase where the Rust runner side needs to >>>>>>> communicate with the worker sdk harness(Java) and execute the >>>>>>> stages( stages are nothing but a set of fused transforms, formed using >>>>>>> the >>>>>>> greedy fusion approach from Beam Java utils, rewritten in Rust for the >>>>>>> runner) >>>>>>> >>>>>>> For the stage to run on a worker, the runner needs to register the >>>>>>> stage information with the worker and then send a run request via grpc >>>>>>> channels. The worker will execute the transforms in the stage and send >>>>>>> the >>>>>>> output back to runner via data grpc channel in the form of Elements [1] >>>>>>> Elements contain the output data as raw bytes which the runner needs to >>>>>>> decode to get the actual data like String, Int or POJO. Typically other >>>>>>> runners do it with Beam's defined coders for encoding and decoding. But >>>>>>> for >>>>>>> Rust there isn't Beam coders implementation. Curious if anyone >>>>>>> previously >>>>>>> worked on coders for cross languages, or similar things, and how did you >>>>>>> implement in general. >>>>>>> >>>>>>> Thanks, >>>>>>> Ganesh. >>>>>>> >>>>>>> [1] - >>>>>>> https://github.com/apache/beam/blob/55eb624e5cd00e546ab19fc411281a0e5f596142/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L724 >>>>>>> >>>>>>
