Makes sense, I missed that part. That is why a generic "inlining" scheme
is problematic: it depends on how the runner encodes the elements on the
wire. And that is why TestStream's output needs to be encoded into raw
bytes, because the wire coder is unknown to the SDK when submitting the
job.
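For concreteness, a rough sketch of what I mean, using the Python SDK's
coder API (the concrete coder is just an example I picked):

from apache_beam.coders import VarIntCoder

# At submission time the SDK only knows the element coder it chose
# itself, not the wire coder the runner will later pick, so a TestStream
# payload has to carry coder-encoded raw bytes (plus that coder's id).
coder = VarIntCoder()
encoded_elements = [coder.encode(x) for x in [1, 2, 3]]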
Thanks for the clarification and for bearing with me!
Jan
On 9/7/21 7:55 PM, Robert Bradshaw wrote:
On Mon, Sep 6, 2021 at 1:29 AM Jan Lukavský <[email protected]> wrote:
It is currently the latter for runners using this code (which not all
do, e.g. the ULR and Dataflow runners). I don't think we want to
ossify this decision as part of the spec. (Note that even what's
"known" and "unknown" can change from runner to runner.)
This is interesting and unexpected for me. How do runners decide how
they encode elements between the SDK harness and the runner? How do they
inform the SDK harness about this decision? My impression was that this
is well-defined at the model level. If not, then that would explain the
misunderstanding in this conversation. :-)
The coder id to use for a channel is specified by the runner on the
channel operation (both the input and the output) when sending a process
bundle descriptor. Runners decide based on their own capabilities (e.g.
what coders they understand vs. what needs wrapping), the SDK's
capabilities (e.g. for optimizations like the param windowed value
coder), and their needs (e.g. to do a GBK, they need the key bytes and
the value bytes separately, not just the key-value bytes).
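To make this concrete, a rough sketch of where that coder id lives,
using the Python proto bindings (field names from beam_fn_api.proto and
beam_runner_api.proto; the surrounding wiring is heavily simplified):

from apache_beam.portability.api import beam_fn_api_pb2, beam_runner_api_pb2

# The runner picks the wire coder and records its id on the gRPC data
# port of the read (and, symmetrically, the write) operation.
port = beam_fn_api_pb2.RemoteGrpcPort(coder_id="my_windowed_kv_coder")
read = beam_runner_api_pb2.PTransform(
    spec=beam_runner_api_pb2.FunctionSpec(
        urn="beam:runner:source:v1",
        payload=port.SerializeToString()))
descriptor = beam_fn_api_pb2.ProcessBundleDescriptor(
    id="pbd-1", transforms={"read": read})

(A real descriptor would of course also carry the coder definition
itself in its coders map; the SDK harness looks the id up there when it
sets up the data channel.)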