It makes sense to have a more concrete URN that includes the version. Good idea, Robert.

Regards
JB

On 05/11/2018 16:52, Robert Bradshaw wrote:
> I think we'll want to allow upgrades across SDK versions. A runner
> should be able to recognize when a coder (or any other aspect of the
> pipeline) has changed and adapt/reject accordingly. (Until we remove
> coders from sources/sinks, there's also possibly the expectation that
> one should be able to read data from a source written with that same
> coder across versions as well.)
>
> I think it really comes down to how coders are named. If we decide to
> let coders change arbitrarily between versions, probably the URN for
> SerializedJavaCoder should have the SDK version number in it. Coders
> that are stable across SDKs can have better, more stable URNs defined
> and registered.
>
> I am more OK with changing the registry to infer different coders as
> the SDK evolves (which would be detected and manually overwritten with
> the old ones, on a case-by-case basis, if they still exist). This
> should still be done with caution as it will make upgrading harder.
> Highly composite, experimental coders should possibly be designed in
> an intrinsically extensible way.
>
> On Mon, Nov 5, 2018 at 4:24 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> That's really a pita. It's an important and impacting change.
>>
>> I would go to 1.
>>
>> For LTS, as already said, I would create a LTS branch and only cherry
>> pick some changes. Using master as LTS release branch won't work IMHO.
>>
>> Regards
>> JB
>>
>> On 05/11/2018 15:47, Ismaël Mejía wrote:
>>> For some extra context this change touches more than FileIO, in
>>> reality this will affect updates in any file-based pipelines because
>>> the metadata on each file will now have an extra field for the
>>> lastModifiedDate.
>>>
>>> The PR looks perfect, only issue is the backwards compatibility Coder
>>> question. Knowing that probably Dataflow is the only one affected, I
>>> would like to know what we can do.
>>>
>>> [1] Should we merge and the Coder updatability be tied to SDK versions
>>> (which makes sense and is probably more aligned with the LTS
>>> discussion)?
>>> [2] Should we have a MetadataCoderV2? (Does this imply a repeated
>>> Metadata object?) In this case where is the right place to identify
>>> and decide what coder to use?
>>>
>>> Other ideas... ?
>>>
>>> Last thing, the link that Luke shared does not seem to work (looks
>>> like a googley-friendly URL); here is the full URL for those
>>> interested in the drain/update proposal:
>>>
>>> [2]
>>> https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY/edit#
>>> On Fri, Nov 2, 2018 at 10:11 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>> I think the idea is that you would use one coder for paths where you don't
>>>> need this information and would have FileIO provide a separate path that
>>>> uses your updated coder.
>>>> Existing users would not be impacted and users of the new FileIO that
>>>> depend on this information would not be able to have updated their
>>>> pipeline in the first place.
>>>>
>>>> If the feature in FileIO is experimental, we could choose to break it for
>>>> existing users though since I don't know how feasible my suggestion above
>>>> is.
>>>>
>>>>
>>>>
>>>> On Fri, Nov 2, 2018 at 12:56 PM Jeff Klukas <jklu...@mozilla.com> wrote:
>>>>>
>>>>> Lukasz - Thanks for those links. That's very helpful context.
>>>>>
>>>>> It sounds like there's no explicit user contract about evolving Coder
>>>>> classes in the Java SDK and users might reasonably assume Coders to be
>>>>> stable between SDK versions. Thus, users of the Dataflow or Flink runners
>>>>> might reasonably expect that they can update the Java SDK version used in
>>>>> their pipeline when performing an update.
>>>>>
>>>>> Based on that understanding, evolving a class like Metadata might not be
>>>>> possible except in a major version bump where it's obvious to users to
>>>>> expect breaking changes and not to expect an "update" operation to work.
>>>>>
>>>>> It's not clear to me what changing the "name" of a coder would look like
>>>>> or whether that's a tenable solution here. Would that change be able to
>>>>> happen within the SDK itself, or is it something users would need to
>>>>> specify?
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
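
To make the "versioned coder name" idea from this thread a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: it does not use any Beam API, and every name in it (VersionedCoderSketch, FileMeta, MetaCoder, LayoutCoder, the "example:coder:..." URNs) is hypothetical. It shows an old byte layout and a new layout with the extra last-modified field registered under distinct URNs, so that a consumer can select the decoder matching the bytes it was handed, or reject a URN it does not recognize.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Beam SDK. */
class VersionedCoderSketch {

  /** Stand-in for a file metadata record that gained a last-modified field. */
  static final class FileMeta {
    final String path;
    final long sizeBytes;
    final long lastModifiedMillis; // 0 when the encoding did not carry it

    FileMeta(String path, long sizeBytes, long lastModifiedMillis) {
      this.path = path;
      this.sizeBytes = sizeBytes;
      this.lastModifiedMillis = lastModifiedMillis;
    }
  }

  interface MetaCoder {
    byte[] encode(FileMeta m);
    FileMeta decode(byte[] bytes);
  }

  /** One implementation covering both layouts; the flag selects the new one. */
  static final class LayoutCoder implements MetaCoder {
    private final boolean withLastModified;

    LayoutCoder(boolean withLastModified) {
      this.withLastModified = withLastModified;
    }

    @Override
    public byte[] encode(FileMeta m) {
      byte[] path = m.path.getBytes(StandardCharsets.UTF_8);
      int size = 4 + path.length + 8 + (withLastModified ? 8 : 0);
      ByteBuffer out = ByteBuffer.allocate(size);
      out.putInt(path.length).put(path).putLong(m.sizeBytes);
      if (withLastModified) {
        out.putLong(m.lastModifiedMillis);
      }
      return out.array();
    }

    @Override
    public FileMeta decode(byte[] bytes) {
      ByteBuffer in = ByteBuffer.wrap(bytes);
      byte[] path = new byte[in.getInt()];
      in.get(path);
      long sizeBytes = in.getLong();
      // The old layout has no timestamp, so fall back to 0.
      long lastModified = withLastModified ? in.getLong() : 0L;
      return new FileMeta(new String(path, StandardCharsets.UTF_8), sizeBytes, lastModified);
    }
  }

  // Assumed naming scheme: the layout version is part of the URN.
  static final String URN_V1 = "example:coder:file_metadata:v1";
  static final String URN_V2 = "example:coder:file_metadata:v2";

  static final Map<String, MetaCoder> REGISTRY = new HashMap<>();

  static {
    REGISTRY.put(URN_V1, new LayoutCoder(false));
    REGISTRY.put(URN_V2, new LayoutCoder(true));
  }

  /** Resolves a coder by URN and rejects names it does not know. */
  static MetaCoder lookup(String urn) {
    MetaCoder coder = REGISTRY.get(urn);
    if (coder == null) {
      throw new IllegalArgumentException("Unknown coder URN: " + urn);
    }
    return coder;
  }

  public static void main(String[] args) {
    FileMeta meta = new FileMeta("logs/part-0001.txt", 42L, 1541433120000L);
    byte[] oldBytes = lookup(URN_V1).encode(meta);
    byte[] newBytes = lookup(URN_V2).encode(meta);
    System.out.println("v1 bytes: " + oldBytes.length + ", v2 bytes: " + newBytes.length);
    System.out.println("v2 lastModified: " + lookup(URN_V2).decode(newBytes).lastModifiedMillis);
  }
}
```

The point of the sketch is only that the two layouts are not interchangeable on the wire, so the name is what lets a reader tell them apart. Whether the version belongs in a per-coder URN (closer to the MetadataCoderV2 option) or in the SDK version (closer to option 1) is exactly the question the thread leaves open.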