On Tue, Feb 3, 2026 at 1:56 AM Simon Cockx <[email protected]> wrote:
> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds
> tons of work to replace leaf-value (scalar) deserializers (from Numbers to
> Date/Times to UUIDs and Base64-encoded binary values) and
>
> I don't follow this point here. Even if we do bypass the BeanDeserializer,
> I currently assumed we would be able to reuse scalar deserializers such as
> Date values. Why would that not be the case?
>

You would need to build all the machinery BeanDeserializer uses, so I guess it is doable, just major work: rewriting BeanDeserializer.

> Let me try and summarise, and see if you agree.
> Suppose we do require ordered deserialization, i.e., use case (5), what I
> think we can reuse:
>
>    1. Annotation mechanism (`AnnotatedProperty`, annotation inheritance,
>    etc) to create a custom new annotation introspector / ordered
>    BeanDeserializer builder.
>    2. TBD: scalar deserializers. See question above. Potentially also STD
>    deserializers such as a MapDeserializer.
>    3. Jackson's (user) interface: ObjectMapper, ValueDeserializer, etc.
>

Yes.

> What I think we cannot reuse:
>
>    1. Existing BeanDeserializer(Builder), and its corresponding
>    annotation processor.
>

Note, too, that BeanDeserializer is just part of the puzzle; for property introspection there's POJOPropertiesCollector that might need customization as well.

>
>    2. Consequently, concrete implementations of structure-based
>    deserialisation code such as unwrapped properties and lists.
>

... which is most of Jackson.

>
> That almost sounds like we need a full reimplementation of Jackson, where
> we can only reuse the surrounding bits (annotations and interface) +
> potentially scalar deserializers. So to make it less intimidating, let's
> think about this iteratively. What would be the minimal POC that proves
> value in terms of tackling the issues I have described above?
> What I think about:
>
>    1. Having a class annotation, e.g., `@Ordered`, to divert from the
>    default BeanDeserializerBuilder and go into a custom one
>

Such a thing could be added to the `AnnotationIntrospector` API and flow through the XML-specific one (there's already one to expose "attribute-ness").

>
>    2. Support for scalar deserializers.
>    3. Support for having multiple properties with the same XML name, and
>    deserializing into those properties based on the order in which they occur.
>
> .. and ignoring lists/unwrapping/... and other Jackson features for a
> second. I think all of the rest would be extensions on top of this
> structure. Given that this would not really build on top of
> jackson-dataformat-xml, I suppose this would most appropriately live in
> another repository.
>
> I am not sure yet whether this is too ambitious or not. I just want to
> make sure I understand what would be involved, and to see if you, as the
> Jackson expert, agree and/or can guide us towards the most achievable path.
> :)
>
> Thoughts?
>

My gut feeling is that if work cannot be contained within `jackson-dataformat-xml`, for the most part -- or at least that being used as the base, with custom handlers registered -- it's probably not worth the effort.

-+ Tatu +-

>
> Simon
>
> On Tuesday, 3 February 2026 at 05:48:56 UTC+1 Tatu Saloranta wrote:
>
>> On Thu, Jan 29, 2026 at 4:39 AM Simon Cockx <[email protected]>
>> wrote:
>>
>>> Thanks for the quick response Tatu! I am delighted that at least it is
>>> not an immediate "this will not work" conclusion because of fundamental
>>> design principles.
>>>
>>
>> Exactly.
>>
>>
>>>
>>>
>>> I think discussing this here is good -- I will be out until next week
>>> now but wanted to send a quick response before that.
>>>
>>> I appreciate your time - no rush at all.
>>>
>>> Out of curiosity, is any work related to these issues already on the
>>> Jackson roadmap, which we can piggyback off, or is there no concrete work
>>> planned in the area?
>>>
>>
>> Jackson does not really have a concrete/centralized road map as such; at
>> times I have ideas of the next major thing to tackle.
>> Although I did add the concept of JSTEPs (see
>> https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP) for
>> proposing bigger sets of related changes which could serve as a sort of
>> roadmap.
>>
>> Having said that, there is no current plan for specifically addressing
>> shortcomings of XML backend.
>>
>>> Just to zoom in a bit on (5), because you mention it is probably the
>>> trickiest, and it might be a good indication of "how far" we can go with
>>> Jackson. The use case I have described (deserialize two properties with the
>>> same name with a different order), is actually *not* an important use
>>> case on its own, but it becomes *much* more relevant in interaction
>>> with (2) (unwrapping) and (3) (substitution groups). Two use cases I have
>>> seen while POC-ing support for some real XSDs are described below.
>>>
>>> a) Having the same property name on different levels in the Java POJO,
>>> but because of unwrapping they overlap.
>>> Example structure taken straight out of a real XSD, but simplified.
>>> Interpretation: you either have an `issuer` element followed by a single
>>> `tradeId` element, OR you have a `partyReference` element followed by a
>>> variable number of `tradeId` elements.
>>> ```
>>> <xs:complexType name="Trade">
>>>   <xs:choice>
>>>     <xs:sequence>
>>>       <xs:element name="issuer" type="IssuerId"/>
>>>       <xs:element name="tradeId" type="TradeId"/>
>>>     </xs:sequence>
>>>     <xs:sequence>
>>>       <xs:element name="partyReference" type="PartyReference"/>
>>>       <xs:element name="tradeId" type="TradeId" minOccurs="0" maxOccurs="unbounded"/>
>>>     </xs:sequence>
>>>   </xs:choice>
>>> </xs:complexType>
>>> ```
>>>
>>> We currently represent this something like the following in Java: (using
>>> records to concisely show structure - we actually use classes)
>>> ```
>>> record Trade(TradeOpt1 opt1, TradeOpt2 opt2) {}
>>>
>>> record TradeOpt1(IssuerId issuer, TradeId tradeId) {}
>>>
>>> record TradeOpt2(PartyReference partyReference, List<TradeId> tradeIds) {}
>>> ```
>>> where we unwrap `TradeOpt1` and `TradeOpt2`. At this point, however,
>>> when we encounter a `tradeId` element, we somehow need to know whether to
>>> set it to `TradeOpt1` or to add it to the list of `TradeOpt2`. Right now,
>>> BOTH happen. (in other situations I have seen one of the two taking
>>> precedence, depending on the exact unwrapping structure)
>>>
>>> b) A substituted name overlaps with an already existing element name on
>>> the type
>>> Another example structure based on what I have seen in a real XSD.
>>> Note that the element called `substituted` can be substituted by an
>>> element called `foo`.
>>> ```
>>> <xs:complexType name="Root">
>>>   <xs:sequence>
>>>     <xs:element ref="substituted"/>
>>>     <xs:element name="inbetween" type="xs:string"/>
>>>     <xs:element name="foo" type="Foo"/>
>>>   </xs:sequence>
>>> </xs:complexType>
>>>
>>> <xs:element name="substituted" type="Parent"/>
>>> <xs:element name="foo" type="Foo" substitutionGroup="substituted"/>
>>>
>>> <!-- assume type Foo extends type Parent -->
>>> ```
>>> In this scenario, a sample such as
>>> ```
>>> <root>
>>>   <foo></foo>
>>>   <inbetween>value</inbetween>
>>>   <foo></foo>
>>> </root>
>>> ```
>>> should be able to decide that the first `foo` element should deserialize
>>> into the `substituted` property, and the second `foo` element should
>>> deserialize into the `foo` property, given the structure below.
>>> ```
>>> record Root(Parent substituted, String inbetween, Foo foo) {}
>>> ```
>>>
>>> Thoughts...
>>>
>>> In order to support this, I think it would require work to extend how
>>> Jackson is able to identify properties. Some ideas:
>>> - based on element index, although that does not work well if some
>>> elements are optional, or if some elements can occur multiple times.
>>> - based on a selector which allows relative matching, e.g., "the element
>>> that comes after another element", such as XPath
>>> <https://www.w3schools.com/xml/xpath_syntax.asp>.
>>> ... or a drastically different approach, e.g., deserializing using
>>> recursive descent with backtracking, instead of based on property names.
>>>
>>
>> Right. None of these sound easily implementable, unfortunately: the XPath
>> approach because of the lack of an internal model (although the parent
>> document property name path is available at least); property index
>> (optional, only used for serialization order at databind level) is
>> available but deserialization makes no use of it (I think low-level format
>> codecs like Protobuf & Avro may use it, but that is isolated at the
>> streaming API level, not exposed to databind).
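A minimal sketch of the "relative matching" idea for the `Root` example above, in plain Java with no Jackson involved. `OrderedMatcher`, `Slot` and `match` are hypothetical names, not Jackson APIs. Each declared property lists the element names it accepts (its own name plus any substitution-group members), and a cursor only moves forward through the `xs:sequence`, so the first `<foo>` binds to `substituted` and the second to `foo`. The cursor is per-document state held in a local variable; the slot definitions themselves stay immutable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not a Jackson API): bind XML child elements to properties
 * by their position in an xs:sequence, not by element name alone.
 */
public class OrderedMatcher {

    /** A declared property, the element names it accepts (incl. substitutes), and whether it repeats. */
    record Slot(String property, Set<String> acceptedNames, boolean repeatable) {}

    private final List<Slot> slots;

    OrderedMatcher(List<Slot> slots) {
        this.slots = List.copyOf(slots);
    }

    /** Returns, for each incoming element name, the property it binds to. */
    List<String> match(List<String> elementNames) {
        List<String> bound = new ArrayList<>();
        int cursor = 0; // per-document state; the slot definitions stay immutable
        for (String name : elementNames) {
            int found = -1;
            // Only look forward from the cursor: earlier slots are already consumed.
            for (int i = cursor; i < slots.size(); i++) {
                if (slots.get(i).acceptedNames().contains(name)) {
                    found = i;
                    break;
                }
            }
            if (found < 0) {
                throw new IllegalArgumentException("Unexpected element <" + name + "> at this position");
            }
            Slot slot = slots.get(found);
            bound.add(slot.property());
            // A repeatable slot may match again; a single-valued one is consumed.
            cursor = slot.repeatable() ? found : found + 1;
        }
        return bound;
    }

    public static void main(String[] args) {
        // Root from the example: `substituted` also accepts its substitute <foo>.
        OrderedMatcher root = new OrderedMatcher(List.of(
                new Slot("substituted", Set.of("substituted", "foo"), false),
                new Slot("inbetween", Set.of("inbetween"), false),
                new Slot("foo", Set.of("foo"), false)));
        System.out.println(root.match(List.of("foo", "inbetween", "foo")));
        // prints [substituted, inbetween, foo]
    }
}
```

The matching is greedy and never backtracks, and it treats every slot as optional (skipping a slot does not check `minOccurs`). It also has no notion of `xs:choice`, so the `Trade` example would need branch selection on top of this (for instance, committing to a branch after seeing `issuer` or `partyReference`), which is where the recursive-descent-with-backtracking approach mentioned above would come in.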
>>
>>
>>>
>>> Then there is thinking about how to support this without breaking other
>>> backends. Again high-level ideas I can think of:
>>> - making matching on `PropertyName` more generic. E.g., instead of
>>> fetching a deserializer straight from a map, add a layer of abstraction
>>> that exposes a method `findMatchingProperty`, which backends can override
>>> based on their own element identification. The default implementation would
>>> look up a property in a map using `PropertyName`.
>>>
>>
>> Conceptually reasonable, but details probably get gnarly. Something would
>> be needed for state-tracking as ValueDeserializers are stateless.
>>
>>
>>> - entirely skipping the regular Jackson way of building deserializers,
>>> and creating a custom BeanDeserializer that implements its own lookup
>>> system.
>>> - entirely skipping the regular Jackson way of building deserializers,
>>> and creating a custom recursive descent deserializer.
>>>
>>
>> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds
>> tons of work to replace leaf-value (scalar) deserializers (from Numbers to
>> Date/Times to UUIDs and Base64-encoded binary values) and
>>
>>
>>> All of them seem like quite a chunk of work, and require careful thought
>>> about their implications. So: any thoughts on whether this is achievable at
>>> all? Other ideas?
>>>
>>
>> I must admit this sounds like a rather ambitious goal indeed.
>>
>>
>>>
>>> I assume use cases (1) - (4) would be less involved than this, but as I
>>> show in my examples, they will break when they interact with (5), hence why
>>> I just want to check upfront whether (5) is doable at all.
>>>
>>
>> Indeed.
>>
>> -+ Tatu +-
>>
>>
>>>
>>> On Wednesday, 28 January 2026 at 20:12:55 UTC+1 Tatu Saloranta wrote:
>>>
>>>> On Wed, Jan 28, 2026 at 8:31 AM Simon Cockx <[email protected]>
>>>> wrote:
>>>>
>>>>> At REGnosys we are running into fundamental limitations of Jackson's
>>>>> support for XML.
>>>>> I would like to know whether these limitations are
>>>>> deliberate trade-offs, or changeable design decisions that could be fixed.
>>>>> Based on that we are considering whether we can either *extend* Jackson
>>>>> in our codebase, *contribute* to Jackson directly, or *move away* from
>>>>> Jackson if it doesn't fit at all.
>>>>>
>>>>
>>>> Hi! Yes, this makes sense. I am not sure what the ultimate answer is
>>>> (it is obviously up to you), but I can try to address more specific
>>>> questions/concerns.
>>>>
>>>>
>>>>>
>>>>> First of all: why Jackson?
>>>>> Saying that we just want to ingest XML based on an XSD is somewhat
>>>>> hand-wavy - the JAXB project exists exactly for that use case. So maybe
>>>>> the question is better stated: why not JAXB? In short: the XSD is not our
>>>>> source of truth, our domain-specific language is.
>>>>>
>>>>> At REGnosys we maintain the open-source Rune DSL
>>>>> <https://github.com/finos/rune-dsl>, a language specifically designed
>>>>> for modelling processes in the financial industry. One important component
>>>>> of the language is *ingestion*: the process of reading serial data
>>>>> (JSON, XML, CSV, ...) in various financial standard formats and
>>>>> representing it in a uniform way in our DSL. Many of these formats are
>>>>> XML-based and formally defined as multiple XSD files, such as FpML
>>>>> <https://www.fpml.org/>. To support ingesting these data
>>>>> standards, we use the following steps.
>>>>>
>>>>>    1. Transform the XSD into Rune types. (similar to how JAXB
>>>>>    transforms XSD to Java classes)
>>>>>    2. Annotate the Rune types and fields with additional
>>>>>    serialization information. (similar to what both Jackson and JAXB
>>>>>    do/support)
>>>>>    3. From this Rune model, generate Java code with custom
>>>>>    annotations.
>>>>>    4. Using a custom Jackson annotation processor, deserialize using
>>>>>    a Jackson object mapper.
>>>>>
>>>>> Note that steps 2 to 4 are independent of the exact serial format: we
>>>>> don't just support XML, we also support JSON and CSV, and want to stay
>>>>> extensible for any future formats. That is exactly the attractiveness of
>>>>> Jackson and where we lose interest in JAXB: Jackson's design principles
>>>>> align perfectly with this goal of agnostic deserialisation and serialisation.
>>>>>
>>>>
>>>> Agreed. Thank you for explaining the background -- I think it does
>>>> align with Jackson goals at a high level.
>>>>
>>>>
>>>>>
>>>>> Issues with Jackson XML
>>>>> Most of our issues come down to the way bean properties are
>>>>> represented. Their identity is purely based on the local name of the
>>>>> property being deserialized, but doesn't take into account surrounding
>>>>> context such as ordering, namespaces, or representation (e.g., XML
>>>>> attribute versus XML element).
>>>>>
>>>>>
>>>>
>>>> Right: XML is probably THE trickiest format for Jackson to support (of
>>>> ~10 supported formats).
>>>> And most name mapping being namespace-unaware is problematic; I'd
>>>> have guessed that to be the number one problem.
>>>> So as you say, these are known, unsolved problems.
>>>>
>>>> In a way you could say Jackson supports XML-specific aspects
>>>> (namespaces, attribute-vs-element, ordering dependency) on the
>>>> serialization side but not well on deserialization -- on deserialization
>>>> these aspects are essentially ignored.
>>>>
>>>>> Examples of problems we run into:
>>>>>
>>>>>    1. Having XML elements and XML attributes with the same name is
>>>>>    unsupported.
>>>>>    Issue also described here:
>>>>>    https://stackoverflow.com/q/47199799/3083982
>>>>>    E.g., <foo id="my-id"><id>MyElementId</id></foo>
>>>>>    2. The @JsonUnwrapped annotation breaks some XML features.
>>>>>    Fundamentally this is because it replaces the `FromXMLParser` instance
>>>>>    with a `TokenBuffer`-based parser, which breaks assumptions for some
>>>>>    XML-related features.
>>>>>    One example is described here:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/762
>>>>>    3. Jackson does not support XSD substitution groups, i.e., having
>>>>>    a single property with multiple potential names, depending on which a
>>>>>    specific subtype deserializer is used. It turns out that this is not a
>>>>>    fundamental issue: we have already extended Jackson to support it in
>>>>>    the open-source Rune Common <https://github.com/finos/rune-common> project.
>>>>>    See issue ticket here:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/679
>>>>>    4. Having XML elements with the same local name, but a different
>>>>>    namespace, is unsupported. See long-standing issue ticket here:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/65
>>>>>    5. Having XML elements with the same local name, but with a
>>>>>    different order, is unsupported. I don't see a direct issue open for
>>>>>    this, but it is related to this comment:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
>>>>>    E.g., deserializing A1 and A2 to two distinct properties:
>>>>>    <foo><a>A1</a><b/><a>A2</a></foo>
>>>>>
>>>>> While we have ideas of how to approach this, I am definitely not
>>>>> saying we have a perfect solution in mind yet. We are mostly looking to
>>>>> answer the question if it is worth looking for a solution in the first
>>>>> place, or if this is just a fundamental limitation of Jackson.
>>>>>
>>>>
>>>> Of these, (4) could be supported if databind used the full `PropertyName`
>>>> (which has a "simple" and a "namespace" part), so conceptually that is
>>>> achievable, but the implementation would be quite involved.
>>>> Ideally there'd be no overhead for other formats, which would probably
>>>> require more extensibility for the XML backend to override handling (lookups).
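To make (1) and (4) concrete, here is a small stdlib-only sketch (StAX from `javax.xml.stream`, no Jackson) of a lookup keyed by more than the local name, along the lines of the `findMatchingProperty` layer discussed earlier in the thread. `XmlKey`, `PropertyLookup` and `resolve` are hypothetical names, not Jackson APIs. The key combines a `javax.xml.namespace.QName` (whose equality covers namespace URI and local part) with an attribute-vs-element flag, so `<foo id="my-id"><id>MyElementId</id></foo>` yields two distinct keys, while a local-name-only lookup collapses them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch (not a Jackson API): property lookup keyed by more than the local name. */
public class QualifiedLookup {

    /** Namespace-qualified name (QName compares namespace URI + local part) plus attribute flag. */
    record XmlKey(QName name, boolean attribute) {}

    /** Pluggable lookup, in the spirit of the `findMatchingProperty` idea. */
    interface PropertyLookup {
        String findMatchingProperty(XmlKey key);
    }

    /** Resolves the root element's attributes and direct child elements to property names. */
    static List<String> resolve(String xml, PropertyLookup lookup) throws XMLStreamException {
        // The default StAX factory is namespace-aware; xmlns declarations are not reported as attributes.
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> properties = new ArrayList<>();
        int depth = 0;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (depth == 1) {
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        properties.add(lookup.findMatchingProperty(new XmlKey(reader.getAttributeName(i), true)));
                    }
                } else if (depth == 2) {
                    properties.add(lookup.findMatchingProperty(new XmlKey(reader.getName(), false)));
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        reader.close();
        return properties;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<foo id=\"my-id\"><id>MyElementId</id></foo>";

        // Local-name-only lookup: attribute and element collapse onto one property.
        PropertyLookup localOnly = key -> "id".equals(key.name().getLocalPart()) ? "elementId" : null;

        // Qualified lookup: same local name, two different properties.
        Map<XmlKey, String> byKey = new HashMap<>();
        byKey.put(new XmlKey(new QName("id"), true), "attributeId");
        byKey.put(new XmlKey(new QName("id"), false), "elementId");
        PropertyLookup qualified = byKey::get;

        System.out.println(resolve(xml, localOnly)); // [elementId, elementId]
        System.out.println(resolve(xml, qualified)); // [attributeId, elementId]

        // Use case (4): same local name, different namespaces.
        String ns = "<foo xmlns:a=\"urn:a\" xmlns:b=\"urn:b\"><a:x/><b:x/></foo>";
        PropertyLookup byNamespace = key -> key.name().getNamespaceURI() + "#" + key.name().getLocalPart();
        System.out.println(resolve(ns, byNamespace)); // [urn:a#x, urn:b#x]
    }
}
```

A backend that does not care about namespaces or attributes could keep a plain local-name map behind the same interface, which is one possible way to avoid overhead for other formats. Whether databind's `PropertyName` could carry the attribute flag as well is left open here.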
>>>>
>>>> (1) is sort of related but trickier: XML "attributeness" handling is
>>>> contained within XML components, only used on serialization (I think).
>>>>
>>>> (3) would be generally useful and ideally would be implemented -- not
>>>> sure of all complexities due to "flattening" of layers Jackson otherwise
>>>> adds. I think it is doable, but like all of these, non-trivial.
>>>>
>>>> For (2) some support was added to allow format backends to substitute
>>>> their own `TokenBuffer` subtypes, but that's as far as that goes. Buffering
>>>> is also problematic for some `@JsonCreator`-induced buffering wrt
>>>> `Collection` deserialization.
>>>>
>>>> (5) is probably the trickiest. I am not familiar with that yet and would
>>>> need to dig deeper.
>>>>
>>>> Currently there isn't a ton of progress towards any of these (esp. as
>>>> all are hard problems).
>>>> But there are no fundamental blockers, I think. This is probably a bit
>>>> awkward wrt defining which path to take.
>>>> I am happy to try to help in addressing these, for what that is worth.
>>>>
>>>>
>>>>>
>>>>> I'm happy to discuss here, but if possible, I would also be very happy
>>>>> to jump on a call sometime to talk through this. Whatever works best.
>>>>>
>>>>
>>>> I think discussing this here is good -- I will be out until next week
>>>> now but wanted to send a quick response before that.
>>>>
>>>> Alternatively GitHub Discussions on
>>>> https://github.com/FasterXML/jackson-dataformat-xml/discussions would
>>>> also work.
>>>>
>>>>
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>> Thank you,
>>>>
>>>> -+ Tatu +-
>>>>
>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "jackson-dev" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion visit
>>>>> https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
