Re: [jackson-dev] Jackson XML: design/roadmap discussion for XSD-driven binding limitations + potential contributions

Tatu Saloranta Wed, 04 Feb 2026 10:30:13 -0800

On Tue, Feb 3, 2026 at 1:56 AM Simon Cockx <[email protected]> wrote:


>> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds
>> tons of work to replace leaf-value (scalar) deserializers (from Numbers to
>> Date/Times to UUIDs and Base64-encoded binary values) and
>
> I don't follow this point here. Even if we do bypass the BeanDeserializer,
> I currently assumed we would be able to reuse scalar deserializers such as
> Date values. Why would that not be the case?
>

You would need to build all the machinery that BeanDeserializer uses, so I
guess it is doable, just major work: essentially rewriting BeanDeserializer.


> Let me try and summarise, and see if you agree.
> Suppose we do require ordered deserialization, i.e., use case (5), what I
> think we can reuse:
>
>    1. Annotation mechanism (`AnnotatedProperty`, annotation inheritance,
>    etc) to create a custom new annotation introspector / ordered
>    BeanDeserializer builder.
>    2. TBD: scalar deserializers. See question above. Potentially also STD
>    deserializers such as a MapDeserializer.
>    3. Jackson's (user) interface: ObjectMapper, ValueDeserializer, etc.
>
>
Yes.


> What I think we cannot reuse:
>
>    1. Existing BeanDeserializer(Builder), and its corresponding
>    annotation processor.
>
>
Note, too, that BeanDeserializer is just part of the puzzle; for property
introspection there's POJOPropertiesCollector that might need customization
as well.


>
>    2. Consequently, concrete implementations of structure-based
>    deserialisation code such as unwrapped properties and lists.
>
> ... which is most of Jackson.
>
> That almost sounds like we need a full reimplementation of Jackson, where
> we can only reuse the surrounding bits (annotations and interface) +
> potentially scalar deserializers. So to make it less intimidating, let's
> think about this iteratively. What would be the minimal POC that proves
> value in terms of tackling the issues I have described above?
> What I think about:
>
>    1. Having a class annotation, e.g., `@Ordered`, to divert from the
>    default BeanDeserializerBuilder and go into a custom one
>
Such a thing could be added to the `AnnotationIntrospector` API and flow
through the XML-specific one (there's already one to expose "attribute-ness").


>
>    2. Support for scalar deserializers.
>    3. Support for having multiple properties with the same XML name, and
>    deserializing into those properties based on the order in which they occur.
>
> .. and ignoring lists/unwrapping/... and other Jackson features for a
> second. I think all of the rest would be extensions on top of this
> structure. Given that this would not really build on top of
> jackson-dataformat-xml, I suppose this would most appropriately live in
> another repository.
>
> I am not sure yet whether this is too ambitious or not. I just want to
> make sure I understand what would be involved, and to see if you, as the
> Jackson expert, agree and/or can guide us towards the most achievable path.
> :)
>
> Thoughts?
>

My gut feeling is that if work cannot be contained within
`jackson-dataformat-xml`, for the most part -- or at least that being used
as the base, with custom handlers registered -- it's probably not worth the
effort.

-+ Tatu +-


>
> Simon
>
> On Tuesday, 3 February 2026 at 05:48:56 UTC+1 Tatu Saloranta wrote:
>
>> On Thu, Jan 29, 2026 at 4:39 AM Simon Cockx <[email protected]>
>> wrote:
>>
>>> Thanks for the quick response Tatu! I am delighted that at least it is
>>> not an immediate "this will not work" conclusion because of fundamental
>>> design principles.
>>>
>>
>> Exactly.
>>
>>
>>>
>>>
>>>> I think discussing this here is good -- I will be out until next week
>>>> now but wanted to send a quick response before that.
>>>
>>> I appreciate your time - no rush at all.
>>>
>>> Out of curiosity, is any work related to these issues already on the
>>> Jackson roadmap, which we can piggyback off, or is there no concrete work
>>> planned in the area?
>>>
>>
>> Jackson does not really have a concrete/centralized road map as such; at
>> times I have ideas of the next major thing to tackle.
>> Although I did add the concept of JSTEPs (see
>> https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP) for
>> proposing bigger sets of related changes which could serve as a sort of
>> roadmap.
>>
>> Having said that, there is no current plan for specifically addressing
>> shortcomings of XML backend.
>>
>>> Just to zoom in a bit on (5), because you mention it is probably the
>>> trickiest, and it might be a good indication of "how far" we can go with
>>> Jackson. The use case I have described (deserialize two properties with the
>>> same name with a different order), is actually *not* an important use
>>> case on its own, but it becomes *much* more relevant in interaction
>>> with (2) (unwrapping) and (3) (substitution groups). Two use cases I have
>>> seen while POC-ing support for some real XSD's are described below.
>>>
>>> a) Having the same property name on different levels in the Java pojo,
>>> but because of unwrapping they overlap.
>>> Example structure taken straight out of a real XSD, but simplified.
>>> Interpretation: you either have an `issuer` element followed by a single
>>> `tradeId` element, OR you have a `partyReference` element followed by a
>>> variable number of `tradeId` elements.
>>> ```
>>> <xs:complexType name="Trade">
>>>   <xs:choice>
>>>     <xs:sequence>
>>>       <xs:element name="issuer" type="IssuerId"/>
>>>       <xs:element name="tradeId" type="TradeId"/>
>>>     </xs:sequence>
>>>     <xs:sequence>
>>>       <xs:element name="partyReference" type="PartyReference"/>
>>>       <xs:element name="tradeId" type="TradeId" minOccurs="0" maxOccurs="unbounded"/>
>>>     </xs:sequence>
>>>   </xs:choice>
>>> </xs:complexType>
>>> ```
>>>
>>> We currently represent this something like the following in Java: (using
>>> records to concisely show structure - we actually use classes)
>>> ```
>>> record Trade(TradeOpt1 opt1, TradeOpt2 opt2) {}
>>>
>>> record TradeOpt1(IssuerId issuer, TradeId tradeId) {}
>>>
>>> record TradeOpt2(PartyReference partyReference, List<TradeId> tradeIds) {}
>>> ```
>>> where we unwrap `TradeOpt1` and `TradeOpt2`. At this point, however,
>>> when we encounter a `tradeId` element, we somehow need to know whether to
>>> set it to `TradeOpt1` or to add it to the list of `TradeOpt2`. Right now,
>>> BOTH happen. (in other situations I have seen one of the two taking
>>> precedence, depending on the exact unwrapping structure)
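
One way to picture the disambiguation asked for in (a) is a minimal sketch, assuming the child element names arrive as an ordered list and that the branch of the choice can be decided from the first element seen. `TradeRouter`, `route` and the `opt1.`/`opt2.` target labels are hypothetical names used for illustration only; nothing here is Jackson API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these types exist in Jackson.
final class TradeRouter {

    // Decides, per element, which unwrapped option a child element belongs to.
    // The branch of the xs:choice is fixed by the first element encountered.
    static List<String> route(List<String> elementNames) {
        List<String> targets = new ArrayList<>();
        String branch = null; // "opt1" or "opt2" once known
        for (String name : elementNames) {
            switch (name) {
                case "issuer":
                    branch = "opt1";
                    targets.add("opt1.issuer");
                    break;
                case "partyReference":
                    branch = "opt2";
                    targets.add("opt2.partyReference");
                    break;
                case "tradeId":
                    if (branch == null) {
                        throw new IllegalStateException("tradeId before choice branch is known");
                    }
                    // Same element name, different target depending on the branch.
                    targets.add(branch.equals("opt1") ? "opt1.tradeId" : "opt2.tradeIds[]");
                    break;
                default:
                    throw new IllegalArgumentException("Unexpected element: " + name);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(route(List.of("issuer", "tradeId")));
        System.out.println(route(List.of("partyReference", "tradeId", "tradeId")));
    }
}
```

The point of the sketch is that routing `tradeId` needs state (the chosen branch) that a lookup by name alone does not have.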
>>>
>>> b) A substituted name overlaps with an already existing element name on
>>> the type
>>> Another example structure based on what I have seen in a real XSD.
>>> Note that the element called `substituted` can be substituted by an
>>> element called `foo`.
>>> ```
>>> <xs:complexType name="Root">
>>>   <xs:sequence>
>>>     <xs:element ref="substituted"/>
>>>     <xs:element name="inbetween" type="xs:string"/>
>>>     <xs:element name="foo" type="Foo"/>
>>>   </xs:sequence>
>>> </xs:complexType>
>>>
>>> <xs:element name="substituted" type="Parent"/>
>>> <xs:element name="foo" type="Foo" substitutionGroup="substituted"/>
>>>
>>> <!-- assume type Foo extends type Parent -->
>>> ```
>>> In this scenario, a sample such as
>>> ```
>>> <root>
>>>   <foo></foo>
>>>   <inbetween>value</inbetween>
>>>   <foo></foo>
>>> </root>
>>> ```
>>> should be deserializable such that the first `foo` element goes into
>>> the `substituted` property, and the second `foo` element into the `foo`
>>> property, given the structure below.
>>> ```
>>> record Root(Parent substituted, String inbetween, Foo foo) {}
>>> ```
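
The matching wanted in (b) can be sketched as a forward-only cursor over the declared sequence. This is a minimal sketch under stated assumptions: every slot occurs at most once, skipped slots are treated as absent, and `SequenceMatcher`, `Slot` and `matchAll` are hypothetical names that are not part of Jackson or jackson-dataformat-xml.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not part of Jackson or jackson-dataformat-xml.
final class SequenceMatcher {

    // One position in an xs:sequence: the Java property it fills and the
    // element names accepted there (more than one for a substitution group).
    record Slot(String property, Set<String> acceptedNames) {}

    private final List<Slot> slots;
    private int cursor = 0; // per-parse state: next slot that may still match

    SequenceMatcher(List<Slot> slots) {
        this.slots = slots;
    }

    // Returns the property for the next element, scanning forward only.
    String match(String elementName) {
        for (int i = cursor; i < slots.size(); i++) {
            if (slots.get(i).acceptedNames().contains(elementName)) {
                cursor = i + 1;
                return slots.get(i).property();
            }
        }
        throw new IllegalStateException("No remaining slot accepts <" + elementName + ">");
    }

    static List<String> matchAll(List<Slot> slots, List<String> elementNames) {
        SequenceMatcher matcher = new SequenceMatcher(slots);
        List<String> result = new ArrayList<>();
        for (String name : elementNames) {
            result.add(matcher.match(name));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Slot> root = List.of(
                new Slot("substituted", Set.of("substituted", "foo")),
                new Slot("inbetween", Set.of("inbetween")),
                new Slot("foo", Set.of("foo")));
        // prints [substituted, inbetween, foo]
        System.out.println(matchAll(root, List.of("foo", "inbetween", "foo")));
    }
}
```

Repeating or optional-with-lookahead elements would need more than a single cursor, which is where backtracking comes in.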
>>>
>>> Thoughts...
>>>
>>> In order to support this, I think it would require work to extend how
>>> Jackson is able to identify properties. Some ideas:
>>> - based on element index, although that does not work well if some
>>> elements are optional, or if some elements can occur multiple times.
>>> - based on a selector which allows relative matching, e.g., "the element
>>> that comes after another element", such as XPath
>>> <https://www.w3schools.com/xml/xpath_syntax.asp>.
>>> ... or a drastically different approach, e.g., deserializing using
>>> recursive descent with backtracking, instead of based on property names.
>>>
>>
>> Right. None of these sound easily implementable, unfortunately. The XPath
>> approach because of the lack of an internal model (although the parent
>> document property name path is available at least); the property index
>> (optional, only used for serialization order at databind level) is
>> available but deserialization makes no use of it (I think low-level format
>> codecs like Protobuf & Avro may use it, but that is isolated at the
>> streaming API level, not exposed to databind).
>>
>>
>>>
>>> Then there is thinking about how to support this without breaking other
>>> backends. Again high-level ideas I can think of:
>>> - making matching on `PropertyName` more generic. E.g., instead of
>>> fetching a deserializer straight from a map, add a layer of abstraction
>>> that exposes a method `findMatchingProperty`, which backends can override
>>> based on their own element identification. The default implementation would
>>> lookup a property in a map using `PropertyName`.
>>>
>>
>> Conceptually reasonable, but details probably get gnarly. Something would
>> be needed for state-tracking as ValueDeserializers are stateless.
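
To make the state-tracking point concrete, here is a minimal sketch, assuming a hypothetical `PropertyMatcher` interface with the proposed `findMatchingProperty` method and a hypothetical per-object `MatchState`. None of these types exist in Jackson; the sketch only shows how a shared, stateless matcher could be handed its mutable state by the caller.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these types are not Jackson API.
final class MatcherSketch {

    // Mutable state created once per object being deserialized, so that the
    // matcher itself can stay stateless and be shared.
    static final class MatchState {
        final Map<String, Integer> seen = new HashMap<>();
    }

    interface PropertyMatcher {
        // Returns the target property name, or null if nothing matches.
        String findMatchingProperty(String elementName, MatchState state);
    }

    // Default behaviour: plain lookup by name, ignoring the state.
    static PropertyMatcher byName(Map<String, String> properties) {
        return (name, state) -> properties.get(name);
    }

    // Order-aware variant: the n-th occurrence of a name maps to the n-th
    // property registered for it.
    static PropertyMatcher byOccurrence(Map<String, List<String>> properties) {
        return (name, state) -> {
            List<String> candidates = properties.get(name);
            if (candidates == null) {
                return null;
            }
            int index = state.seen.merge(name, 1, Integer::sum) - 1;
            return index < candidates.size() ? candidates.get(index) : null;
        };
    }

    public static void main(String[] args) {
        PropertyMatcher matcher =
                byOccurrence(Map.of("a", List.of("a1", "a2"), "b", List.of("b")));
        MatchState state = new MatchState(); // fresh state for one <foo> element
        for (String element : List.of("a", "b", "a")) {
            System.out.println(element + " -> " + matcher.findMatchingProperty(element, state));
        }
    }
}
```

With the `<foo><a>A1</a><b/><a>A2</a></foo>` shape from use case (5), the two `a` elements resolve to `a1` and `a2`, while other backends could keep the `byName` default.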
>>
>>
>>> - entirely skipping the regular Jackson way of building deserializers,
>>> and creating a custom BeanDeserializer that implements its own lookup
>>> system.
>>>
>>> - entirely skipping the regular Jackson way of building deserializers,
>>> and creating a custom recursive descent deserializer.
>>>
>>
>> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds
>> tons of work to replace leaf-value (scalar) deserializers (from Numbers to
>> Date/Times to UUIDs and Base64-encoded binary values) and
>>
>>
>>> All of them seem like quite a chunk of work, and require careful thought
>>> about their implications. So: any thoughts on whether this is achievable at
>>> all? Other ideas?
>>>
>>
>> I must admit this sounds like a rather ambitious goal indeed.
>>
>>
>>>
>>> I assume use cases (1) - (4) would be less involved than this, but as I
>>> show in my examples, they will break when they interact with (5), hence why
>>> I just want to check upfront whether (5) is doable at all.
>>>
>>
>> Indeed.
>>
>> -+ Tatu +-
>>
>>
>>>
>>> On Wednesday, 28 January 2026 at 20:12:55 UTC+1 Tatu Saloranta wrote:
>>>
>>>> On Wed, Jan 28, 2026 at 8:31 AM Simon Cockx <[email protected]>
>>>> wrote:
>>>>
>>>>> At REGnosys we are running into fundamental limitations of Jackson's
>>>>> support for XML. I would like to know whether these limitations are
>>>>> deliberate trade-offs, or changeable design decisions that could be fixed.
>>>>> Based on that we are considering whether we can either *extend* Jackson
>>>>> in our codebase, *contribute* to Jackson directly, or *move away* from
>>>>> Jackson if it doesn't fit at all.
>>>>>
>>>>
>>>> Hi! Yes, this makes sense. I am not sure what the ultimate answer is
>>>> (it is obviously up to you), but I can try to address more specific
>>>> questions/concerns.
>>>>
>>>>
>>>>>
>>>>> First of all: why Jackson?
>>>>> Saying that we just want to ingest XML based on an XSD is somewhat
>>>>> hand-wavy - the JAXB project exists exactly for that use case. So maybe
>>>>> the question is better stated: why not JAXB? In short: the XSD is not our
>>>>> source of truth, our domain specific language is.
>>>>>
>>>>> At REGnosys we maintain the open-source Rune DSL
>>>>> <https://github.com/finos/rune-dsl>, a language specifically designed
>>>>> for modelling processes in the financial industry. One important component
>>>>> of the language is *ingestion*: the process of reading serial data
>>>>> (JSON, XML, CSV, ...) in various financial standard formats and
>>>>> representing it in a uniform way in our DSL. Many of these formats are
>>>>> XML-based and formally defined as multiple XSD files, such as FpML
>>>>> <https://www.fpml.org/>. To support ingesting of these data
>>>>> standards, we use the following steps.
>>>>>
>>>>>    1. Transform the XSD into Rune types. (similar to how JAXB
>>>>>    transforms XSD to Java classes)
>>>>>    2. Annotate the Rune types and fields with additional
>>>>>    serialization information. (similar to what both Jackson and JAXB
>>>>>    do/support)
>>>>>    3. From this Rune model, generate Java code with custom
>>>>>    annotations.
>>>>>    4. Using a custom Jackson annotation processor, deserialize using
>>>>>    a Jackson object mapper.
>>>>>
>>>>> Note that steps 2 to 4 are independent of the exact serial format: we
>>>>> don't just support XML, we also support JSON and CSV, and want to stay
>>>>> extensible for any future formats. That is exactly the attractiveness of
>>>>> Jackson and where we lose interest in JAXB: Jackson's design principles
>>>>> align perfectly with this goal of agnostic deserialisation and
>>>>> serialisation.
>>>>>
>>>>
>>>> Agreed. Thank you for explaining the background -- I think it does
>>>> align with Jackson goals at high level.
>>>>
>>>>
>>>>>
>>>>> Issues with Jackson XML
>>>>> Most of our issues come down to the way bean properties are
>>>>> represented. Their identity is purely based on the local name of the
>>>>> property being deserialized, but doesn't take into account surrounding
>>>>> context such as ordering, namespaces, or representation (e.g., XML
>>>>> attribute versus XML element).
>>>>>
>>>>>
>>>> Right: XML is probably THE trickiest format for Jackson to support (of
>>>> ~10 supported formats).
>>>> And most name mapping being namespace-unaware is problematic; I'd have
>>>> guessed that to be the number one problem.
>>>> So as you say, these are known, unsolved problems.
>>>>
>>>> In a way you could say Jackson supports XML-specific aspects
>>>> (namespaces, attribute-vs-element, ordering dependency) on serialization
>>>> side but not well on deserialization -- on deserialization these aspects
>>>> are essentially ignored.
>>>>
>>>>> Examples of problems we run into:
>>>>>
>>>>>    1. Having XML elements and XML attributes with the same name is
>>>>>    unsupported.
>>>>>    Issue also described here:
>>>>>    https://stackoverflow.com/q/47199799/3083982
>>>>>    E.g., <foo id="my-id"><id>MyElementId</id></foo>
>>>>>    2. The @JsonUnwrapped annotation breaks some XML features.
>>>>>    Fundamentally this is because it replaces the `FromXMLParser` instance
>>>>>    with a `TokenBuffer`-based parser, which breaks assumptions for some
>>>>>    XML related features. One example is described here:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/762
>>>>>    3. Jackson does not support XSD substitution groups, i.e., having
>>>>>    a single property with multiple potential names, depending on which a
>>>>>    specific subtype deserializer is used. Turns out that this is not a
>>>>>    fundamental issue: we have already extended Jackson to support it in
>>>>>    the open-source Rune Common <https://github.com/finos/rune-common> project.
>>>>>    See issue ticket here:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/679
>>>>>    4. Having XML elements with the same local name, but a different
>>>>>    namespace, is unsupported. See long-standing issue ticket here:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/65
>>>>>    5. Having XML elements with the same local name, but with a
>>>>>    different order, is unsupported. I don't see a direct issue open for
>>>>>    this, but it is related to this comment:
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
>>>>>    E.g., deserializing A1 and A2 to two distinct properties:
>>>>>    <foo><a>A1</a><b/><a>A2</a></foo>
>>>>>
>>>>> While we have ideas of how to approach this, I am definitely not
>>>>> saying we have a perfect solution in mind yet. We are mostly looking to
>>>>> answer the question if it is worth looking for a solution in the first
>>>>> place, or if this is just a fundamental limitation of Jackson.
>>>>>
>>>>
>>>> Of these, (4) could be supported if databind used full `PropertyName`
>>>> (which has "simple" and "namespace" part), so conceptually that is
>>>> achievable, but implementation would be quite involved.
>>>> Ideally there'd be no overhead for other formats, which would probably
>>>> require more extensibility for XML backend to override handling (lookups).
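
The difference between a lookup on the local name only and one on the full name can be sketched as follows. This is a minimal sketch with assumptions: `QualifiedName` is a simplified, hypothetical stand-in (not Jackson's `PropertyName`), and the `urn:example:*` namespaces and property names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a simplified stand-in, not Jackson's PropertyName.
final class NamespaceLookupSketch {

    // Lookup key carrying both parts of a qualified name.
    record QualifiedName(String namespace, String localName) {}

    // Keyed on local name only: the second registration overwrites the first.
    static Map<String, String> localNameLookup() {
        Map<String, String> properties = new HashMap<>();
        properties.put("id", "idA");
        properties.put("id", "idB");
        return properties;
    }

    // Keyed on namespace + local name: both properties stay addressable.
    static Map<QualifiedName, String> qualifiedLookup() {
        Map<QualifiedName, String> properties = new HashMap<>();
        properties.put(new QualifiedName("urn:example:a", "id"), "idA");
        properties.put(new QualifiedName("urn:example:b", "id"), "idB");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(localNameLookup().size());   // one entry survives
        Map<QualifiedName, String> qualified = qualifiedLookup();
        System.out.println(qualified.get(new QualifiedName("urn:example:a", "id")));
        System.out.println(qualified.get(new QualifiedName("urn:example:b", "id")));
    }
}
```

A format that has no namespaces would always pass the same (empty) namespace part, which is why the overhead question raised above matters.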
>>>>
>>>> (1) is sort of related but trickier: XML "attributeness" handling is
>>>> contained within XML components, only used on serialization (I think).
>>>>
>>>> (3) would be generally useful and ideally would be implemented -- not
>>>> sure of all complexities due to "flattening" of layers Jackson otherwise
>>>> adds. I think it is doable, but like all of these, non trivial.
>>>>
>>>> For (2) some support was added to allow format-backends to substitute
>>>> their own `TokenBuffer` subtypes, but that's as far as that goes. Buffering
>>>> is also problematic for some @JsonCreator induced buffering wrt
>>>> `Collection` deserialization.
>>>>
>>>> (5) is probably the trickiest. I am not familiar with that yet, would
>>>> need to dig deeper.
>>>>
>>>> Currently there isn't a ton of progress towards any of these (esp. as
>>>> all are hard problems).
>>>> But there are no fundamental blockers, I think. This is probably a bit
>>>> awkward wrt defining which path to take.
>>>> I am happy to try to help in addressing these, for what that is worth.
>>>>
>>>>
>>>>>
>>>>> I'm happy to discuss here, but if possible, I would also be very happy
>>>>> to jump on a call sometime to talk through this. Whatever works best.
>>>>>
>>>>
>>>> I think discussing this here is good -- I will be out until next week
>>>> now but wanted to send a quick response before that.
>>>>
>>>> Alternatively GitHub Discussions on
>>>> https://github.com/FasterXML/jackson-dataformat-xml/discussions would
>>>> also work.
>>>>
>>>>
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>> Thank you,
>>>>
>>>> -+ Tatu +-
>>>>
>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "jackson-dev" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion visit
>>>>> https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>

