Re: [jackson-dev] Jackson XML: design/roadmap discussion for XSD-driven binding limitations + potential contributions

Tatu Saloranta Mon, 02 Feb 2026 20:49:02 -0800

On Thu, Jan 29, 2026 at 4:39 AM Simon Cockx <[email protected]>
wrote:


> Thanks for the quick response Tatu! I am delighted that at least it is not
> an immediate "this will not work" conclusion because of fundamental design
> principles.
>

Exactly.


>
>
> I think discussing this here is good -- I will be out until next week now
> but wanted to send a quick response before that.
>
> I appreciate your time - no rush at all.
>
> Out of curiosity, is any work related to these issues already on the
> Jackson roadmap, which we can piggyback off, or is there no concrete work
> planned in the area?
>

Jackson does not really have a concrete/centralized roadmap as such; at
times I have ideas of the next major thing to tackle.
I did, however, add the concept of JSTEPs (see
https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP) for proposing
bigger sets of related changes, which could serve as a sort of roadmap.

Having said that, there is no current plan for specifically addressing the
shortcomings of the XML backend.

> Just to zoom in a bit on (5), because you mention it is probably the
> trickiest, and it might be a good indication of "how far" we can go with
> Jackson. The use case I have described (deserialize two properties with the
> same name with a different order), is actually *not* an important use
> case on its own, but it becomes *much* more relevant in interaction with
> (2) (unwrapping) and (3) (substitution groups). Two use cases I have seen
> while POC-ing support for some real XSD's are described below.
>
> a) Having the same property name on different levels in the Java pojo, but
> because of unwrapping they overlap.
> Example structure taken straight out of a real XSD, but simplified.
> Interpretation: you either have an `issuer` element followed by a single
> `tradeId` element, OR you have a `partyReference` element followed by a
> variable number of `tradeId` elements.
> ```
> <xs:complexType name="Trade">
>   <xs:choice>
>     <xs:sequence>
>       <xs:element name="issuer" type="IssuerId"/>
>       <xs:element name="tradeId" type="TradeId"/>
>     </xs:sequence>
>     <xs:sequence>
>       <xs:element name="partyReference" type="PartyReference"/>
>       <xs:element name="tradeId" type="TradeId" minOccurs="0" maxOccurs="unbounded"/>
>     </xs:sequence>
>   </xs:choice>
> </xs:complexType>
> ```
>
> We currently represent this something like the following in Java: (using
> records to concisely show structure - we actually use classes)
> ```
> record Trade(TradeOpt1 opt1, TradeOpt2 opt2) {}
>
> record TradeOpt1(IssuerId issuer, TradeId tradeId) {}
>
> record TradeOpt2(PartyReference partyReference, List<TradeId> tradeIds) {}
> ```
> where we unwrap `TradeOpt1` and `TradeOpt2`. At this point, however, when
> we encounter a `tradeId` element, we somehow need to know whether to set it
> to `TradeOpt1` or to add it to the list of `TradeOpt2`. Right now, BOTH
> happen. (in other situations I have seen one of the two taking precedence,
> depending on the exact unwrapping structure)
>
> b) A substituted name overlaps with an already existing element name on
> the type
> Another example structure based on what I have seen in a real XSD.
> Note that the element called `substituted` can be substituted by an
> element called `foo`.
> ```
> <xs:complexType name="Root">
>   <xs:sequence>
>     <xs:element ref="substituted"/>
>     <xs:element name="inbetween" type="xs:string"/>
>     <xs:element name="foo" type="Foo"/>
>   </xs:sequence>
> </xs:complexType>
>
> <xs:element name="substituted" type="Parent"/>
> <xs:element name="foo" type="Foo" substitutionGroup="substituted"/>
>
> <!-- assume type Foo extends type Parent -->
> ```
> In this scenario, a sample such as
> ```
> <root>
>   <foo></foo>
>   <inbetween>value</inbetween>
>   <foo></foo>
> </root>
> ```
> should result in the first `foo` element being deserialized into the
> `substituted` property and the second `foo` element into the `foo`
> property, given the structure below.
> ```
> record Root(Parent substituted, String inbetween, Foo foo) {}
> ```
>
> Thoughts...
>
> In order to support this, I think it would require work to extend how
> Jackson is able to identify properties. Some ideas:
> - based on element index, although that does not work well if some
> elements are optional, or if some elements can occur multiple times.
> - based on a selector which allows relative matching, e.g., "the element
> that comes after another element", such as XPath
> <https://www.w3schools.com/xml/xpath_syntax.asp>.
> ... or a drastically different approach, e.g., deserializing using
> recursive descent with backtracking, instead of based on property names.
>

Right. None of these sound easily implementable, unfortunately. The XPath
approach is hard because there is no internal document model (although the
property name path from the parent document is available at least). A
property index is available (optional, only used for serialization order at
databind level), but deserialization makes no use of it (I think low-level
format codecs like Protobuf & Avro may use it, but that is isolated at the
streaming API level, not exposed to databind).
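
As a point of comparison, the "recursive descent" idea quoted above can be sketched outside of Jackson with plain StAX for example (b). This is only an illustration under stated assumptions: `RootSequenceReader`, `readSlot` and the slot lists are hypothetical names, `Parent` and `Foo` are reduced to their text content, and no backtracking is needed because this particular sequence can be decided by position alone.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical position-driven reader for the Root sequence; plain StAX, no Jackson. */
final class RootSequenceReader {
    // Slot 1 accepts the head element or its substitution-group member.
    private static final List<String> SLOT_1 = Arrays.asList("substituted", "foo");
    private static final List<String> SLOT_2 = Arrays.asList("inbetween");
    private static final List<String> SLOT_3 = Arrays.asList("foo");

    /** Returns property name -> element text, in document order. */
    static Map<String, String> read(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // advance to <root>
            r.require(XMLStreamReader.START_ELEMENT, null, "root");

            // The target property is chosen by position, not by element name.
            Map<String, String> props = new LinkedHashMap<>();
            props.put("substituted", readSlot(r, SLOT_1));
            props.put("inbetween", readSlot(r, SLOT_2));
            props.put("foo", readSlot(r, SLOT_3));

            r.nextTag(); // must now be </root>
            r.require(XMLStreamReader.END_ELEMENT, null, "root");
            return props;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Input does not match the Root sequence", e);
        }
    }

    /** Moves to the next start tag, checks its name, and returns its text content. */
    private static String readSlot(XMLStreamReader r, List<String> allowed)
            throws XMLStreamException {
        r.nextTag();
        if (!r.isStartElement() || !allowed.contains(r.getLocalName())) {
            throw new XMLStreamException("Expected one of " + allowed, r.getLocation());
        }
        return r.getElementText(); // leaves the reader on the matching end tag
    }

    public static void main(String[] args) {
        String xml = "<root><foo>first</foo><inbetween>value</inbetween>"
                + "<foo>second</foo></root>";
        System.out.println(read(xml));
    }
}
```

The first `foo` lands in `substituted` and the second in `foo` purely because of where they occur. The cost is visible too: every leaf value has to be read by hand here, which is the work that scalar deserializers normally do.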


>
> Then there is thinking about how to support this without breaking other
> backends. Again high-level ideas I can think of:
> - making matching on `PropertyName` more generic. E.g., instead of
> fetching a deserializer straight from a map, add a layer of abstraction
> that exposes a method `findMatchingProperty`, which backends can override
> based on their own element identification. The default implementation would
> lookup a property in a map using `PropertyName`.
>

Conceptually reasonable, but details probably get gnarly. Something would
be needed for state-tracking as ValueDeserializers are stateless.
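
To illustrate both the `findMatchingProperty` idea and the state-tracking concern, here is a rough sketch in plain Java. None of these types exist in Jackson: `PropertyMatcher`, `Cursor`, `NameMatcher`, `SequenceMatcher` and `Slot` are hypothetical names. The assumption is that the matcher itself stays stateless and shareable, while a short-lived cursor holds the per-object position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup abstraction; not an existing Jackson API. */
interface PropertyMatcher {
    /** Creates fresh matching state for one object being deserialized. */
    Cursor newCursor();

    interface Cursor {
        /** Returns the target property for the next element name, or null if none matches. */
        String findMatchingProperty(String elementName);
    }
}

/** Default behaviour: stateless, order-insensitive lookup by name. */
final class NameMatcher implements PropertyMatcher {
    private final Map<String, String> propertyByName;

    NameMatcher(Map<String, String> propertyByName) {
        this.propertyByName = new HashMap<>(propertyByName);
    }

    @Override
    public Cursor newCursor() {
        return propertyByName::get;
    }
}

/** Order-aware variant: the same element name may map to different properties. */
final class SequenceMatcher implements PropertyMatcher {
    /** One position in the expected element sequence. */
    static final class Slot {
        final String elementName;
        final String property;
        final boolean optional;

        Slot(String elementName, String property, boolean optional) {
            this.elementName = elementName;
            this.property = property;
            this.optional = optional;
        }
    }

    private final List<Slot> slots;

    SequenceMatcher(Slot... slots) {
        this.slots = Arrays.asList(slots.clone());
    }

    @Override
    public Cursor newCursor() {
        return new Cursor() {
            private int next = 0; // index of the first slot not yet consumed

            @Override
            public String findMatchingProperty(String elementName) {
                for (int i = next; i < slots.size(); i++) {
                    Slot slot = slots.get(i);
                    if (slot.elementName.equals(elementName)) {
                        next = i + 1;
                        return slot.property;
                    }
                    if (!slot.optional) {
                        return null; // a required slot would have to be skipped
                    }
                }
                return null;
            }
        };
    }
}

final class MatcherDemo {
    /** Resolves a sequence of element names with one fresh cursor. */
    static List<String> resolve(PropertyMatcher matcher, String... elementNames) {
        PropertyMatcher.Cursor cursor = matcher.newCursor();
        List<String> properties = new ArrayList<>();
        for (String name : elementNames) {
            properties.add(cursor.findMatchingProperty(name));
        }
        return properties;
    }

    public static void main(String[] args) {
        // Mirrors <foo><a>A1</a><b/><a>A2</a></foo> from case (5).
        PropertyMatcher bySequence = new SequenceMatcher(
                new SequenceMatcher.Slot("a", "first", false),
                new SequenceMatcher.Slot("b", "b", true),
                new SequenceMatcher.Slot("a", "second", false));
        System.out.println(resolve(bySequence, "a", "b", "a"));
    }
}
```

The sketch leaves out repeated elements (`maxOccurs="unbounded"`), choices and unwrapping, which is where the details would likely get gnarly.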


> - entirely skipping the regular Jackson way of building deserializers, and
> creating a custom BeanDeserializer that implements its own lookup system.
>
> - entirely skipping the regular Jackson way of building deserializers, and
> creating a custom recursive descent deserializer.
>

Bypassing (Bean)Deserializer(s) might be necessary, but then also adds tons
of work to replace leaf-value (scalar) deserializers (from Numbers to
Date/Times to UUIDs and Base64-encoded binary values) and


> All of them seem like quite a chunk of work, and require careful thought
> about their implications. So: any thoughts on whether this is achievable at
> all? Other ideas?
>

I must admit this sounds like a rather ambitious goal indeed.


>
> I assume use cases (1) - (4) would be less involved than this, but as I
> show in my examples, they will break when they interact with (5), hence why
> I just want to check upfront whether (5) is doable at all.
>

Indeed.

-+ Tatu +-


>
> On Wednesday, 28 January 2026 at 20:12:55 UTC+1 Tatu Saloranta wrote:
>
>> On Wed, Jan 28, 2026 at 8:31 AM Simon Cockx <[email protected]>
>> wrote:
>>
>>> At REGnosys we are running into fundamental limitations of Jackson's
>>> support for XML. I would like to know whether these limitations are
>>> deliberate trade-offs, or changeable design decisions that could be fixed.
>>> Based on that we are considering whether we can either *extend* Jackson
>>> in our codebase, *contribute* to Jackson directly, or *move away* from
>>> Jackson if it doesn't fit at all.
>>>
>>
>> Hi! Yes, this makes sense. I am not sure what the ultimate answer is (it
>> is obviously up to you), but I can try to address more specific
>> questions/concerns.
>>
>>
>>>
>>> First of all: why Jackson?
>>> Saying that we just want to ingest XML based on an XSD is somewhat
>>> hand-wavy - the JAXB project exists exactly for that use case. So maybe the
>>> question is better stated: why not JAXB? In short: the XSD is not our
>>> source of truth, our domain specific language is.
>>>
>>> At REGnosys we maintain the open-source Rune DSL
>>> <https://github.com/finos/rune-dsl>, a language specifically designed
>>> for modelling processes in the financial industry. One important component
>>> of the language is *ingestion*: the process of reading serial data
>>> (JSON, XML, CSV, ...) in various financial standard formats and
>>> representing it in a uniform way in our DSL. Many of these formats are
>>> XML-based and formally defined as multiple XSD files, such as FpML
>>> <https://www.fpml.org/>. To support ingesting of these data standards,
>>> we use the following steps.
>>>
>>>    1. Transform the XSD into Rune types. (similar to how JAXB
>>>    transforms XSD to Java classes)
>>>    2. Annotate the Rune types and fields with additional serialization
>>>    information. (similar to what both Jackson and JAXB do/support)
>>>    3. From this Rune model, generate Java code with custom annotations.
>>>    4. Using a custom Jackson annotation processor, deserialize using a
>>>    Jackson object mapper.
>>>
>>> Note that steps 2 to 4 are independent of the exact serial format: we
>>> don't just support XML, we also support JSON and CSV, and want to stay
>>> extensible for any future formats. That is exactly the attractiveness of
>>> Jackson and where we lose interest in JAXB: Jackson's design principles
>>> align perfectly with this goal of agnostic deserialisation and
>>> serialisation.
>>>
>>
>> Agreed. Thank you for explaining the background -- I think it does align
>> with Jackson goals at high level.
>>
>>
>>>
>>> Issues with Jackson XML
>>> Most of our issues come down to the way bean properties are represented.
>>> Their identity is purely based on the local name of the property being
>>> deserialized, but doesn't take into account surrounding context such as
>>> ordering, namespaces, or representation (e.g., XML attribute versus XML
>>> element).
>>>
>>>
>> Right: XML is probably THE trickiest format for Jackson to support (of
>> ~10 supported formats).
>> Most name mapping being namespace-unaware is problematic, and I'd have
>> guessed that to be the number one problem.
>> So as you say, these are known, unsolved problems.
>>
>> In a way you could say Jackson supports XML-specific aspects (namespaces,
>> attribute-vs-element, ordering dependency) on serialization side but not
>> well on deserialization -- on deserialization these aspects are essentially
>> ignored.
>>
>> Examples of problems we run into:
>>>
>>>    1. Having XML elements and XML attributes with the same name is
>>>    unsupported.
>>>    Issue also described here:
>>>    https://stackoverflow.com/q/47199799/3083982
>>>    E.g., <foo id="my-id"><id>MyElementId</id></foo>
>>>    2. The @JsonUnwrapped annotation breaks some XML features.
>>>    Fundamentally this is because it replaces the `FromXMLParser` instance
>>>    with a `TokenBuffer`-based parser, which breaks assumptions for some
>>>    XML related features. One example is described here:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/762
>>>    3. Jackson does not support XSD substitution groups, i.e., having a
>>>    single property with multiple potential names, depending on which a
>>>    specific subtype deserializer is used. Turns out that this is not a
>>>    fundamental issue: we have already extended Jackson to support it in the
>>>    open-source Rune Common <https://github.com/finos/rune-common> project.
>>>    See issue ticket here:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/679
>>>    4. Having XML elements with the same local name, but a different
>>>    namespace, is unsupported. See long-standing issue ticket here:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/65
>>>    5. Having XML elements with the same local name, but with a
>>>    different order, is unsupported. I don't see a direct issue open for
>>>    this, but it is related to this comment:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
>>>    E.g., deserializing A1 and A2 to two distinct properties:
>>>    <foo><a>A1</a><b/><a>A2</a></foo>
>>>
>>> While we have ideas of how to approach this, I am definitely not saying
>>> we have a perfect solution in mind yet. We are mostly looking to answer the
>>> question if it is worth looking for a solution in the first place, or if
>>> this is just a fundamental limitation of Jackson.
>>>
>>
>> Of these, (4) could be supported if databind used full `PropertyName`
>> (which has "simple" and "namespace" part), so conceptually that is
>> achievable, but implementation would be quite involved.
>> Ideally there'd be no overhead for other formats, which would probably
>> require more extensibility for XML backend to override handling (lookups).
>>
>> (1) is sort of related but trickier: XML "attributeness" handling is
>> contained within the XML components, only used on serialization (I think).
>>
>> (3) would be generally useful and ideally would be implemented -- not
>> sure of all complexities due to "flattening" of layers Jackson otherwise
>> adds. I think it is doable, but like all of these, non-trivial.
>>
>> For (2) some support was added to allow format-backends to substitute
>> their own `TokenBuffer` subtypes, but that's as far as that goes. Buffering
>> is also problematic for some @JsonCreator induced buffering wrt
>> `Collection` deserialization.
>>
>> (5) is probably the trickiest. I am not familiar with that yet, would
>> need to dig deeper.
>>
>> Currently there isn't a ton of progress towards any of these (esp. as all
>> are hard problems).
>> But there are no fundamental blockers, I think. This is probably a bit
>> awkward wrt defining which path to take.
>> I am happy to try to help in addressing these, for what that is worth.
>>
>>
>>>
>>> I'm happy to discuss here, but if possible, I would also be very happy
>>> to jump on a call sometime to talk through this. Whatever works best.
>>>
>>
>> I think discussing this here is good -- I will be out until next week now
>> but wanted to send a quick response before that.
>>
>> Alternatively GitHub Discussions on
>> https://github.com/FasterXML/jackson-dataformat-xml/discussions would
>> also work.
>>
>>
>>> Thanks in advance.
>>>
>>>
>> Thank you,
>>
>> -+ Tatu +-
>>
>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "jackson-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion visit
>>> https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

