Re: [jackson-dev] Jackson XML: design/roadmap discussion for XSD-driven binding limitations + potential contributions

Simon Cockx Tue, 03 Feb 2026 01:56:18 -0800


> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds tons 
> of work to replace leaf-value (scalar) deserializers (from Numbers to 
> Date/Times to UUIDs and Base64-encoded binary values) and


I don't follow this point here. Even if we do bypass the BeanDeserializer, 
I currently assumed we would be able to reuse scalar deserializers such as 
Date values. Why would that not be the case?
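
To make the question concrete: my intuition is that converting a leaf value 
only needs the element text and the target type, not the way the element was 
matched to a property. Below is a minimal stand-in in plain Java. It is not 
Jackson's API; `ScalarRegistry` and its methods are hypothetical names that 
only model "look up a scalar converter by type".

```java
import java.time.LocalDate;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

/** Hypothetical stand-in for a scalar deserializer lookup; not part of Jackson. */
final class ScalarRegistry {

    private final Map<Class<?>, Function<String, ?>> parsers = new HashMap<>();

    /** Registers a text-to-value converter for one leaf type. */
    <T> ScalarRegistry register(Class<T> type, Function<String, T> parser) {
        parsers.put(type, parser);
        return this;
    }

    /** Converts element text to the requested leaf type, independent of property matching. */
    <T> T parse(Class<T> type, String text) {
        Function<String, ?> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No scalar parser for " + type.getName());
        }
        return type.cast(parser.apply(text));
    }

    /** A few of the leaf types mentioned above: numbers, dates, UUIDs, Base64 binary. */
    static ScalarRegistry defaults() {
        return new ScalarRegistry()
            .register(Integer.class, Integer::valueOf)
            .register(LocalDate.class, LocalDate::parse)
            .register(UUID.class, UUID::fromString)
            .register(byte[].class, text -> Base64.getDecoder().decode(text));
    }
}
```

If the structural part (which element goes to which property) is replaced, 
I would expect it to still be able to call into something shaped like 
`parse(type, text)` for the leaves. Whether Jackson's real scalar 
deserializers can be driven that way is exactly what I am asking.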

Let me try and summarise, and see if you agree.
Suppose we do require ordered deserialization, i.e., use case (5). What I 
think we can reuse:

   1. Annotation mechanism (`AnnotatedProperty`, annotation inheritance, 
   etc) to create a custom new annotation introspector / ordered 
   BeanDeserializer builder.
   2. TBD: scalar deserializers. See question above. Potentially also 
   standard deserializers such as `MapDeserializer`.
   3. Jackson's (user) interface: ObjectMapper, ValueDeserializer, etc.

What I think we cannot reuse:

   1. Existing BeanDeserializer(Builder), and its corresponding annotation 
   processor.
   2. Consequently, concrete implementations of structure-based 
   deserialisation code such as unwrapped properties and lists.

... which is most of Jackson.

That almost sounds like we need a full reimplementation of Jackson, where 
we can only reuse the surrounding bits (annotations and interface) + 
potentially scalar deserializers. So to make it less intimidating, let's 
think about this iteratively. What would be the minimal POC that proves 
value in terms of tackling the issues I have described above?
What I have in mind:

   1. Having a class annotation, e.g., `@Ordered`, to divert from the 
   default BeanDeserializerBuilder and go into a custom one.
   2. Support for scalar deserializers.
   3. Support for having multiple properties with the same XML name, and 
   deserializing into those properties based on the order in which they occur.

... and ignoring lists, unwrapping, and other Jackson features for a 
second. I think all of the rest would be extensions on top of this 
structure. Given that this would not really build on top of 
jackson-dataformat-xml, I suppose this would most appropriately live in 
another repository.
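
To illustrate item 3, here is a minimal sketch in plain Java of what 
"deserializing based on order" could mean. Everything in it (`OrderedMatcher`, 
`Slot`, `Element`) is a hypothetical name and not part of Jackson. It assumes 
a greedy, forward-only cursor over the declared properties, with no 
backtracking and no repeated elements, so it is a starting point rather than 
a design.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of order-aware property matching; not part of Jackson. */
final class OrderedMatcher {

    /** One expected element in declaration order: XML name and target property. */
    record Slot(String xmlName, String property) {}

    /** One element as it occurs in the document: XML name and text content. */
    record Element(String name, String text) {}

    private final List<Slot> slots;

    OrderedMatcher(List<Slot> slots) {
        this.slots = List.copyOf(slots);
    }

    /**
     * Assigns each element to the first slot at or after the cursor whose XML
     * name matches, then moves the cursor past that slot. Slots that are
     * skipped over are treated as absent.
     */
    Map<String, String> bind(List<Element> elements) {
        Map<String, String> result = new LinkedHashMap<>();
        int cursor = 0;
        for (Element element : elements) {
            int match = -1;
            for (int i = cursor; i < slots.size(); i++) {
                if (slots.get(i).xmlName().equals(element.name())) {
                    match = i;
                    break;
                }
            }
            if (match < 0) {
                throw new IllegalArgumentException(
                    "Unexpected element <" + element.name() + "> at this position");
            }
            result.put(slots.get(match).property(), element.text());
            cursor = match + 1;
        }
        return result;
    }
}
```

With slots `a -> first`, `b -> middle`, `a -> second`, the input 
`<foo><a>A1</a><b/><a>A2</a></foo>` binds A1 to `first` and A2 to `second`, 
because the cursor has already moved past the first `a` slot when the second 
`a` element arrives. The same loop would fail to disambiguate when the first 
`a` is optional and absent, which is where backtracking or lookahead would 
have to come in.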

I am not sure yet whether this is too ambitious or not. I just want to make 
sure I understand what would be involved, and to see if you, as the Jackson 
expert, agree and/or can guide us towards the most achievable path. :)

Thoughts?

Simon

On Tuesday, 3 February 2026 at 05:48:56 UTC+1 Tatu Saloranta wrote:

> On Thu, Jan 29, 2026 at 4:39 AM Simon Cockx <[email protected]> wrote:
>
>> Thanks for the quick response Tatu! I am delighted that at least it is 
>> not an immediate "this will not work" conclusion because of fundamental 
>> design principles.
>>
>
> Exactly.
>  
>
>>  
>>
>>> I think discussing this here is good -- I will be out until next week 
>>> now but wanted to send a quick response before that.
>>
>> I appreciate your time - no rush at all.
>>
>> Out of curiosity, is any work related to these issues already on the 
>> Jackson roadmap, which we can piggyback off, or is there no concrete work 
>> planned in the area?
>>
>
> Jackson does not really have a concrete/centralized road map as such; at 
> times I have ideas of the next major thing to tackle.
> Although I did add the concept of JSTEPs (see 
> https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP) for 
> proposing bigger sets of related changes which could serve as a sort of 
> roadmap.
>
> Having said that, there is no current plan for specifically addressing 
> shortcomings of the XML backend.
>
>> Just to zoom in a bit on (5), because you mention it is probably the 
>> trickiest, and it might be a good indication of "how far" we can go with 
>> Jackson. The use case I have described (deserialize two properties with the 
>> same name with a different order), is actually *not* an important use 
>> case on its own, but it becomes *much* more relevant in interaction with 
>> (2) (unwrapping) and (3) (substitution groups). Two use cases I have seen 
>> while POC-ing support for some real XSD's are described below.
>>
>> a) Having the same property name on different levels in the Java pojo, 
>> but because of unwrapping they overlap.
>> Example structure taken straight out of a real XSD, but simplified.
>> Interpretation: you either have an `issuer` element followed by a single 
>> `tradeId` element, OR you have a `partyReference` element followed by a 
>> variable number of `tradeId` elements.
>> ```
>> <xs:complexType name="Trade">
>>   <xs:choice>
>>     <xs:sequence>
>>       <xs:element name="issuer" type="IssuerId"/>
>>       <xs:element name="tradeId" type="TradeId"/>
>>     </xs:sequence>
>>     <xs:sequence>
>>       <xs:element name="partyReference" type="PartyReference"/>
>>       <xs:element name="tradeId" type="TradeId" minOccurs="0" maxOccurs="unbounded"/>
>>     </xs:sequence>
>>   </xs:choice>
>> </xs:complexType>
>> ```
>>
>> We currently represent this something like the following in Java: (using 
>> records to concisely show structure - we actually use classes)
>> ```
>> record Trade(TradeOpt1 opt1, TradeOpt2 opt2) {}
>>
>> record TradeOpt1(IssuerId issuer, TradeId tradeId) {}
>>
>> record TradeOpt2(PartyReference partyReference, List<TradeId> tradeIds) {}
>> ```
>> where we unwrap `TradeOpt1` and `TradeOpt2`. At this point, however, when 
>> we encounter a `tradeId` element, we somehow need to know whether to set it 
>> to `TradeOpt1` or to add it to the list of `TradeOpt2`. Right now, BOTH 
>> happen. (in other situations I have seen one of the two taking precedence, 
>> depending on the exact unwrapping structure)
>>
>> b) A substituted name overlaps with an already existing element name on 
>> the type
>> Another example structure based on what I have seen in a real XSD.
>> Note that the element called `substituted` can be substituted by an 
>> element called `foo`. 
>> ```
>> <xs:complexType name="Root">
>>   <xs:sequence>
>>     <xs:element ref="substituted"/>
>>     <xs:element name="inbetween" type="xs:string"/>
>>     <xs:element name="foo" type="Foo"/>
>>   </xs:sequence>
>> </xs:complexType>
>>
>> <xs:element name="substituted" type="Parent"/>
>> <xs:element name="foo" type="Foo" substitutionGroup="substituted"/>
>>
>> <!-- assume type Foo extends type Parent -->
>> ```
>> In this scenario, a sample such as
>> ```
>> <root>
>>   <foo></foo>
>>   <inbetween>value</inbetween>
>>   <foo></foo>
>> </root>
>> ```
>> should let the deserializer decide that the first `foo` element 
>> deserializes into the `substituted` property, and the second `foo` 
>> element into the `foo` property, given the structure below.
>> ```
>> record Root(Parent substituted, String inbetween, Foo foo) {}
>> ```
>>
>> Thoughts...
>>
>> In order to support this, I think it would require work to extend how 
>> Jackson is able to identify properties. Some ideas:
>> - based on element index, although that does not work well if some 
>> elements are optional, or if some elements can occur multiple times.
>> - based on a selector which allows relative matching, e.g., "the element 
>> that comes after another element", such as XPath 
>> <https://www.w3schools.com/xml/xpath_syntax.asp>.
>> ... or a drastically different approach, e.g., deserializing using 
>> recursive descent with backtracking, instead of based on property names.
>>
>
> Right. None of these sound easily implementable, unfortunately. The XPath 
> approach is hard because of the lack of an internal model (although the 
> parent document property name path is available at least). A property 
> index (optional, only used for serialization order at databind level) is 
> available, but deserialization makes no use of it (I think low-level format 
> codecs like Protobuf & Avro may use it, but that is isolated at streaming 
> API level, not exposed to databind).
>  
>
>>
>> Then there is thinking about how to support this without breaking other 
>> backends. Again high-level ideas I can think of:
>> - making matching on `PropertyName` more generic. E.g., instead of 
>> fetching a deserializer straight from a map, add a layer of abstraction 
>> that exposes a method `findMatchingProperty`, which backends can override 
>> based on their own element identification. The default implementation would 
>> lookup a property in a map using `PropertyName`. 
>>
>
> Conceptually reasonable, but details probably get gnarly. Something would 
> be needed for state-tracking as ValueDeserializers are stateless.
>  
>
>> - entirely skipping the regular Jackson way of building deserializers, 
>> and creating a custom BeanDeserializer that implements its own lookup 
>> system. 
>>
>> - entirely skipping the regular Jackson way of building deserializers, and 
>> creating a custom recursive descent deserializer.
>>
>
> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds 
> tons of work to replace leaf-value (scalar) deserializers (from Numbers to 
> Date/Times to UUIDs and Base64-encoded binary values) and 
>  
>
>> All of them seem like quite a chunk of work, and require careful thought 
>> about their implications. So: any thoughts on whether this is achievable at 
>> all? Other ideas?
>>
>
> I must admit this sounds like a rather ambitious goal indeed.
>  
>
>>
>> I assume use cases (1) - (4) would be less involved than this, but as I 
>> show in my examples, they will break when they interact with (5), hence why 
>> I just want to check upfront whether (5) is doable at all.
>>
>
> Indeed.
>  
> -+ Tatu +-
>
>
>>
>> On Wednesday, 28 January 2026 at 20:12:55 UTC+1 Tatu Saloranta wrote:
>>
>>> On Wed, Jan 28, 2026 at 8:31 AM Simon Cockx <[email protected]> 
>>> wrote:
>>>
>>>> At REGnosys we are running into fundamental limitations of Jackson's 
>>>> support for XML. I would like to know whether these limitations are 
>>>> deliberate trade-offs, or changeable design decisions that could be fixed. 
>>>> Based on that we are considering whether we can either *extend* Jackson 
>>>> in our codebase, *contribute* to Jackson directly, or *move away* from 
>>>> Jackson if it doesn't fit at all.
>>>>
>>>
>>> Hi! Yes, this makes sense. I am not sure what the ultimate answer is (it 
>>> is obviously up to you), but I can try to address more specific 
>>> questions/concerns.
>>>  
>>>
>>>>
>>>> First of all: why Jackson?
>>>> Saying that we just want to ingest XML based on an XSD is somewhat 
>>>> hand-wavy - the JAXB project exists exactly for that use case. So maybe 
>>>> the question is better stated: why not JAXB? In short: the XSD is not 
>>>> our source of truth, our domain specific language is.
>>>>
>>>> At REGnosys we maintain the open-source Rune DSL 
>>>> <https://github.com/finos/rune-dsl>, a language specifically designed 
>>>> for modelling processes in the financial industry. One important component 
>>>> of the language is *ingestion*: the process of reading serial data 
>>>> (JSON, XML, CSV, ...) in various financial standard formats and 
>>>> representing it in a uniform way in our DSL. Many of these formats are 
>>>> XML-based and formally defined as multiple XSD files, such as FpML 
>>>> <https://www.fpml.org/>. To support ingesting of these data standards, 
>>>> we use the following steps.
>>>>
>>>>    1. Transform the XSD into Rune types. (similar to how JAXB 
>>>>    transforms XSD to Java classes)
>>>>    2. Annotate the Rune types and fields with additional serialization 
>>>>    information. (similar to what both Jackson and JAXB do/support)
>>>>    3. From this Rune model, generate Java code with custom annotations.
>>>>    4. Using a custom Jackson annotation processor, deserialize using a 
>>>>    Jackson object mapper.
>>>>
>>>> Note that steps 2 to 4 are independent of the exact serial format: we 
>>>> don't just support XML, we also support JSON and CSV, and want to stay 
>>>> extensible for any future formats. That is exactly the attractiveness of 
>>>> Jackson and where we lose interest in JAXB: Jackson's design principles 
>>>> align perfectly with this goal of agnostic deserialisation and 
>>>> serialisation.
>>>>
>>>
>>> Agreed. Thank you for explaining the background -- I think it does align 
>>> with Jackson goals at high level.
>>>  
>>>
>>>>
>>>> Issues with Jackson XML
>>>> Most of our issues come down to the way bean properties are 
>>>> represented. Their identity is purely based on the local name of the 
>>>> property being deserialized, but doesn't take into account surrounding 
>>>> context such as ordering, namespaces, or representation (e.g., XML 
>>>> attribute versus XML element).
>>>>
>>>>
>>> Right: XML is probably THE trickiest format for Jackson to support (of 
>>> ~10 supported formats).
>>> And most name mapping being namespace-unaware is problematic, and I'd 
>>> have guessed number one problem.
>>> So as you say, these are known, unsolved problems.
>>>  
>>> In a way you could say Jackson supports XML-specific aspects 
>>> (namespaces, attribute-vs-element, ordering dependency) on serialization 
>>> side but not well on deserialization -- on deserialization these aspects 
>>> are essentially ignored.
>>>
>>> Examples of problems we run into:
>>>>
>>>>    1. Having XML elements and XML attributes with the same name is 
>>>>    unsupported.
>>>>    Issue also described here: 
>>>>    https://stackoverflow.com/q/47199799/3083982
>>>>    E.g., <foo id="my-id"><id>MyElementId</id></foo>
>>>>    2. The @JsonUnwrapped annotation breaks some XML features. 
>>>>    Fundamentally this is because it replaces the `FromXmlParser` instance 
>>>>    with a `TokenBuffer`-based parser, which breaks assumptions for some 
>>>>    XML-related features. One example is described here: 
>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/762
>>>>    3. Jackson does not support XSD substitution groups, i.e., having a 
>>>>    single property with multiple potential names, where the name that 
>>>>    occurs determines which subtype deserializer is used. Turns out that 
>>>>    this is not a fundamental issue: we have already extended Jackson to 
>>>>    support it in the open-source Rune Common 
>>>>    <https://github.com/finos/rune-common> project. 
>>>>    See issue ticket here: 
>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/679
>>>>    4. Having XML elements with the same local name, but a different 
>>>>    namespace, is unsupported. See long-standing issue ticket here: 
>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/65
>>>>    5. Having XML elements with the same local name, but with a 
>>>>    different order, is unsupported. I don't see a direct issue open for 
>>>>    this, but it is related to this comment: 
>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
>>>>    E.g., deserializing A1 and A2 to two distinct properties: 
>>>>    <foo><a>A1</a><b/><a>A2</a></foo>
>>>>
>>>> While we have ideas of how to approach this, I am definitely not saying 
>>>> we have a perfect solution in mind yet. We are mostly looking to answer 
>>>> the question if it is worth looking for a solution in the first place, 
>>>> or if this is just a fundamental limitation of Jackson.
>>>>
>>>
>>> Of these, (4) could be supported if databind used full `PropertyName` 
>>> (which has "simple" and "namespace" part), so conceptually that is 
>>> achievable, but implementation would be quite involved.
>>> Ideally there'd be no overhead for other formats, which would probably 
>>> require more extensibility for XML backend to override handling (lookups).
>>>
>>> (1) is sort of related but trickier: XML "attributeness" handling is 
>>> contained within XML components, only used on serialization (I think).
>>>
>>> (3) would be generally useful and ideally would be implemented -- not 
>>> sure of all complexities due to "flattening" of layers Jackson otherwise 
>>> adds. I think it is doable, but like all of these, non trivial.
>>>
>>> For (2) some support was added to allow format-backends to substitute 
>>> their own `TokenBuffer` subtypes, but that's as far as that goes. Buffering 
>>> is also problematic for some @JsonCreator induced buffering wrt 
>>> `Collection` deserialization.
>>>
>>> (5) is probably the trickiest. I am not familiar with that yet, would 
>>> need to dig deeper.
>>>
>>> Currently there isn't a ton of progress towards any of these (esp. as 
>>> all are hard problems).
>>> But there are no fundamental blockers, I think. This is probably a bit 
>>> awkward wrt defining which path to take.
>>> I am happy to try to help in addressing these, for what that is worth.
>>>  
>>>
>>>>
>>>> I'm happy to discuss here, but if possible, I would also be very happy 
>>>> to jump on a call sometime to talk through this. Whatever works best.
>>>>
>>>
>>> I think discussing this here is good -- I will be out until next week 
>>> now but wanted to send a quick response before that.
>>>  
>>> Alternatively Github Discussions on 
>>> https://github.com/FasterXML/jackson-dataformat-xml/discussions would 
>>> also work.
>>>
>>>
>>>> Thanks in advance.
>>>>
>>>>
>>> Thank you,
>>>
>>> -+ Tatu +-
>>>  
>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "jackson-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion visit 
>>>> https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>
>

