Oh, I am also unsure whether we would be able to reuse the 
`FromXmlParser`, or whether we would need a custom StAX-based parser. 
For an MVP, though, I would start with the `FromXmlParser` and see 
how far we get.
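
To make the parser question concrete, here is a minimal sketch of what a plain StAX pass over the root element exposes, using only the JDK's `javax.xml.stream` API. `StaxOrderSketch` and `describeRoot` are hypothetical names for illustration and not part of Jackson; the sketch only shows that element order and the attribute-versus-element distinction are both available at this level, using the samples from use cases (1) and (5) quoted below.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Jackson.
final class StaxOrderSketch {

    // Lists the root's attributes as "@name" and its direct child elements
    // as "<name>", in document order.
    static List<String> describeRoot(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                int depth = 0;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        depth++;
                        if (depth == 1) {
                            // Attributes are reported on the start tag, separately from child elements.
                            for (int i = 0; i < reader.getAttributeCount(); i++) {
                                seen.add("@" + reader.getAttributeLocalName(i));
                            }
                        } else if (depth == 2) {
                            seen.add("<" + reader.getLocalName() + ">");
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [@id, <id>]
        System.out.println(describeRoot("<foo id=\"my-id\"><id>MyElementId</id></foo>"));
        // prints [<a>, <b>, <a>]
        System.out.println(describeRoot("<foo><a>A1</a><b/><a>A2</a></foo>"));
    }
}
```

Whether the same information survives the translation into Jackson tokens is exactly the open question; this sketch says nothing about that.
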
On Tuesday, 3 February 2026 at 10:56:09 UTC+1 Simon Cockx wrote:

> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds 
> tons of work to replace leaf-value (scalar) deserializers (from Numbers to 
> Date/Times to UUIDs and Base64-encoded binary values) and 
>
> I don't follow this point here. Even if we do bypass the BeanDeserializer, 
> I currently assumed we would be able to reuse scalar deserializers such as 
> Date values. Why would that not be the case?
>
> Let me try and summarise, and see if you agree.
> Suppose we do require ordered deserialization, i.e., use case (5), what I 
> think we can reuse:
>
>    1. Annotation mechanism (`AnnotatedProperty`, annotation inheritance, 
>    etc) to create a custom new annotation introspector / ordered 
>    BeanDeserializer builder.
>    2. TBD: scalar deserializers. See question above. Potentially also 
>    standard deserializers such as `MapDeserializer`.
>    3. Jackson's (user) interface: ObjectMapper, ValueDeserializer, etc.
>
> What I think we cannot reuse:
>
>    1. Existing BeanDeserializer(Builder), and its corresponding 
>    annotation processor.
>    2. Consequently, concrete implementations of structure-based 
>    deserialisation code such as unwrapped properties and lists.
>
> ... which is most of Jackson.
>
> That almost sounds like we need a full reimplementation of Jackson, where 
> we can only reuse the surrounding bits (annotations and interface) + 
> potentially scalar deserializers. So to make it less intimidating, let's 
> think about this iteratively. What would be the minimal POC that proves 
> value in terms of tackling the issues I have described above?
> What I think about:
>
>    1. Having a class annotation, e.g., `@Ordered`, to divert from the 
>    default BeanDeserializerBuilder and go into a custom one.
>    2. Support for scalar deserializers.
>    3. Support for having multiple properties with the same XML name, and 
>    deserializing into those properties based on the order in which they occur.
>
> ... and ignoring lists, unwrapping, and other Jackson features for 
> now. I think all of the rest would be extensions on top of this 
> structure. Given that this would not really build on top of 
> jackson-dataformat-xml, I suppose this would most appropriately live in 
> another repository.
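
Step 1 of the POC list above could be sketched as follows. This only shows the switch on a class-level marker; `Ordered`, `OrderedFoo`, `PlainFoo`, `BuilderSelection` and `builderFor` are hypothetical names, and how such a check would actually be hooked into Jackson's deserializer construction is deliberately left open.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation; the name comes from the message above, not from Jackson.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Ordered {}

// Opted in: two properties share the XML name "a" and are told apart by position.
@Ordered
final class OrderedFoo {
    String a1;
    String b;
    String a2;
}

// Not opted in: would keep going through the default name-based path.
final class PlainFoo {
    String a;
    String b;
}

final class BuilderSelection {
    // Decides which (hypothetical) builder would handle the given bean type.
    static String builderFor(Class<?> beanType) {
        return beanType.isAnnotationPresent(Ordered.class) ? "ordered" : "default";
    }

    public static void main(String[] args) {
        System.out.println(builderFor(OrderedFoo.class)); // prints ordered
        System.out.println(builderFor(PlainFoo.class));   // prints default
    }
}
```
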
>
> I am not sure yet whether this is too ambitious or not. I just want to 
> make sure I understand what would be involved, and to see if you, as the 
> Jackson expert, agree and/or can guide us towards the most achievable path. 
> :)
>
> Thoughts?
>
> Simon
>
> On Tuesday, 3 February 2026 at 05:48:56 UTC+1 Tatu Saloranta wrote:
>
>> On Thu, Jan 29, 2026 at 4:39 AM Simon Cockx <[email protected]> 
>> wrote:
>>
>>> Thanks for the quick response Tatu! I am delighted that at least it is 
>>> not an immediate "this will not work" conclusion because of fundamental 
>>> design principles.
>>>
>>
>> Exactly.
>>  
>>
>>>  
>>>
>>>> I think discussing this here is good -- I will be out until next week 
>>>> now but wanted to send a quick response before that.
>>>
>>> I appreciate your time - no rush at all.
>>>
>>> Out of curiosity, is any work related to these issues already on the 
>>> Jackson roadmap, which we can piggyback off, or is there no concrete work 
>>> planned in the area?
>>>
>>
>> Jackson does not really have a concrete/centralized road map as such; at 
>> times I have ideas of the next major thing to tackle.
>> Although I did add the concept of JSTEPs (see 
>> https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP) for 
>> proposing bigger sets of related changes which could serve as a sort of 
>> roadmap.
>>
>> Having said that, there is no current plan for specifically addressing 
>> shortcomings of the XML backend.
>>
>>> Just to zoom in a bit on (5), because you mention it is probably the 
>>> trickiest, and it might be a good indication of "how far" we can go with 
>>> Jackson. The use case I have described (deserialize two properties with the 
>>> same name with a different order), is actually *not* an important use 
>>> case on its own, but it becomes *much* more relevant in interaction 
>>> with (2) (unwrapping) and (3) (substitution groups). Two use cases I have 
>>> seen while POC-ing support for some real XSD's are described below.
>>>
>>> a) Having the same property name on different levels in the Java pojo, 
>>> but because of unwrapping they overlap.
>>> Example structure taken straight out of a real XSD, but simplified.
>>> Interpretation: you either have an `issuer` element followed by a single 
>>> `tradeId` element, OR you have a `partyReference` element followed by a 
>>> variable number of `tradeId` elements.
>>> ```
>>> <xs:complexType name="Trade">
>>>   <xs:choice>
>>>     <xs:sequence>
>>>       <xs:element name="issuer" type="IssuerId"/>
>>>       <xs:element name="tradeId" type="TradeId"/>
>>>     </xs:sequence>
>>>     <xs:sequence>
>>>       <xs:element name="partyReference" type="PartyReference"/>
>>>       <xs:element name="tradeId" type="TradeId" minOccurs="0" maxOccurs="unbounded"/>
>>>     </xs:sequence>
>>>   </xs:choice>
>>> </xs:complexType>
>>> ```
>>>
>>> We currently represent this something like the following in Java: (using 
>>> records to concisely show structure - we actually use classes)
>>> ```
>>> record Trade(TradeOpt1 opt1, TradeOpt2 opt2) {}
>>>
>>> record TradeOpt1(IssuerId issuer, TradeId tradeId) {}
>>>
>>> record TradeOpt2(PartyReference partyReference, List<TradeId> tradeIds) {}
>>> ```
>>> where we unwrap `TradeOpt1` and `TradeOpt2`. At this point, however, 
>>> when we encounter a `tradeId` element, we somehow need to know whether to 
>>> set it to `TradeOpt1` or to add it to the list of `TradeOpt2`. Right now, 
>>> BOTH happen. (in other situations I have seen one of the two taking 
>>> precedence, depending on the exact unwrapping structure)
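
The routing decision described above can be sketched as a small state machine: the first distinguishing element (`issuer` or `partyReference`) selects the branch of the `xs:choice`, and later `tradeId` elements follow that branch. `TradeChoiceRouter` and `route` are hypothetical names, the returned strings merely label the target property, and the sketch assumes the state lives in one fresh object per `Trade` element being read.

```java
// Hypothetical sketch for illustration only; not part of Jackson.
final class TradeChoiceRouter {

    private enum Branch { UNDECIDED, OPT1, OPT2 }

    // Per-element state: which branch of the xs:choice has been selected so far.
    private Branch branch = Branch.UNDECIDED;

    // Returns a label for the property that should receive the given element.
    String route(String elementName) {
        switch (elementName) {
            case "issuer":
                branch = Branch.OPT1;
                return "opt1.issuer";
            case "partyReference":
                branch = Branch.OPT2;
                return "opt2.partyReference";
            case "tradeId":
                if (branch == Branch.OPT1) {
                    return "opt1.tradeId";
                }
                if (branch == Branch.OPT2) {
                    return "opt2.tradeIds";
                }
                throw new IllegalStateException("<tradeId> before <issuer> or <partyReference>");
            default:
                throw new IllegalArgumentException("Unknown element <" + elementName + ">");
        }
    }

    public static void main(String[] args) {
        TradeChoiceRouter router = new TradeChoiceRouter();
        System.out.println(router.route("partyReference")); // prints opt2.partyReference
        System.out.println(router.route("tradeId"));        // prints opt2.tradeIds
        System.out.println(router.route("tradeId"));        // prints opt2.tradeIds
    }
}
```

With this kind of state, a `tradeId` goes to exactly one of the two unwrapped options instead of both.
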
>>>
>>> b) A substituted name overlaps with an already existing element name on 
>>> the type
>>> Another example structure based on what I have seen in a real XSD.
>>> Note that the element called `substituted` can be substituted by an 
>>> element called `foo`. 
>>> ```
>>> <xs:complexType name="Root">
>>>   <xs:sequence>
>>>     <xs:element ref="substituted"/>
>>>     <xs:element name="inbetween" type="xs:string"/>
>>>     <xs:element name="foo" type="Foo"/>
>>>   </xs:sequence>
>>> </xs:complexType>
>>>
>>> <xs:element name="substituted" type="Parent"/>
>>> <xs:element name="foo" type="Foo" substitutionGroup="substituted"/>
>>>
>>> <!-- assume type Foo extends type Parent -->
>>> ```
>>> In this scenario, a sample such as
>>> ```
>>> <root>
>>>   <foo></foo>
>>>   <inbetween>value</inbetween>
>>>   <foo></foo>
>>> </root>
>>> ```
>>> should be able to decide that the first `foo` element should deserialize 
>>> into the `substituted` property, and the second `foo` element should 
>>> deserialize into the `foo` property, given the structure below.
>>> ```
>>> record Root(Parent substituted, String inbetween, Foo foo) {}
>>> ```
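
One way to picture the decision for this sample is a cursor that moves forward through the expected `xs:sequence`: each incoming element is assigned to the first not-yet-passed slot that accepts its name. `OrderedSlots`, `Slot` and `assign` are hypothetical names. The sketch assumes each slot occurs at most once and matches greedily; optional or repeated elements could require backtracking, which is not attempted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch for illustration only; not part of Jackson.
final class OrderedSlots {

    // One position in the sequence: the Java property and the element names it accepts.
    record Slot(String property, Set<String> acceptedNames) {}

    // Maps element names, in document order, to property names.
    static List<String> assign(List<Slot> sequence, List<String> elementNames) {
        List<String> properties = new ArrayList<>();
        int cursor = 0; // index of the first slot that may still be filled
        for (String name : elementNames) {
            int match = -1;
            for (int i = cursor; i < sequence.size(); i++) {
                if (sequence.get(i).acceptedNames().contains(name)) {
                    match = i;
                    break;
                }
            }
            if (match < 0) {
                throw new IllegalArgumentException("Unexpected element <" + name + "> at this position");
            }
            properties.add(sequence.get(match).property());
            cursor = match + 1; // never go back to an earlier slot
        }
        return properties;
    }

    public static void main(String[] args) {
        List<Slot> root = List.of(
                new Slot("substituted", Set.of("substituted", "foo")), // substitution group
                new Slot("inbetween", Set.of("inbetween")),
                new Slot("foo", Set.of("foo")));
        // prints [substituted, inbetween, foo]
        System.out.println(assign(root, List.of("foo", "inbetween", "foo")));
    }
}
```
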
>>>
>>> Thoughts...
>>>
>>> In order to support this, I think it would require work to extend how 
>>> Jackson is able to identify properties. Some ideas:
>>> - based on element index, although that does not work well if some 
>>> elements are optional, or if some elements can occur multiple times.
>>> - based on a selector which allows relative matching, e.g., "the element 
>>> that comes after another element", such as XPath 
>>> <https://www.w3schools.com/xml/xpath_syntax.asp>.
>>> ... or a drastically different approach, e.g., deserializing using 
>>> recursive descent with backtracking, instead of based on property names.
>>>
>>
>> Right. None of these sound easily implementable, unfortunately. The XPath 
>> approach is hard because there is no internal document model (although 
>> the path of parent property names is available at least). A property 
>> index (optional, only used for serialization order at databind level) is 
>> available, but deserialization makes no use of it (I think low-level 
>> format codecs like Protobuf & Avro may use it, but that is isolated at 
>> streaming API level, not exposed to databind).
>>  
>>
>>>
>>> Then there is thinking about how to support this without breaking other 
>>> backends. Again high-level ideas I can think of:
>>> - making matching on `PropertyName` more generic. E.g., instead of 
>>> fetching a deserializer straight from a map, add a layer of abstraction 
>>> that exposes a method `findMatchingProperty`, which backends can override 
>>> based on their own element identification. The default implementation would 
>>> lookup a property in a map using `PropertyName`. 
>>>
>>
>> Conceptually reasonable, but details probably get gnarly. Something would 
>> be needed for state-tracking as ValueDeserializers are stateless.
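
The `findMatchingProperty` idea and the state-tracking concern can be sketched together. All names here (`PropertyMatcher`, `NameLookupMatcher`, `OccurrenceMatcher`) are hypothetical and not Jackson API. The assumption is that a stateless deserializer would hold only the configuration and create a fresh matcher for each bean it reads, so the mutable occurrence counters never live in the shared deserializer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction for illustration only; not part of Jackson.
interface PropertyMatcher {
    // Returns the target property for the next element, or null if none matches.
    String findMatchingProperty(String elementName);
}

// Default behaviour: a stateless lookup by name.
final class NameLookupMatcher implements PropertyMatcher {
    private final Map<String, String> propertyByName;

    NameLookupMatcher(Map<String, String> propertyByName) {
        this.propertyByName = propertyByName;
    }

    @Override
    public String findMatchingProperty(String elementName) {
        return propertyByName.get(elementName);
    }
}

// Order-aware variant: the n-th occurrence of a name maps to the n-th listed property.
final class OccurrenceMatcher implements PropertyMatcher {
    private final Map<String, List<String>> propertiesByName;
    private final Map<String, Integer> seen = new HashMap<>(); // per-bean state

    OccurrenceMatcher(Map<String, List<String>> propertiesByName) {
        this.propertiesByName = propertiesByName;
    }

    @Override
    public String findMatchingProperty(String elementName) {
        List<String> candidates = propertiesByName.get(elementName);
        if (candidates == null) {
            return null;
        }
        int index = seen.merge(elementName, 1, Integer::sum) - 1;
        return index < candidates.size() ? candidates.get(index) : null;
    }
}

final class MatcherDemo {
    public static void main(String[] args) {
        // Configuration for <foo><a>A1</a><b/><a>A2</a></foo>
        PropertyMatcher matcher = new OccurrenceMatcher(
                Map.of("a", List.of("a1", "a2"), "b", List.of("b")));
        System.out.println(matcher.findMatchingProperty("a")); // prints a1
        System.out.println(matcher.findMatchingProperty("b")); // prints b
        System.out.println(matcher.findMatchingProperty("a")); // prints a2
    }
}
```

Counting occurrences is the simplest possible notion of order; it does not cover optional elements or substitution groups.
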
>>  
>>
>>> - entirely skipping the regular Jackson way of building deserializers, 
>>> and creating a custom BeanDeserializer that implements its own lookup 
>>> system. 
>>> - entirely skipping the regular Jackson way of building deserializers, 
>>> and creating a custom recursive descent deserializer.
>>>
>>
>> Bypassing (Bean)Deserializer(s) might be necessary, but then also adds 
>> tons of work to replace leaf-value (scalar) deserializers (from Numbers to 
>> Date/Times to UUIDs and Base64-encoded binary values) and 
>>  
>>
>>> All of them seem like quite a chunk of work, and require careful thought 
>>> about their implications. So: any thoughts on whether this is achievable at 
>>> all? Other ideas?
>>>
>>
>> I must admit this sounds like a rather ambitious goal indeed.
>>  
>>
>>>
>>> I assume use cases (1) - (4) would be less involved than this, but as I 
>>> show in my examples, they will break when they interact with (5), hence why 
>>> I just want to check upfront whether (5) is doable at all.
>>>
>>
>> Indeed.
>>  
>> -+ Tatu +-
>>
>>
>>>
>>> On Wednesday, 28 January 2026 at 20:12:55 UTC+1 Tatu Saloranta wrote:
>>>
>>>> On Wed, Jan 28, 2026 at 8:31 AM Simon Cockx <[email protected]> 
>>>> wrote:
>>>>
>>>>> At REGnosys we are running into fundamental limitations of Jackson's 
>>>>> support for XML. I would like to know whether these limitations are 
>>>>> deliberate trade-offs, or changeable design decisions that could be 
>>>>> fixed. 
>>>>> Based on that we are considering whether we can either *extend* Jackson 
>>>>> in our codebase, *contribute* to Jackson directly, or *move away* from 
>>>>> Jackson if it doesn't fit at all.
>>>>>
>>>>
>>>> Hi! Yes, this makes sense. I am not sure what the ultimate answer is 
>>>> (it is obviously up to you), but I can try to address more specific 
>>>> questions/concerns.
>>>>  
>>>>
>>>>>
>>>>> First of all: why Jackson?
>>>>> Saying that we just want to ingest XML based on an XSD is somewhat 
>>>>> hand-wavy - the JAXB project exists exactly for that use case. So maybe 
>>>>> the question is better stated: why not JAXB? In short: the XSD is not 
>>>>> our source of truth, our domain specific language is.
>>>>>
>>>>> At REGnosys we maintain the open-source Rune DSL 
>>>>> <https://github.com/finos/rune-dsl>, a language specifically designed 
>>>>> for modelling processes in the financial industry. One important 
>>>>> component of the language is *ingestion*: the process of reading serial 
>>>>> data (JSON, XML, CSV, ...) in various financial standard formats and 
>>>>> representing it in a uniform way in our DSL. Many of these formats are 
>>>>> XML-based and formally defined as multiple XSD files, such as FpML 
>>>>> <https://www.fpml.org/>. To support ingesting of these data 
>>>>> standards, we use the following steps.
>>>>>
>>>>>    1. Transform the XSD into Rune types. (similar to how JAXB 
>>>>>    transforms XSD to Java classes)
>>>>>    2. Annotate the Rune types and fields with additional 
>>>>>    serialization information. (similar to what both Jackson and JAXB 
>>>>>    do/support)
>>>>>    3. From this Rune model, generate Java code with custom 
>>>>>    annotations.
>>>>>    4. Using a custom Jackson annotation processor, deserialize using 
>>>>>    a Jackson object mapper.
>>>>>
>>>>> Note that steps 2 to 4 are independent of the exact serial format: we 
>>>>> don't just support XML, we also support JSON and CSV, and want to stay 
>>>>> extensible for any future formats. That is exactly the attractiveness of 
>>>>> Jackson and where we lose interest in JAXB: Jackson's design principles 
>>>>> align perfectly with this goal of agnostic deserialisation and 
>>>>> serialisation.
>>>>>
>>>>
>>>> Agreed. Thank you for explaining the background -- I think it does 
>>>> align with Jackson goals at high level.
>>>>  
>>>>
>>>>>
>>>>> Issues with Jackson XML
>>>>> Most of our issues come down to the way bean properties are 
>>>>> represented. Their identity is purely based on the local name of the 
>>>>> property being deserialized, but doesn't take into account surrounding 
>>>>> context such as ordering, namespaces, or representation (e.g., XML 
>>>>> attribute versus XML element).
>>>>>
>>>>>
>>>> Right: XML is probably THE trickiest format for Jackson to support (of 
>>>> ~10 supported formats).
>>>> And most name mapping being namespace-unaware is problematic, and I'd 
>>>> have guessed number one problem.
>>>> So as you say, these are known, unsolved problems.
>>>>  
>>>> In a way you could say Jackson supports XML-specific aspects 
>>>> (namespaces, attribute-vs-element, ordering dependency) on serialization 
>>>> side but not well on deserialization -- on deserialization these aspects 
>>>> are essentially ignored.
>>>>
>>>>> Examples of problems we run into:
>>>>>
>>>>>    1. Having XML elements and XML attributes with the same name is 
>>>>>    unsupported.
>>>>>    Issue also described here: 
>>>>>    https://stackoverflow.com/q/47199799/3083982
>>>>>    E.g., <foo id="my-id"><id>MyElementId</id></foo>
>>>>>    2. The @JsonUnwrapped annotation breaks some XML features. 
>>>>>    Fundamentally this is because it replaces the `FromXmlParser` 
>>>>>    instance with a `TokenBuffer`-based parser, which breaks 
>>>>>    assumptions for some XML-related features. One example is 
>>>>>    described here: 
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/762
>>>>>    3. Jackson does not support XSD substitution groups, i.e., having 
>>>>>    a single property with multiple potential names, depending on which a 
>>>>>    specific subtype deserializer is used. Turns out that this is not a 
>>>>>    fundamental issue: we have already extended Jackson to support it in 
>>>>>    the open-source Rune Common <https://github.com/finos/rune-common> 
>>>>>    project. See issue ticket here: 
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/679
>>>>>    4. Having XML elements with the same local name, but a different 
>>>>>    namespace, is unsupported. See long-standing issue ticket here: 
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/65
>>>>>    5. Having XML elements with the same local name, but with a 
>>>>>    different order, is unsupported. I don't see a direct issue open for 
>>>>>    this, but it is related to this comment: 
>>>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
>>>>>    E.g., deserializing A1 and A2 to two distinct properties: 
>>>>>    <foo><a>A1</a><b/><a>A2</a></foo>
>>>>>
>>>>> While we have ideas of how to approach this, I am definitely not 
>>>>> saying we have a perfect solution in mind yet. We are mostly looking to 
>>>>> answer the question if it is worth looking for a solution in the first 
>>>>> place, or if this is just a fundamental limitation of Jackson.
>>>>>
>>>>
>>>> Of these, (4) could be supported if databind used full `PropertyName` 
>>>> (which has "simple" and "namespace" part), so conceptually that is 
>>>> achievable, but implementation would be quite involved.
>>>> Ideally there'd be no overhead for other formats, which would probably 
>>>> require more extensibility for XML backend to override handling (lookups).
>>>>
>>>> (1) is sort of related but trickier: XML "attributeness" handling is 
>>>> contained within XML components, only used on serialization (I think).
>>>>
>>>> (3) would be generally useful and ideally would be implemented -- not 
>>>> sure of all complexities due to "flattening" of layers Jackson otherwise 
>>>> adds. I think it is doable, but like all of these, non-trivial.
>>>>
>>>> For (2) some support was added to allow format backends to substitute 
>>>> their own `TokenBuffer` subtypes, but that's as far as that goes. 
>>>> Buffering is also problematic in some cases of `@JsonCreator`-induced 
>>>> buffering wrt `Collection` deserialization.
>>>>
>>>> (5) is probably the trickiest. I am not familiar with that yet, would 
>>>> need to dig deeper.
>>>>
>>>> Currently there isn't a ton of progress towards any of these (esp. as 
>>>> all are hard problems).
>>>> But there are no fundamental blockers, I think. This is probably a bit 
>>>> awkward wrt defining which path to take.
>>>> I am happy to try to help in addressing these, for what that is worth.
>>>>  
>>>>
>>>>>
>>>>> I'm happy to discuss here, but if possible, I would also be very happy 
>>>>> to jump on a call sometime to talk through this. Whatever works best.
>>>>>
>>>>
>>>> I think discussing this here is good -- I will be out until next week 
>>>> now but wanted to send a quick response before that.
>>>>  
>>>> Alternatively Github Discussions on 
>>>> https://github.com/FasterXML/jackson-dataformat-xml/discussions would 
>>>> also work.
>>>>
>>>>
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>> Thank you,
>>>>
>>>> -+ Tatu +-
>>>>  
>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "jackson-dev" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion visit 
>>>>> https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>

