On Thu, Jan 29, 2026 at 4:39 AM Simon Cockx <[email protected]> wrote:
> Thanks for the quick response Tatu! I am delighted that at least it is not
> an immediate "this will not work" conclusion because of fundamental design
> principles.

Exactly.

> > I think discussing this here is good -- I will be out until next week now
> > but wanted to send a quick response before that.
>
> I appreciate your time - no rush at all.
>
> Out of curiosity, is any work related to these issues already on the
> Jackson roadmap, which we can piggyback off, or is there no concrete work
> planned in the area?

Jackson does not really have a concrete/centralized roadmap as such; at
times I have ideas of the next major thing to tackle. I did, however, add the
concept of JSTEPs (see
https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP) for proposing
bigger sets of related changes, which could serve as a sort of roadmap.
Having said that, there is currently no plan for specifically addressing the
shortcomings of the XML backend.

> Just to zoom in a bit on (5), because you mention it is probably the
> trickiest, and it might be a good indication of "how far" we can go with
> Jackson. The use case I have described (deserialize two properties with the
> same name with a different order), is actually *not* an important use
> case on its own, but it becomes *much* more relevant in interaction with
> (2) (unwrapping) and (3) (substitution groups). Two use cases I have seen
> while POC-ing support for some real XSDs are described below.
>
> a) Having the same property name on different levels in the Java POJO, but
> because of unwrapping they overlap.
> Example structure taken straight out of a real XSD, but simplified.
> Interpretation: you either have an `issuer` element followed by a single
> `tradeId` element, OR you have a `partyReference` element followed by a
> variable number of `tradeId` elements.
> ```
> <xs:complexType name="Trade">
>   <xs:choice>
>     <xs:sequence>
>       <xs:element name="issuer" type="IssuerId"/>
>       <xs:element name="tradeId" type="TradeId"/>
>     </xs:sequence>
>     <xs:sequence>
>       <xs:element name="partyReference" type="PartyReference"/>
>       <xs:element name="tradeId" type="TradeId" minOccurs="0"
>                   maxOccurs="unbounded"/>
>     </xs:sequence>
>   </xs:choice>
> </xs:complexType>
> ```
>
> We currently represent this something like the following in Java: (using
> records to concisely show structure - we actually use classes)
> ```
> record Trade(TradeOpt1 opt1, TradeOpt2 opt2) {}
>
> record TradeOpt1(IssuerId issuer, TradeId tradeId) {}
>
> record TradeOpt2(PartyReference partyReference, List<TradeId> tradeIds) {}
> ```
> where we unwrap `TradeOpt1` and `TradeOpt2`. At this point, however, when
> we encounter a `tradeId` element, we somehow need to know whether to set it
> to `TradeOpt1` or to add it to the list of `TradeOpt2`. Right now, BOTH
> happen. (in other situations I have seen one of the two taking precedence,
> depending on the exact unwrapping structure)
>
> b) A substituted name overlaps with an already existing element name on
> the type
> Another example structure based on what I have seen in a real XSD.
> Note that the element called `substituted` can be substituted by an
> element called `foo`.
> ```
> <xs:complexType name="Root">
>   <xs:sequence>
>     <xs:element ref="substituted"/>
>     <xs:element name="inbetween" type="xs:string"/>
>     <xs:element name="foo" type="Foo"/>
>   </xs:sequence>
> </xs:complexType>
>
> <xs:element name="substituted" type="Parent"/>
> <xs:element name="foo" type="Foo" substitutionGroup="substituted"/>
>
> <!-- assume type Foo extends type Parent -->
> ```
> In this scenario, a sample such as
> ```
> <root>
>   <foo></foo>
>   <inbetween>value</inbetween>
>   <foo></foo>
> </root>
> ```
> should be able to decide that the first `foo` element should deserialize
> into the `substituted` property, and the second `foo` element should
> deserialize into the `foo` property, given the structure below.
> ```
> record Root(Parent substituted, String inbetween, Foo foo) {}
> ```
>
> Thoughts...
>
> In order to support this, I think it would require work to extend how
> Jackson is able to identify properties. Some ideas:
> - based on element index, although that does not work well if some
>   elements are optional, or if some elements can occur multiple times.
> - based on a selector which allows relative matching, e.g., "the element
>   that comes after another element", such as XPath
>   <https://www.w3schools.com/xml/xpath_syntax.asp>.
> ... or a drastically different approach, e.g., deserializing using
> recursive descent with backtracking, instead of based on property names.

Right. None of these sound easily implementable, unfortunately. The XPath
approach is hard because there is no internal document model (although the
parent document's property name path is at least available). A property index
(optional, only used for serialization order at databind level) is
available, but deserialization makes no use of it (I think low-level format
codecs like Protobuf & Avro may use it, but that is isolated at the streaming
API level and not exposed to databind).

> Then there is thinking about how to support this without breaking other
> backends.
> Again high-level ideas I can think of:
> - making matching on `PropertyName` more generic. E.g., instead of
>   fetching a deserializer straight from a map, add a layer of abstraction
>   that exposes a method `findMatchingProperty`, which backends can override
>   based on their own element identification. The default implementation would
>   look up a property in a map using `PropertyName`.

Conceptually reasonable, but details probably get gnarly. Something would
be needed for state-tracking, as ValueDeserializers are stateless.

> - entirely skipping the regular Jackson way of building deserializers, and
>   creating a custom BeanDeserializer that implements its own lookup system.
> - entirely skipping the regular Jackson way of building deserializers, and
>   creating a custom recursive descent deserializer.

Bypassing (Bean)Deserializer(s) might be necessary, but then also adds tons
of work to replace leaf-value (scalar) deserializers (from Numbers to
Date/Times to UUIDs and Base64-encoded binary values) and

> All of them seem like quite a chunk of work, and require careful thought
> about their implications. So: any thoughts on whether this is achievable at
> all? Other ideas?

I must admit this sounds like a rather ambitious goal indeed.

> I assume use cases (1) - (4) would be less involved than this, but as I
> show in my examples, they will break when they interact with (5), hence why
> I just want to check upfront whether (5) is doable at all.

Indeed.

-+ Tatu +-

> On Wednesday, 28 January 2026 at 20:12:55 UTC+1 Tatu Saloranta wrote:
>
>> On Wed, Jan 28, 2026 at 8:31 AM Simon Cockx <[email protected]>
>> wrote:
>>
>>> At REGnosys we are running into fundamental limitations of Jackson's
>>> support for XML. I would like to know whether these limitations are
>>> deliberate trade-offs, or changeable design decisions that could be fixed.
>>> Based on that we are considering whether we can either *extend* Jackson
>>> in our codebase, *contribute* to Jackson directly, or *move away* from
>>> Jackson if it doesn't fit at all.
>>>
>>
>> Hi! Yes, this makes sense. I am not sure what the ultimate answer is (it
>> is obviously up to you), but I can try to address more specific
>> questions/concerns.
>>
>>>
>>> First of all: why Jackson?
>>> Saying that we just want to ingest XML based on an XSD is somewhat
>>> hand-wavy - the JAXB project exists exactly for that use case. So maybe the
>>> question is better stated: why not JAXB? In short: the XSD is not our
>>> source of truth, our domain specific language is.
>>>
>>> At REGnosys we maintain the open-source Rune DSL
>>> <https://github.com/finos/rune-dsl>, a language specifically designed
>>> for modelling processes in the financial industry. One important component
>>> of the language is *ingestion*: the process of reading serial data
>>> (JSON, XML, CSV, ...) in various financial standard formats and
>>> representing it in a uniform way in our DSL. Many of these formats are
>>> XML-based and formally defined as multiple XSD files, such as FpML
>>> <https://www.fpml.org/>. To support ingesting of these data standards,
>>> we use the following steps.
>>>
>>> 1. Transform the XSD into Rune types. (similar to how JAXB
>>>    transforms XSD to Java classes)
>>> 2. Annotate the Rune types and fields with additional serialization
>>>    information. (similar to what both Jackson and JAXB do/support)
>>> 3. From this Rune model, generate Java code with custom annotations.
>>> 4. Using a custom Jackson annotation processor, deserialize using a
>>>    Jackson object mapper.
>>>
>>> Note that steps 2 to 4 are independent of the exact serial format: we
>>> don't just support XML, we also support JSON and CSV, and want to stay
>>> extensible for any future formats.
>>> That is exactly the attractiveness of Jackson and where we lose
>>> interest in JAXB: Jackson's design principles align perfectly with this
>>> goal of agnostic deserialisation and serialisation.
>>>
>>
>> Agreed. Thank you for explaining the background -- I think it does align
>> with Jackson goals at high level.
>>
>>>
>>> Issues with Jackson XML
>>> Most of our issues come down to the way bean properties are represented.
>>> Their identity is purely based on the local name of the property being
>>> deserialized, but doesn't take into account surrounding context such as
>>> ordering, namespaces, or representation (e.g., XML attribute versus XML
>>> element).
>>>
>>
>> Right: XML is probably THE trickiest format for Jackson to support (of
>> ~10 supported formats).
>> And most name mapping being namespace-unaware is problematic, and I'd
>> have guessed number one problem.
>> So as you say, these are known, unsolved problems.
>>
>> In a way you could say Jackson supports XML-specific aspects (namespaces,
>> attribute-vs-element, ordering dependency) on serialization side but not
>> well on deserialization -- on deserialization these aspects are essentially
>> ignored.
>>
>>> Examples of problems we run into:
>>>
>>> 1. Having XML elements and XML attributes with the same name is
>>>    unsupported.
>>>    Issue also described here:
>>>    https://stackoverflow.com/q/47199799/3083982
>>>    E.g., <foo id="my-id"><id>MyElementId</id></foo>
>>> 2. The @JsonUnwrapped annotation breaks some XML features.
>>>    Fundamentally this is because it replaces the `FromXmlParser` instance
>>>    with a `TokenBuffer`-based parser, which breaks assumptions for some
>>>    XML related features. One example is described here:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/762
>>> 3. Jackson does not support XSD substitution groups, i.e., having a
>>>    single property with multiple potential names, depending on which a
>>>    specific subtype deserializer is used.
>>>    Turns out that this is not a
>>>    fundamental issue: we have already extended Jackson to support it in the
>>>    open-source Rune Common <https://github.com/finos/rune-common> project.
>>>    See issue ticket here:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/679
>>> 4. Having XML elements with the same local name, but a different
>>>    namespace, is unsupported. See long-standing issue ticket here:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/65
>>> 5. Having XML elements with the same local name, but with a
>>>    different order, is unsupported. I don't see a direct issue open for
>>>    this, but it is related to this comment:
>>>    https://github.com/FasterXML/jackson-dataformat-xml/issues/676#issuecomment-2438049500
>>>    E.g., deserializing A1 and A2 to two distinct properties:
>>>    <foo><a>A1</a><b/><a>A2</a></foo>
>>>
>>> While we have ideas of how to approach this, I am definitely not saying
>>> we have a perfect solution in mind yet. We are mostly looking to answer the
>>> question if it is worth looking for a solution in the first place, or if
>>> this is just a fundamental limitation of Jackson.
>>>
>>
>> Of these, (4) could be supported if databind used full `PropertyName`
>> (which has "simple" and "namespace" part), so conceptually that is
>> achievable, but implementation would be quite involved.
>> Ideally there'd be no overhead for other formats, which would probably
>> require more extensibility for XML backend to override handling (lookups).
>>
>> (1) is sort of related but trickier: XML "attributeness" handling is
>> contained within XML components, only used on serialization (I think).
>>
>> (3) would be generally useful and ideally would be implemented -- not
>> sure of all complexities due to "flattening" of layers Jackson otherwise
>> adds. I think it is doable, but like all of these, non-trivial.
>>
>> For (2) some support was added to allow format-backends to substitute
>> their own `TokenBuffer` subtypes, but that's as far as that goes. Buffering
>> is also problematic for some @JsonCreator induced buffering wrt
>> `Collection` deserialization.
>>
>> (5) is probably the trickiest. I am not familiar with that yet, would
>> need to dig deeper.
>>
>> Currently there isn't a ton of progress towards any of these (esp. as all
>> are hard problems).
>> But there are no fundamental blockers, I think. This is probably a bit
>> awkward wrt defining which path to take.
>> I am happy to try to help in addressing these, for what that is worth.
>>
>>>
>>> I'm happy to discuss here, but if possible, I would also be very happy
>>> to jump on a call sometime to talk through this. Whatever works best.
>>>
>>
>> I think discussing this here is good -- I will be out until next week now
>> but wanted to send a quick response before that.
>>
>> Alternatively GitHub Discussions on
>> https://github.com/FasterXML/jackson-dataformat-xml/discussions would
>> also work.
>>
>>> Thanks in advance.
>>>
>>
>> Thank you,
>>
>> -+ Tatu +-
>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "jackson-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion visit
>>> https://groups.google.com/d/msgid/jackson-dev/474eea22-e935-4386-b2f3-1f1adfe65d06n%40googlegroups.com
>>> .
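
The `findMatchingProperty` idea from the thread, together with the per-object state tracking it would need, can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not Jackson API: `OrderAwareMatcher`, `Slot`, and the cursor strategy are hypothetical names invented for this sketch. The cursor is greedy and never backtracks, so on its own it would not resolve the `xs:choice` structure from case (a).

```java
import java.util.List;

/**
 * Hypothetical sketch of order-aware property matching; not part of Jackson.
 * Each instance holds the state for ONE object being deserialized, so the
 * deserializer that owns the slot list can itself stay stateless.
 */
final class OrderAwareMatcher {

    /** A declared slot: the target property and the element names it accepts. */
    record Slot(String property, List<String> elementNames, boolean repeatable) {}

    private final List<Slot> slots;
    private int cursor = 0; // index of the first slot that may still be filled

    OrderAwareMatcher(List<Slot> slots) {
        this.slots = slots;
    }

    /**
     * Resolves an element name to a property, searching only from the cursor
     * onwards. Slots that are skipped over are treated as absent (optional).
     * Returns null when no remaining slot accepts the name.
     */
    String findMatchingProperty(String elementName) {
        for (int i = cursor; i < slots.size(); i++) {
            Slot slot = slots.get(i);
            if (slot.elementNames().contains(elementName)) {
                // A repeatable slot stays open; a single-valued one is consumed.
                cursor = slot.repeatable() ? i : i + 1;
                return slot.property();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Example (b): `substituted` may also appear under the name `foo`.
        OrderAwareMatcher root = new OrderAwareMatcher(List.of(
                new Slot("substituted", List.of("substituted", "foo"), false),
                new Slot("inbetween", List.of("inbetween"), false),
                new Slot("foo", List.of("foo"), false)));
        for (String element : List.of("foo", "inbetween", "foo")) {
            System.out.println(element + " -> " + root.findMatchingProperty(element));
        }
        // prints:
        // foo -> substituted
        // inbetween -> inbetween
        // foo -> foo
    }
}
```

The same matcher handles the example from (5), `<foo><a>A1</a><b/><a>A2</a></foo>`, if the two `a` elements are declared as separate slots around `b`. Because the state lives in a per-object matcher rather than in the deserializer, this is one possible answer to the state-tracking concern, at the cost of allocating a matcher per deserialized object.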
