Ostrzyciel opened a new issue, #2797: URL: https://github.com/apache/jena/issues/2797
### Change

While profiling a piece of RDF parser code, I noticed a large number of `HashMap` allocations on the heap: not `HashMap` entries, but whole `HashMap` instances. What could possibly be allocating so many maps during RDF parsing?

It turns out to be the `ValidationState` class, a new instance of which is allocated every time we parse a datatyped literal. It has these two fields:

https://github.com/apache/jena/blob/04f377d8aefc0cb8faeb62022d402ef17a6e6e96/jena-core/src/main/java/org/apache/jena/ext/xerces/impl/validation/ValidationState.java#L53-L55

I find the note slightly funny. The ID tables are used only by the `IDDV` and `IDREFDV` classes. I had to consult the RDF spec, because I didn't even know that such datatypes existed. It is this XML thing: https://www.w3.org/TR/xmlschema11-2/#ID

I think it's a very rarely used feature of RDF that certainly doesn't warrant allocating two hash maps per literal.

Also: will the ID and IDREF validation even work if we allocate a new object for each literal parse? The hash maps are then not carried over from one literal to the next... Did this ever work?

## Solution

I'm submitting a PR that sets these `HashMap` fields to null by default and only initializes them when needed (lazy initialization). Note that I'm not convinced that the ID/IDREF feature even works currently, but removing it outright would be a breaking change.

## Broader issue – for a separate thread

*(I will create a new issue with this, I'm leaving this here for context for now)*

Another thing is that I'm not convinced that we need to allocate a new `ValidationState` every time we parse a datatyped literal:

https://github.com/apache/jena/blob/04f377d8aefc0cb8faeb62022d402ef17a6e6e96/jena-core/src/main/java/org/apache/jena/datatypes/xsd/XSDDatatype.java#L267-L272

This is a pretty heavy object to be allocating so often. Many methods in `ValidationState` are unused, and there is a bunch of other dead code around it copied over from Xerces, like the whole `ValidationManager` class.
Removing this, however, would be a breaking change, so it's not in the scope of this issue.

### Are you interested in contributing a pull request for this task?

Yes
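
For context, the lazy initialization described under "Solution" could look roughly like the sketch below. This is only an illustration of the pattern: `LazyIdTables` and all of its method names are hypothetical stand-ins, not the actual `ValidationState` code and not necessarily what the PR contains. The idea is that both maps stay `null` until an ID or IDREF value is actually recorded, and that read-only lookups handle the `null` case without allocating.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-literal validation state.
// Names and structure are assumptions made for illustration only.
final class LazyIdTables {
    // Shared marker value, so that only the keys of the maps matter.
    private static final Object PRESENT = new Object();

    // Both tables start out null: parsing a literal that is neither
    // an ID nor an IDREF never allocates a HashMap.
    private Map<String, Object> idTable;
    private Map<String, Object> idRefTable;

    // Records a declared ID, allocating the table on first use.
    void addId(String name) {
        if (idTable == null) {
            idTable = new HashMap<>();
        }
        idTable.put(name, PRESENT);
    }

    // Records a referenced ID, allocating the table on first use.
    void addIdRef(String name) {
        if (idRefTable == null) {
            idRefTable = new HashMap<>();
        }
        idRefTable.put(name, PRESENT);
    }

    // Read-only lookup: must not allocate, so the null case is checked first.
    boolean isIdDeclared(String name) {
        return idTable != null && idTable.containsKey(name);
    }

    // Returns some referenced ID that was never declared,
    // or null if every reference resolves.
    String firstUnresolvedIdRef() {
        if (idRefTable == null) {
            return null;
        }
        for (String ref : idRefTable.keySet()) {
            if (!isIdDeclared(ref)) {
                return ref;
            }
        }
        return null;
    }

    // Exposed only so that the laziness can be observed in this sketch.
    boolean hasAllocatedTables() {
        return idTable != null || idRefTable != null;
    }

    public static void main(String[] args) {
        LazyIdTables state = new LazyIdTables();
        System.out.println("allocated before use: " + state.hasAllocatedTables());
        state.addIdRef("a");
        System.out.println("unresolved reference: " + state.firstUnresolvedIdRef());
        state.addId("a");
        System.out.println("unresolved reference: " + state.firstUnresolvedIdRef());
    }
}
```

The cost of this pattern is a null check in every accessor. The benefit is that the common case, a literal of any datatype other than ID or IDREF, pays for no map allocation at all.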
