Ostrzyciel opened a new issue, #2828:
URL: https://github.com/apache/jena/issues/2828

   ### Change
   
   This is a continuation of #2797 – trimming out the bloat in datatyped 
literal validations.
   
   That issue tackled the worst offender – allocating two hashmaps for each 
validated literal. This is, however, not the whole story. For every literal 
validation we still allocate a new `ValidationState`:
   
   
https://github.com/apache/jena/blob/c7eaf83cf328496cb4f8a24b42a483199ac216c0/jena-core/src/main/java/org/apache/jena/datatypes/xsd/XSDDatatype.java#L267-L271
   
   This makes no sense, because that object is never really mutated (or, to be 
precise, does not need to be mutable). At the same time, it has quite a few 
fields. Although the JVM can probably figure out how to handle this efficiently 
with escape analysis, this is still unnecessary bloat.
   
   The entire `org.apache.jena.ext.xerces` package contains a lot of unused 
code carried over from xerces. A lot of the infrastructure is not needed in 
Jena, because the more complex XML features make no sense in the context of RDF 
literals. For example, a large part of the original job of `ValidationState` 
was to check if ID and IDREF attributes are correct with respect to one 
another. This, along with a few other quirky XML thingies, "SHOULD NOT" be used 
in RDF [according to the 
spec](https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes), so I think we can 
safely remove this.
   
   ## The plan
   
   - Remove code for special handling of datatypes: QName, ENTITY, ID, IDREF, 
NOTATION, IDREFS, ENTITIES, NMTOKENS – none of which should be used in RDF.
     - Note that although ID/IDREF validation is implemented in Jena, it 
currently does not work, because the `ValidationState` is allocated once per 
literal, not once per document. And what would a "document" even mean in RDF?
   - Create only one instance of `ValidationState` per Jena instance, or 
something to a similar effect.
   - Remove all kinds of unused code from the xerces package to help 
maintainability, make the JARs a bit smaller etc.
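
To illustrate the "one shared instance" idea, here is a minimal sketch. It assumes the validation state can be made fully immutable. All names in it (`SharedContextSketch`, `ValidationContext`, `isValidBoolean`) are hypothetical and are not part of Jena or xerces; the real `ValidationState` has more fields and a different API.

```java
// Hypothetical sketch: none of these types exist in Jena or xerces.
final class SharedContextSketch {

    /** Immutable stand-in for a validation context; safe to share across threads. */
    static final class ValidationContext {
        /** The single shared instance, created once at class initialization. */
        static final ValidationContext SHARED = new ValidationContext(true);

        final boolean collapseWhitespace;

        private ValidationContext(boolean collapseWhitespace) {
            this.collapseWhitespace = collapseWhitespace;
        }
    }

    /** Toy xsd:boolean check that only reads the context and never mutates it. */
    static boolean isValidBoolean(String lexical, ValidationContext ctx) {
        // trim() is a rough stand-in for XSD whitespace collapsing.
        String s = ctx.collapseWhitespace ? lexical.trim() : lexical;
        return s.equals("true") || s.equals("false") || s.equals("1") || s.equals("0");
    }

    public static void main(String[] args) {
        ValidationContext ctx = ValidationContext.SHARED; // no allocation per literal
        System.out.println(isValidBoolean(" true ", ctx));
        System.out.println(isValidBoolean("maybe", ctx));
    }
}
```

Because the context has only final fields and no setters, sharing it between validations and threads should be safe. That is the property the real class would need before it could be turned into a singleton.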
   
   ## How do I do this?
   
   What is the process for making breaking changes to Jena APIs? I assume that 
even if a public class is not used in the Jena codebase, it can be removed only 
in a MAJOR release. So, should I do something like:
   
   - Make a PR deprecating all the different public methods and classes and 
marking them for removal in Jena 6
   - Before Jena 6.0 release (but after last Jena 5.x release) make a PR 
actually removing the code
   
   ?
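
For reference, the deprecation step could look roughly like the sketch below, assuming Java 9+ and the standard `@Deprecated(forRemoval = true)` annotation. `DeprecationSketch` and `LegacyHelper` are hypothetical names used only for illustration; whether Jena wants a `since` value or a particular Javadoc wording is an open question for the maintainers.

```java
// Hypothetical sketch: LegacyHelper stands in for any unused public class.
final class DeprecationSketch {

    /**
     * @deprecated Unused; planned for removal in the next major release.
     */
    @Deprecated(forRemoval = true)
    static final class LegacyHelper {
        static boolean isEmpty(String s) {
            return s == null || s.isEmpty();
        }
    }

    /** Reads the annotation back at runtime to confirm the removal flag is set. */
    @SuppressWarnings("removal")
    static boolean isMarkedForRemoval() {
        Deprecated d = LegacyHelper.class.getAnnotation(Deprecated.class);
        return d != null && d.forRemoval();
    }

    public static void main(String[] args) {
        System.out.println(isMarkedForRemoval());
    }
}
```

With `forRemoval = true`, `javac` reports a "removal" warning at call sites outside the declaring class, which gives downstream users a signal before the code disappears.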
   
   ## Notes on unused code
   
   - According to my IntelliJ, these methods of `ValidationState` are not used 
anywhere in the Jena codebase: 
     - All `set*` methods, except `setEntityState`, which is used only in 
`ValidationManager`, which is in turn an unused class.
     - `resetIDTables`, `reset`, `useNamespaces`
     - Of course, this needs to be double-checked.
   - Other unused code:
     - `ConfigurableValidationState` class
     - `DatatypeValidator` interface
     - `EntityState` interface – it is only used in unused methods of 
`ValidationState` and `ValidationManager`
     - `SchemaDVFactory` methods: `createTypeRestriction`, `createTypeList`, 
`createTypeUnion` – these refer to some advanced XSD features... I think? 
Anyway, they don't seem to be used in RDF.
       - Implementations of these methods in child classes 
(`BaseSchemaDVFactory` and `BaseDVFactory`) are, by extension, also unused.
     - `XSDDatatype`: there is a huge comment block that should probably 
be removed. The inner static class `XSDGenericType` is also unused.
     - `ValidatedInfo` methods: `stringValue`, `isComparable`, `copyFrom`
     - `NamespaceContext` fields: `XML_URI`, `XMLNS_URI` and methods 
`pushContext`, `popContext`, `declarePrefix`, `getDeclaredPrefixCount`, 
`getDeclaredPrefixAt`
       - Actually, the whole interface is effectively unused... there are no 
classes implementing it. There is some code passing around instances of 
`NamespaceContext`, but these must always be null.
   
   ### Are you interested in contributing a pull request for this task?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

