afs commented on issue #2977:
URL: https://github.com/apache/jena/issues/2977#issuecomment-2616377581

   There should be no problem exposing this for Eyeball-NG.
   But it affects the whole system, unless you mean setting it to true, parsing, 
and then resetting it. When run as a command line tool, that could be 
acceptable; when run as a library, it is not.
   
   The rest of the system assumes language tags are unique.
   
   A better way that should work:
   
   There is a way to tap into exactly what is coming out of a parser before 
`NodeFactory`. 
   
   A parser run has a `FactoryRDF` object. It is settable with 
`RDFParserBuilder.factory`. All node creation should go via this route.
   
   `ParserProfile` is the interface of events coming out of the parser, 
including node creation; it includes the line/column in the parser. It calls a 
`FactoryRDF`.
   
   One method of `FactoryRDF` is `createLangLiteral(String lexical, String 
langTag)`, so it sees the language tag before it goes to `NodeFactory`, which 
canonicalizes it.
   
   By inheriting or wrapping, you could test the language tag, note any 
issues, and then pass it on to the usual `FactoryRDF` method.
   
   It is only at `NodeFactory.createLiteralLang`/`createLiteralDirLang` that the 
language tag is manipulated. The `equals` and `hashCode` of `Node_Literal` are 
case-sensitive so they don't get in the way.
   
   Constructors for `Node_Literal` are package-scoped to prevent apps creating 
such bad literals.
   
   Getting the line/column number needs a `ParserProfile`, but it is harder to 
set a custom one (possible, but it may need to be done per language).
   
   ```java
       static class FactoryRDF2 extends FactoryRDFStd {
   
           // Language tags seen in a non-canonical form, in parse order.
           List<String> unwiseLangTags = new ArrayList<>();
   
           @Override
           public Node createLangLiteral(String lexical, String langTag) {
               if ( langTag != null ) {
                   // Compare the tag as written with its canonical form.
                   String langTag2 = LangTag.canonical(langTag);
                   if ( ! langTag.equals(langTag2) )
                       unwiseLangTags.add(langTag);
               }
               return super.createLangLiteral(lexical, langTag);
           }
       }
   
       public static void main(String... args) throws IOException {
   
           String PREFIX = "PREFIX : <http://example/>\n";
           String data = PREFIX+"""
                   :s1 :p1 'abc1'@en .
                   :s2 :p2 'abc2'@en-GB .
                   :s2 :p2 'abc3'@en-gb .
                     """;
   
           FactoryRDF2 factory = new FactoryRDF2();
           RDFParser.fromString(data, Lang.TTL).factory(factory).toGraph();
           System.out.println(factory.unwiseLangTags);
       }
   ```
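
The example above takes the inheriting route. For the wrapping route, here is a rough sketch of the same check written against plain Java only, so it runs without Jena on the classpath. Everything in it is an assumption for illustration: `LiteralFactory` is a hypothetical one-method stand-in for `FactoryRDF`, `AuditingFactory` is a made-up name, and `java.util.Locale` is used in place of `LangTag.canonical`. The `Locale` round trip normalizes case for simple tags, but it may rewrite some tags in other ways, so it only approximates the Jena behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LangTagAudit {

    /** Hypothetical stand-in for the one FactoryRDF method of interest. */
    interface LiteralFactory {
        String createLangLiteral(String lexical, String langTag);
    }

    /** Wrapper: records non-canonical tags, then delegates with the tag unchanged. */
    static class AuditingFactory implements LiteralFactory {
        private final LiteralFactory delegate;
        final List<String> unwiseLangTags = new ArrayList<>();

        AuditingFactory(LiteralFactory delegate) {
            this.delegate = delegate;
        }

        @Override
        public String createLangLiteral(String lexical, String langTag) {
            if (langTag != null) {
                // Assumption: a Locale round trip approximates canonical tag casing.
                String canonical = Locale.forLanguageTag(langTag).toLanguageTag();
                if (!langTag.equals(canonical))
                    unwiseLangTags.add(langTag);
            }
            // Pass the tag on exactly as the parser produced it.
            return delegate.createLangLiteral(lexical, langTag);
        }
    }

    public static void main(String... args) {
        // The delegate here just formats the literal; a real one would build a node.
        AuditingFactory factory =
            new AuditingFactory((lexical, langTag) -> "\"" + lexical + "\"@" + langTag);
        factory.createLangLiteral("abc1", "en");
        factory.createLangLiteral("abc2", "en-GB");
        factory.createLangLiteral("abc3", "en-gb");
        System.out.println(factory.unwiseLangTags);   // prints [en-gb]
    }
}
```

With a real `FactoryRDF`, a wrapper would have to forward every other method of the interface as well, which is why extending `FactoryRDFStd` is the shorter option when only one method needs intercepting.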
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

