Andy Seaborne created JENA-1285:
-----------------------------------

             Summary: Have one Tokenizer token for strings.
                 Key: JENA-1285
                 URL: https://issues.apache.org/jira/browse/JENA-1285
             Project: Apache Jena
          Issue Type: Improvement
          Components: RIOT
            Reporter: Andy Seaborne
            Assignee: Andy Seaborne
            Priority: Minor


The Tokenizer ({{TokenizerText}}) faithfully records what sort of string it has 
processed using different token types - STRING1, STRING2, LONG_STRING1, 
LONG_STRING2.

Sometimes the distinction matters (N-Triples allows only the double-quoted 
form), sometimes it doesn't (Turtle accepts all four forms).

[Turtle rule for 
strings|https://www.w3.org/TR/turtle/#grammar-production-String]

[N-Triples rule for 
strings|https://www.w3.org/TR/n-triples/#grammar-production-STRING_LITERAL_QUOTE]

Instead of 4 token types (5 if you include the existing STRING token), it is 
proposed to use one token type, STRING, and record the actual string type seen 
separately.

This makes it simpler to work with non-text formats, where strings have no 
concept of quotes, and with any format that accepts every string form.

The specific cases (e.g. N-Triples) can still test for the details of the 
string syntax seen but the token type is the conceptual "superclass" STRING 
type.
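
A minimal sketch of the idea, not the actual change: every name below 
({{StringTokenSketch}}, {{StringStyle}}, {{acceptTurtleString}}, 
{{acceptNTriplesString}}) is hypothetical and is not part of the RIOT API. It 
only illustrates one STRING token type with the quoting style recorded 
alongside it.

```java
// Hypothetical sketch: none of these names are taken from Jena's API.
final class StringTokenSketch {

    // A single STRING type stands in for the separate per-quote string types.
    enum TokenType { STRING, IRI, KEYWORD }

    // How the string was written in the input. NONE covers non-text formats
    // where strings have no quotes at all.
    enum StringStyle { SINGLE_QUOTE, DOUBLE_QUOTE, LONG_SINGLE_QUOTE, LONG_DOUBLE_QUOTE, NONE }

    static final class Token {
        final TokenType type;
        final String image;
        final StringStyle style;   // null for non-string tokens

        Token(TokenType type, String image, StringStyle style) {
            this.type = type;
            this.image = image;
            this.style = style;
        }

        // Convenience factory for string tokens.
        static Token string(String image, StringStyle style) {
            return new Token(TokenType.STRING, image, style);
        }

        boolean isString() {
            return type == TokenType.STRING;
        }
    }

    // Turtle-like check: any string form is acceptable, so only the type is tested.
    static boolean acceptTurtleString(Token t) {
        return t.isString();
    }

    // N-Triples-like check: the type is still STRING, but the recorded style
    // must be the double-quoted form.
    static boolean acceptNTriplesString(Token t) {
        return t.isString() && t.style == StringStyle.DOUBLE_QUOTE;
    }

    public static void main(String[] args) {
        Token single = Token.string("abc", StringStyle.SINGLE_QUOTE);
        Token dbl = Token.string("abc", StringStyle.DOUBLE_QUOTE);
        System.out.println(acceptTurtleString(single));
        System.out.println(acceptNTriplesString(single));
        System.out.println(acceptTurtleString(dbl));
        System.out.println(acceptNTriplesString(dbl));
    }
}
```

With this shape, a parser that accepts any string form tests only the token 
type, while a stricter parser adds one check on the recorded style.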




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
