rafsun42 commented on issue #317:
URL: https://github.com/apache/age/issues/317#issuecomment-1317743688

   To address @TropicalPenguin's first point in the PR, I looked for a reference and found a grammar specification at [opencypher.org](https://opencypher.org/resources/). It defines the syntax for labels this way:
   ```
   UnescapedSymbolicName = IdentifierStart, { IdentifierPart } ;
   
   (* Based on the unicode identifier and pattern syntax
    *   (http://www.unicode.org/reports/tr31/)
    * And extended with a few characters.
    *)
   IdentifierStart = ID_Start
                   | Pc
                   ;
   
   (* Based on the unicode identifier and pattern syntax
    *   (http://www.unicode.org/reports/tr31/)
    * And extended with a few characters.
    *)
   IdentifierPart = ID_Continue
                  | Sc
                  ;
   ```
   Here, ID_Start and ID_Continue are Unicode derived properties, and Pc (connector punctuation) and Sc (currency symbol) are Unicode general categories. You can browse the ID_Start set [here](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:ID_Start=Yes:]). Together they seem to cover the letters of all Unicode scripts. The [reference page](http://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers) cited in the grammar describes ID_Start and ID_Continue as:
   > ID_Start characters are derived from the Unicode General_Category of 
uppercase letters, lowercase letters, titlecase letters, modifier letters, 
other letters, letter numbers, plus Other_ID_Start, minus Pattern_Syntax and 
Pattern_White_Space code points.
   > 
   > In UnicodeSet notation:
   > [\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]
   
   > ID_Continue characters include ID_Start characters, plus characters having 
the Unicode General_Category of nonspacing marks, spacing combining marks, 
decimal number, connector punctuation, plus Other_ID_Continue, minus 
Pattern_Syntax and Pattern_White_Space code points.
   > 
   > In UnicodeSet notation:
   > 
[\p{ID_Start}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Other_ID_Continue}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]
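
   To make the rules above concrete, here is a rough Python sketch of a checker for `UnescapedSymbolicName`. It is only an approximation built on the standard `unicodedata` module: it matches on general categories, so the small `Other_ID_Start` and `Other_ID_Continue` sets are not covered, and the function names are hypothetical, not taken from AGE or Neo4j.

```python
import unicodedata

# Assumption: ID_Start is approximated by its general categories
# (letters and letter numbers); Other_ID_Start is not covered.
# Pc is added because the grammar extends IdentifierStart with it.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Pc"}

# ID_Continue adds marks and decimal digits (Pc is already included);
# the grammar extends IdentifierPart with Sc (currency symbols).
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Sc"}


def is_identifier_start(ch: str) -> bool:
    """Approximate check for the grammar's IdentifierStart."""
    return unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch: str) -> bool:
    """Approximate check for the grammar's IdentifierPart."""
    return unicodedata.category(ch) in PART_CATEGORIES


def is_unescaped_symbolic_name(name: str) -> bool:
    """UnescapedSymbolicName = IdentifierStart, { IdentifierPart } ;"""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )
```

   Under these assumptions, names such as `Person`, `_person`, and `人物` are accepted, a currency symbol is accepted anywhere except the first position, and a leading digit, a space, or a hyphen is rejected.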
   
   I tried a few characters, both inside and outside these ranges, in Neo4j Aura. It appears to follow the grammar above. Unfortunately, their documentation does not state the ranges explicitly. 
   
   I also want to mention that, for database names, they say they allow "Ascii alphabetic" characters (with no mention of non-English characters); in practice, however, Neo4j Aura accepted Unicode alphabetic characters.  
   
   So, I have two questions: 
   1. Should I implement the grammar I found?
   2. Should I allow Unicode letters in graph names as well? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
