rafsun42 commented on issue #317: URL: https://github.com/apache/age/issues/317#issuecomment-1317743688
To address the issue regarding @TropicalPenguin's first point in the PR, I researched and found a grammar specification at [opencypher.org](https://opencypher.org/resources/). It defines the syntax for labels this way:

```
UnescapedSymbolicName = IdentifierStart, { IdentifierPart } ;

(* Based on the unicode identifier and pattern syntax
 *   (http://www.unicode.org/reports/tr31/)
 * And extended with a few characters.
 *)
IdentifierStart = ID_Start | Pc ;

(* Based on the unicode identifier and pattern syntax
 *   (http://www.unicode.org/reports/tr31/)
 * And extended with a few characters.
 *)
IdentifierPart = ID_Continue | Sc ;
```

Here, ID_Start, ID_Continue, Pc, and Sc are Unicode character classes whose ranges you can see [here](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:ID_Start=Yes:]). They seem to cover the alphabets of all Unicode scripts. The [reference page](http://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers) cited in the grammar describes these classes as:

> ID_Start characters are derived from the Unicode General_Category of uppercase letters, lowercase letters, titlecase letters, modifier letters, other letters, letter numbers, plus Other_ID_Start, minus Pattern_Syntax and Pattern_White_Space code points.
>
> In UnicodeSet notation:
> [\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]

> ID_Continue characters include ID_Start characters, plus characters having the Unicode General_Category of nonspacing marks, spacing combining marks, decimal number, connector punctuation, plus Other_ID_Continue, minus Pattern_Syntax and Pattern_White_Space code points.
>
> In UnicodeSet notation:
> [\p{ID_Start}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Other_ID_Continue}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]

I tried a few characters, both inside and outside these ranges, in Neo4j Aura. It seems to follow the grammar above. Unfortunately, their documentation does not state the ranges explicitly.

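As a rough illustration of the grammar quoted above, the following Python sketch checks a candidate label against `UnescapedSymbolicName`. It is an approximation, not how AGE or Neo4j implement it: the function names are hypothetical, and the character classes are derived only from Unicode General_Category values via the standard `unicodedata` module, so the `Other_ID_Start` and `Other_ID_Continue` additions and the `Pattern_Syntax` and `Pattern_White_Space` exclusions are ignored.

```python
import unicodedata

# Approximation of ID_Start: letter categories (Lu, Ll, Lt, Lm, Lo) plus letter numbers (Nl).
# Assumption: Other_ID_Start and the Pattern_* exclusions are not modelled here.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Categories that ID_Continue adds on top of ID_Start:
# nonspacing marks, spacing combining marks, decimal numbers, connector punctuation.
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch: str) -> bool:
    """IdentifierStart = ID_Start | Pc (approximated by general category)."""
    cat = unicodedata.category(ch)
    return cat in ID_START_CATEGORIES or cat == "Pc"


def is_identifier_part(ch: str) -> bool:
    """IdentifierPart = ID_Continue | Sc (approximated by general category)."""
    cat = unicodedata.category(ch)
    return cat in ID_START_CATEGORIES or cat in ID_CONTINUE_EXTRA or cat == "Sc"


def is_unescaped_symbolic_name(name: str) -> bool:
    """UnescapedSymbolicName = IdentifierStart, { IdentifierPart }"""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )
```

Under this approximation, a name such as `Person`, `_label`, or `名前` is accepted, while a name starting with a digit or containing a hyphen or space is rejected. A currency symbol such as `$` is allowed after the first character (class Sc) but not as the first character.
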
I also want to mention that for database names, they say they allow "Ascii alphabetic" characters (with no mention of non-English characters); however, Neo4j Aura actually accepted "unicode alphabetic" characters.

So, I have two questions:
1. Should I implement the grammar I found?
2. Should I allow Unicode alphabetic characters in graph names as well?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
