asereda-gs commented on a change in pull request #1095: [CALCITE-2599] add the 
ASCII function
URL: https://github.com/apache/calcite/pull/1095#discussion_r264004647
 
 

 ##########
 File path: core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java
 ##########
 @@ -237,6 +238,12 @@ public static String initcap(String s) {
     return newS.toString();
   }
 
+  /** SQL ASCII(string) function. */
+  public static int ascii(String s) {
+    return s.length() == 0
+        ? 0 : s.substring(0, 1).getBytes(StandardCharsets.UTF_8)[0];
 
 Review comment:
   I did a little test:
   ```java
     @Test
     public void basic() {
       final char[] chars = {0x00, 0x1B,
           'A', 'a', '0', '{', '}',
           '\n', '\r', '\t', ' ',
           0x80, 0x99,
           '\u0391', '\u03A9', '\u0391'};
   
       for (char ch: chars) {
         final String value = String.valueOf(ch);
         assertEquals(String.format("for %s (ascii:%d hex:0x%02X)", value,
                 (int) ch, (int) ch),
             ascii1(value), ascii2(value));
       }
     }
   
     private static int ascii1(String str) {
       return str.isEmpty() ? 0 : str.getBytes(StandardCharsets.UTF_8)[0];
     }
   
     private static int ascii2(String str) {
       return str.isEmpty() ? 0 : str.charAt(0);
     }
   ```
   So it starts failing at the extended ASCII characters (> 127):
   ```
   java.lang.AssertionError: for €€ (ascii:128 hex:0x80) 
   Expected :-62
   Actual   :128
   ```
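   For the standard ASCII table (hex: 0x00 - 0x7F), `charAt()` produces the 
same results as UTF-8 encoding. This is expected because Unicode is identical 
to ASCII for the first 128 code points (and UTF-8 uses a single byte for the 
basic ASCII charset). Above that range UTF-8 needs more than one byte: U+0080 
is encoded as 0xC2 0x80, and 0xC2 read as a signed Java `byte` is -62.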
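   
   A minimal sketch of where the `-62` comes from, assuming only the JDK. The class and method names here are hypothetical and not part of the PR:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8FirstByteSketch {
  /** First UTF-8 byte widened from a signed Java byte (same shape as ascii1 in the test). */
  static int firstByteSigned(String s) {
    return s.isEmpty() ? 0 : s.getBytes(StandardCharsets.UTF_8)[0];
  }

  /** Same byte, masked so it reads as an unsigned value in 0..255. */
  static int firstByteUnsigned(String s) {
    return s.isEmpty() ? 0 : s.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
  }

  public static void main(String[] args) {
    final String s = String.valueOf((char) 0x80);
    final byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    System.out.println(utf8.length);           // 2 (bytes 0xC2 0x80)
    System.out.println(firstByteSigned(s));    // -62
    System.out.println(firstByteUnsigned(s));  // 194 (0xC2)
    System.out.println((int) s.charAt(0));     // 128
  }
}
```

   Masking with `& 0xFF` removes the negative value, but the result (194) is still the UTF-8 lead byte rather than the character's code, so it does not by itself make the two approaches agree.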
   
   The question is how to define the Calcite `ASCII` function outside the basic 
ASCII chars (0-127). Or should we explicitly say the behaviour is undefined for 
non-ASCII input?
   
   
   [Transact-SQL](https://docs.microsoft.com/en-us/sql/t-sql/functions/ascii-transact-sql?view=sql-server-2017)
 says `ASCII` works only for [printable chars](https://en.wikipedia.org/wiki/ASCII#Printable_characters) (0x20 - 0x7E).
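
   To make the options concrete, here is a rough sketch of three possible definitions. These are hypothetical candidates for discussion, not what Calcite implements, and all names are made up:

```java
// Hypothetical candidates for ASCII(string); for discussion only.
final class AsciiOptionsSketch {
  /** Option A: first UTF-16 code unit (same as ascii2 in the test). */
  static int byCharAt(String s) {
    return s.isEmpty() ? 0 : s.charAt(0);
  }

  /** Option B: first Unicode code point; differs from A only for surrogate pairs. */
  static int byCodePoint(String s) {
    return s.isEmpty() ? 0 : s.codePointAt(0);
  }

  /** Option C: strict; anything outside 0..127 is rejected instead of left undefined. */
  static int strict(String s) {
    if (s.isEmpty()) {
      return 0;
    }
    final char c = s.charAt(0);
    if (c > 127) {
      throw new IllegalArgumentException(
          String.format("not an ASCII character: 0x%04X", (int) c));
    }
    return c;
  }

  public static void main(String[] args) {
    System.out.println(byCharAt("\u0391"));           // 913 (0x391)
    System.out.println(byCodePoint("\uD83D\uDE00"));  // 128512 (0x1F600)
    System.out.println(byCharAt("\uD83D\uDE00"));     // 55357 (0xD83D, high surrogate)
    System.out.println(strict("A"));                  // 65
  }
}
```

   All three agree on 0-127, so the choice only matters for the inputs where the behaviour is currently unclear.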
