[GitHub] [flink] dawidwys commented on a change in pull request #9061: [FLINK-13078][table-common] Add a logical type parser

GitBox Wed, 17 Jul 2019 03:31:47 -0700

dawidwys commented on a change in pull request #9061: [FLINK-13078][table-common] Add a logical type parser
URL: https://github.com/apache/flink/pull/9061#discussion_r304283500


 ##########
 File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/logical/utils/LogicalTypeParser.java
 ##########
 @@ -0,0 +1,916 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.types.logical.utils;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;
+import org.apache.flink.core.memory.DataInputDeserializer;
+import org.apache.flink.table.api.ValidationException;
+import org.apache.flink.table.types.logical.AnyType;
+import org.apache.flink.table.types.logical.ArrayType;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.BinaryType;
+import org.apache.flink.table.types.logical.BooleanType;
+import org.apache.flink.table.types.logical.CharType;
+import org.apache.flink.table.types.logical.DateType;
+import org.apache.flink.table.types.logical.DayTimeIntervalType;
+import org.apache.flink.table.types.logical.DayTimeIntervalType.DayTimeResolution;
+import org.apache.flink.table.types.logical.DecimalType;
+import org.apache.flink.table.types.logical.DoubleType;
+import org.apache.flink.table.types.logical.FloatType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.LocalZonedTimestampType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.MapType;
+import org.apache.flink.table.types.logical.MultisetType;
+import org.apache.flink.table.types.logical.NullType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.types.logical.SmallIntType;
+import org.apache.flink.table.types.logical.TimeType;
+import org.apache.flink.table.types.logical.TimestampType;
+import org.apache.flink.table.types.logical.TinyIntType;
+import org.apache.flink.table.types.logical.UnresolvedUserDefinedType;
+import org.apache.flink.table.types.logical.VarBinaryType;
+import org.apache.flink.table.types.logical.VarCharType;
+import org.apache.flink.table.types.logical.YearMonthIntervalType;
+import org.apache.flink.table.types.logical.YearMonthIntervalType.YearMonthResolution;
+import org.apache.flink.table.types.logical.ZonedTimestampType;
+import org.apache.flink.table.utils.EncodingUtils;
+
+import javax.annotation.Nullable;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Parser for creating instances of {@link LogicalType} from a serialized string created with
+ * {@link LogicalType#asSerializableString()}.
+ *
+ * <p>In addition to the serializable string representations, this parser also supports common
+ * shortcuts for certain types. This includes:
+ * <ul>
+ *     <li>{@code STRING} as a synonym for {@code VARCHAR(INT_MAX)}</li>
+ *     <li>{@code BYTES} as a synonym for {@code VARBINARY(INT_MAX)}</li>
+ *     <li>{@code NUMERIC} and {@code DEC} as synonyms for {@code DECIMAL}</li>
+ *     <li>{@code INTEGER} as a synonym for {@code INT}</li>
+ *     <li>{@code DOUBLE PRECISION} as a synonym for {@code DOUBLE}</li>
+ *     <li>{@code TIME WITHOUT TIME ZONE} as a synonym for {@code TIME}</li>
+ *     <li>{@code TIMESTAMP WITHOUT TIME ZONE} as a synonym for {@code TIMESTAMP}</li>
+ *     <li>{@code type ARRAY} as a synonym for {@code ARRAY<type>}</li>
+ *     <li>{@code type MULTISET} as a synonym for {@code MULTISET<type>}</li>
+ *     <li>{@code ROW(...)} as a synonym for {@code ROW<...>}</li>
+ *     <li>{@code type NULL} as a synonym for {@code type}</li>
+ * </ul>
+ *
+ * <p>Furthermore, it returns {@link UnresolvedUserDefinedType} for unknown types (partially or fully
+ * qualified such as {@code [catalog].[database].[type]}).
+ */
+@PublicEvolving
+public final class LogicalTypeParser {
+
+       /**
+        * Parses a type string. All types will be fully resolved except for {@link UnresolvedUserDefinedType}s.
+        *
+        * <p>Throws {@link ValidationException} in case of parsing errors.
+        *
+        * @param typeString a string like "ROW(field1 INT, field2 BOOLEAN)"
+        * @param classLoader class loader for loading classes of the ANY type
+        */
+       public static LogicalType parse(String typeString, ClassLoader classLoader) {
+               final List<Token> tokens = tokenize(typeString);
+               final TokenParser converter = new TokenParser(typeString, tokens, classLoader);
+               return converter.parseTokens();
+       }
+
+       /**
+        * Parses a type string. All types will be fully resolved except for {@link UnresolvedUserDefinedType}s.
+        *
+        * <p>Throws {@link ValidationException} in case of parsing errors.
+        *
+        * @param typeString a string like "ROW(field1 INT, field2 BOOLEAN)"
+        */
+       public static LogicalType parse(String typeString) {
+               return parse(typeString, Thread.currentThread().getContextClassLoader());
+       }
+
+       // --------------------------------------------------------------------------------------------
+       // Tokenizer
+       // --------------------------------------------------------------------------------------------
+
+       private static final char CHAR_BEGIN_SUBTYPE = '<';
+       private static final char CHAR_END_SUBTYPE = '>';
+       private static final char CHAR_BEGIN_PARAMETER = '(';
+       private static final char CHAR_END_PARAMETER = ')';
+       private static final char CHAR_LIST_SEPARATOR = ',';
+       private static final char CHAR_STRING = '\'';
+       private static final char CHAR_IDENTIFIER = '`';
+       private static final char CHAR_DOT = '.';
+
+       private static boolean isDelimiter(char character) {
+               return Character.isWhitespace(character) ||
+                       character == CHAR_BEGIN_SUBTYPE ||
+                       character == CHAR_END_SUBTYPE ||
+                       character == CHAR_BEGIN_PARAMETER ||
+                       character == CHAR_END_PARAMETER ||
+                       character == CHAR_LIST_SEPARATOR ||
+                       character == CHAR_DOT;
+       }
+
+       private static boolean isDigit(char c) {
+               return c >= '0' && c <= '9';
+       }
+
+       private static List<Token> tokenize(String typeString) {
+               final char[] chars = typeString.toCharArray();
+
+               final List<Token> tokens = new ArrayList<>();
+               final StringBuilder builder = new StringBuilder();
+               for (int cursor = 0; cursor < chars.length; cursor++) {
+                       char curChar = chars[cursor];
+                       switch (curChar) {
+                               case CHAR_BEGIN_SUBTYPE:
+                                       tokens.add(new Token(TokenType.BEGIN_SUBTYPE, cursor, Character.toString(CHAR_BEGIN_SUBTYPE)));
+                                       break;
+                               case CHAR_END_SUBTYPE:
+                                       tokens.add(new Token(TokenType.END_SUBTYPE, cursor, Character.toString(CHAR_END_SUBTYPE)));
+                                       break;
+                               case CHAR_BEGIN_PARAMETER:
+                                       tokens.add(new Token(TokenType.BEGIN_PARAMETER, cursor, Character.toString(CHAR_BEGIN_PARAMETER)));
+                                       break;
+                               case CHAR_END_PARAMETER:
+                                       tokens.add(new Token(TokenType.END_PARAMETER, cursor, Character.toString(CHAR_END_PARAMETER)));
+                                       break;
+                               case CHAR_LIST_SEPARATOR:
+                                       tokens.add(new Token(TokenType.LIST_SEPARATOR, cursor, Character.toString(CHAR_LIST_SEPARATOR)));
+                                       break;
+                               case CHAR_DOT:
+                                       tokens.add(new Token(TokenType.DOT, cursor, Character.toString(CHAR_DOT)));
+                                       break;
+                               case CHAR_STRING:
+                                       builder.setLength(0);
+                                       cursor = consumeEscaped(builder, chars, cursor, CHAR_STRING);
+                                       tokens.add(new Token(TokenType.LITERAL_STRING, cursor, builder.toString()));
+                                       break;
+                               case CHAR_IDENTIFIER:
+                                       builder.setLength(0);
+                                       cursor = consumeEscaped(builder, chars, cursor, CHAR_IDENTIFIER);
+                                       tokens.add(new Token(TokenType.IDENTIFIER, cursor, builder.toString()));
+                                       break;
+                               default:
+                                       if (Character.isWhitespace(curChar)) {
+                                               continue;
+                                       }
+                                       if (isDigit(curChar)) {
+                                               builder.setLength(0);
+                                               cursor = consumeInt(builder, chars, cursor);
+                                               tokens.add(new Token(TokenType.LITERAL_INT, cursor, builder.toString()));
+                                               break;
+                                       }
+                                       builder.setLength(0);
+                                       cursor = consumeIdentifier(builder, chars, cursor);
+                                       final String token = builder.toString();
+                                       final String normalizedToken = token.toUpperCase();
+                                       if (KEYWORDS.contains(normalizedToken)) {
+                                               tokens.add(new Token(TokenType.KEYWORD, cursor, normalizedToken));
+                                       } else {
+                                               tokens.add(new Token(TokenType.IDENTIFIER, cursor, token));
+                                       }
+                       }
+               }
+
+               return tokens;
+       }
+
+       private static int consumeEscaped(StringBuilder builder, char[] chars, int cursor, char delimiter) {
+               // skip delimiter
+               cursor++;
+               for (; chars.length > cursor; cursor++) {
+                       final char curChar = chars[cursor];
+                       if (curChar == delimiter && cursor + 1 < chars.length && chars[cursor + 1] == delimiter) {
+                               // escaping of the escaping char e.g. "'Hello '' World'"
+                               cursor++;
+                               builder.append(curChar);
+                       } else if (curChar == delimiter) {
+                               break;
+                       } else {
+                               builder.append(curChar);
+                       }
+               }
+               return cursor;
+       }
+
+       private static int consumeInt(StringBuilder builder, char[] chars, int cursor) {
+               for (; chars.length > cursor && isDigit(chars[cursor]); cursor++) {
+                       builder.append(chars[cursor]);
+               }
+               return cursor - 1;
+       }
+
+       private static int consumeIdentifier(StringBuilder builder, char[] chars, int cursor) {
+               for (; cursor < chars.length && !isDelimiter(chars[cursor]); cursor++) {
+                       builder.append(chars[cursor]);
+               }
+               return cursor - 1;
+       }
+
+       private enum TokenType {
+               // e.g. "ROW<"
+               BEGIN_SUBTYPE,
+
+               // e.g. "ROW<..>"
+               END_SUBTYPE,
+
+               // e.g. "CHAR("
+               BEGIN_PARAMETER,
+
+               // e.g. "CHAR(...)"
+               END_PARAMETER,
+
+               // e.g. "ROW<INT,"
+               LIST_SEPARATOR,
+
+               // e.g. "ROW<name INT 'Comment'"
+               LITERAL_STRING,
+
+               // e.g. "CHAR(12"
+               LITERAL_INT,
+
+               // e.g. "CHAR" or "TO"
+               KEYWORD,
+
+               // e.g. "ROW<name" or "myCatalog.myDatabase"
+               IDENTIFIER,
+
+               // e.g. "myCatalog.myDatabase."
+               DOT
 
 Review comment:
   Nit (and I'm not sure it would actually be better): maybe name this `IDENTIFIER_SEPARATOR`, similar to `LIST_SEPARATOR`?
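
   To illustrate the suggested name, here is a minimal, self-contained sketch. It is not the code from this PR: the class `IdentifierSeparatorSketch`, the reduced `TokenType` enum, and the `tokenTypes` helper are hypothetical and only mimic how a qualified name such as `myCatalog.myDatabase.myType` would be split if `DOT` were called `IDENTIFIER_SEPARATOR`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced sketch; not the tokenizer from the pull request.
final class IdentifierSeparatorSketch {

    // Reduced token set. IDENTIFIER_SEPARATOR is the name proposed for DOT.
    enum TokenType {
        LIST_SEPARATOR,       // e.g. "ROW<INT,"
        IDENTIFIER,           // e.g. "myCatalog"
        IDENTIFIER_SEPARATOR  // e.g. "myCatalog.myDatabase."
    }

    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == ',' || c == '.';
    }

    // Returns only the token types, which is enough to show the naming.
    static List<TokenType> tokenTypes(String input) {
        final List<TokenType> types = new ArrayList<>();
        int cursor = 0;
        while (cursor < input.length()) {
            final char c = input.charAt(cursor);
            if (c == '.') {
                types.add(TokenType.IDENTIFIER_SEPARATOR);
                cursor++;
            } else if (c == ',') {
                types.add(TokenType.LIST_SEPARATOR);
                cursor++;
            } else if (Character.isWhitespace(c)) {
                cursor++;
            } else {
                // Consume everything up to the next delimiter as one identifier.
                while (cursor < input.length() && !isDelimiter(input.charAt(cursor))) {
                    cursor++;
                }
                types.add(TokenType.IDENTIFIER);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("myCatalog.myDatabase.myType"));
    }
}
```

   With this naming, the two separator tokens read symmetrically: `LIST_SEPARATOR` splits list elements and `IDENTIFIER_SEPARATOR` splits the parts of a qualified identifier.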

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
