[
https://issues.apache.org/jira/browse/FLINK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-15573:
-----------------------------------
Labels: stale-minor (was: )
> Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its
> default charset
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-15573
> URL: https://issues.apache.org/jira/browse/FLINK-15573
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Reporter: Lsw_aka_laplace
> Priority: Minor
> Labels: stale-minor
> Attachments: image-2020-01-15-21-49-19-373.png
>
>
> UPDATE:
> Flink now uses Calcite for its SQL planner. Calcite currently supports only
> the ISO-8859-1 charset, and the charset cannot be configured. Even so, from
> my perspective, we still need to change the charset used by
> PlannerExpressionParserImpl#fieldReference, because JavaIdentifier does not
> cover our needs either.
> As for the implementation, PlannerExpressionParserImpl uses Scala's native
> parser library, which reads and consumes `scala.Char` (which can be regarded
> as the Java char type). We only need to deal with the char type, so the
> implementation does not have to care about the charset problem at all, which
> leads to *a simple and backwards-compatible solution*.
> The implementation is almost the same as the picture below indicates. I have
> actually made this change in my company-specific branch and deployed it. It
> works well.
>
> **************************************************************************************
> Now I am talking about the `PlannerExpressionParserImpl`.
> For now, fieldReference's charset is JavaIdentifier; why not change it to
> UnicodeIdentifier?
> Our team currently has this problem. For instance, data from Elasticsearch
> always contains an `@timestamp` field, which JavaIdentifier does not accept.
> So what we did was simply make fieldReference use Unicode identifiers:
>
> {code:scala}
> lazy val extensionIdent: Parser[String] = (
>   "" ~> // handle whitespace
>   rep1(
>     acceptIf(Character.isUnicodeIdentifierStart)(
>       "identifier expected but '" + _ + "' found"),
>     elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
>   ) ^^ (_.mkString)
> )
>
> lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
>   (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
> {code}
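>
> Before relying on either predicate, it may be worth checking how the JDK
> classifies the characters in question. The following is only a minimal
> sketch: the object name `IdentifierChecks`, its helper methods and the sample
> characters are illustrative assumptions and are not part of Flink.
>
> {code:scala}
> object IdentifierChecks {
>   // Sample characters chosen purely for illustration.
>   val samples: Seq[Char] = Seq('a', '_', '$', '@', '字')
>
>   // Thin wrappers around the java.lang.Character predicates.
>   def javaStart(c: Char): Boolean = Character.isJavaIdentifierStart(c)
>   def unicodeStart(c: Char): Boolean = Character.isUnicodeIdentifierStart(c)
>
>   def main(args: Array[String]): Unit =
>     samples.foreach { c =>
>       println(s"'$c' javaStart=${javaStart(c)} unicodeStart=${unicodeStart(c)}")
>     }
> }
> {code}
>
> As far as I know, the JDK classifies '@' as punctuation, so both predicates
> should return false for it. If that holds in your environment, a leading '@'
> may need an explicit rule in the parser in addition to the Unicode
> predicates.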
>
> It is simple, but it really makes sense.
>
> MySQL supports Unicode; see the picture below, which shows a field called `@@`.
> !image-2020-01-15-21-49-19-373.png!
> Looking forward to any opinions.
>