[
https://issues.apache.org/jira/browse/FLINK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lsw_aka_laplace updated FLINK-15573:
------------------------------------
Description: 
UPDATE:
Flink now uses Calcite as its SQL planner. Calcite currently supports only the ISO-8859-1 charset, and the charset is not configurable either. Even so, from my perspective we still need to change the charset of PlannerExpressionParserImpl#fieldReference, because JavaIdentifier cannot cover our field names.
Regarding the implementation: PlannerExpressionParserImpl uses Scala's native parser-combinator library, which reads and consumes `scala.Char` (effectively the same as the Java char type). For us, handling the char type alone is enough, which means that at the implementation level we do not even need to care about the charset problem, leading to *a simple and backwards-compatible solution*.
The implementation is essentially what the snippet below shows. I have already made this change on my company's internal branch and deployed it; it works well.
**************************************************************************************
Now let me talk about `PlannerExpressionParserImpl`.
Currently fieldReference's charset is JavaIdentifier. Why not change it to UnicodeIdentifier?
My team actually runs into this problem. For instance, data from Elasticsearch always contains an `@timestamp` field, which JavaIdentifier cannot accept. So what we did was simply let fieldReference parse Unicode identifiers:
{code:scala}
lazy val extensionIdent: Parser[String] =
  "" ~> // handle whitespace
    rep1(
      acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
      elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
    ) ^^ (_.mkString)

lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
  (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
{code}
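For reference, the difference between the two identifier classifications can be probed directly with the standard `java.lang.Character` predicates that the parser relies on (a small illustrative sketch, not Flink code; the sample characters are just examples):

```java
// Probe the JDK predicates behind the JavaIdentifier and UnicodeIdentifier rules.
// Note: '_' and '$' are valid Java identifier starts but NOT Unicode identifier
// starts, while non-ASCII letters such as CJK characters pass both predicates.
public class IdentifierCharsets {
    public static void main(String[] args) {
        char[] samples = {'a', '_', '$', '中', '9'};
        for (char c : samples) {
            System.out.printf("'%c'  javaStart=%b unicodeStart=%b unicodePart=%b%n",
                    c,
                    Character.isJavaIdentifierStart(c),
                    Character.isUnicodeIdentifierStart(c),
                    Character.isUnicodeIdentifierPart(c));
        }
    }
}
```

This is presumably why the snippet above keeps `ident` and `extensionIdent` as alternatives rather than replacing one with the other: names starting with `_` or `$` still match the Java rule, while letter-based Unicode names match the new rule.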
It is simple but really makes sense.
MySQL supports Unicode field names; see the picture below, which shows a field called `@@`:
!image-2020-01-15-21-49-19-373.png!
Looking forward to any opinions.
> Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its
> default charset
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-15573
> URL: https://issues.apache.org/jira/browse/FLINK-15573
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Reporter: Lsw_aka_laplace
> Priority: Minor
> Attachments: image-2020-01-15-21-49-19-373.png
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)