[ 
https://issues.apache.org/jira/browse/FLINK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-15573:
------------------------------------
    Description: 
 UPDATE:

  Flink now uses Calcite as its SQL planner, and Calcite currently supports only the ISO-8859-1 charset, which cannot be configured either. But even so, from my perspective we still need to change the charset of PlannerExpressionParserImpl#fieldReference, because JavaIdentifier cannot cover field names such as `@timestamp`.
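
To make the JavaIdentifier limitation concrete, here is a plain-JDK check (my own sketch, not Flink code):

{code:scala}
// Plain JDK check, independent of Flink: a JavaIdentifier-based parser
// rejects '@' as an identifier start, so a field name like `@timestamp`
// can never be accepted as-is.
object JavaIdentCheck extends App {
  println(Character.isJavaIdentifierStart('@')) // false -> `@timestamp` is rejected
  println(Character.isJavaIdentifierStart('t')) // true  -> `timestamp` would be fine
}
{code}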

  Regarding the implementation: PlannerExpressionParserImpl is built on Scala's native parser combinators, which read and consume `scala.Char` (you can just think of it as the Java char type). For our purposes, reasoning at the level of single characters is enough, which means the implementation does not have to care about the charset problem at all, leading to *a simple and backwards-compatible solution*.
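
Because the parser combinators only ever look at one `scala.Char` at a time, the set of accepted identifiers boils down to two character predicates. Below is a self-contained sketch of that idea (my own demo, not the Flink code; the object name and the extra `|| c == '@'` clause are purely illustrative, to show that the predicate can be widened freely):

{code:scala}
import scala.util.parsing.combinator.RegexParsers

// Standalone demo, not Flink code: a predicate-driven identifier parser.
// The exact predicates are illustrative; '@' is added on top of the
// Unicode identifier characters only to show the predicate is adjustable.
object ExtendedIdentDemo extends RegexParsers {
  private def isStart(c: Char): Boolean =
    Character.isUnicodeIdentifierStart(c) || c == '@'

  val extendedIdent: Parser[String] =
    rep1(
      acceptIf(isStart)(c => s"identifier expected but '$c' found"),
      elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
    ) ^^ (_.mkString)

  def main(args: Array[String]): Unit = {
    println(parseAll(extendedIdent, "@timestamp")) // parses
    println(parseAll(extendedIdent, "字段名"))       // CJK letters parse too
  }
}
{code}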

  The implementation is almost exactly what the code below shows. I have already made this change on my company's internal branch and deployed it; it works well.

 

**************************************************************************************

Now let me talk about `PlannerExpressionParserImpl` itself.

    For now, fieldReference's charset is JavaIdentifier. Why not change it to UnicodeIdentifier?

    Currently, my team actually runs into this problem. For instance, data from Elasticsearch always contains an `@timestamp` field, which a JavaIdentifier cannot express. So what we did is simply let the fieldReference charset use Unicode:

 
{code:scala}
lazy val extensionIdent: Parser[String] = ( "" ~> // handle whitespace
  rep1(
    acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
    elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
  ) ^^ (_.mkString) )

lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
  (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
{code}
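
One note on why this is backwards compatible: in the alternation `(STAR | ident | extensionIdent)`, the existing `ident` rule is tried first, so every field name that parses today takes exactly the same path as before; `extensionIdent` only kicks in for names that the JavaIdentifier-based `ident` rejects.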
 

It is a simple change, but it really makes sense.

 

MySQL supports Unicode identifiers as well; see the picture below, where a field is named `@@`.

!image-2020-01-15-21-49-19-373.png!

Looking forward to any opinions.

 


> Let Flink SQL PlannerExpressionParserImpl#FieldReference use Unicode as its 
> default charset
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15573
>                 URL: https://issues.apache.org/jira/browse/FLINK-15573
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>            Reporter: Lsw_aka_laplace
>            Priority: Minor
>         Attachments: image-2020-01-15-21-49-19-373.png
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
