[ https://issues.apache.org/jira/browse/HIVE-25292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shezm updated HIVE-25292:
-------------------------
Description:
Hi,
The to_unix_timestamp function is implemented by GenericUDFToUnixTimeStamp,
which uses SimpleDateFormat to parse string-typed times.
However, the SimpleDateFormat is created without a Locale argument, so the
JVM's default locale is used. As a result, machines with a non-English default
locale cannot run SQL such as:
{code:java}
hive> select to_unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
OK
NULL
hive> select unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
OK
NULL
{code}
I also found that in Spark, to_unix_timestamp and unix_timestamp use
SimpleDateFormat as well, and Spark uses Locale.US by default. That, however,
makes it impossible to parse dates written in the local language. For example,
in a Chinese environment, Hive parses the following correctly:
{code:java}
hive> select to_unix_timestamp('16/三月/2017:12:25:01', 'dd/MMMM/yyy:HH:mm:ss');
OK
1489638301
Time taken: 0.147 seconds, Fetched: 1 row(s)
OK
{code}
But Spark returns NULL for the same query.
Because English-format dates are more common, I think two SimpleDateFormats
are needed: the existing one, plus a new one initialized with Locale.ENGLISH.
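To make the two-formatter idea concrete, here is a minimal sketch. It is not
the code in GenericUDFToUnixTimeStamp: the class name LocaleFallbackParse, the
helper toUnixTimestamp and its defaultLocale parameter are hypothetical, and
the time zone is pinned to Asia/Shanghai only as an assumption so that the
result lines up with the value shown above. It tries the given locale first
and falls back to Locale.ENGLISH if that parse fails.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only; not Hive's actual implementation.
class LocaleFallbackParse {

    // Returns seconds since the epoch, or null if no candidate locale can parse the text.
    static Long toUnixTimestamp(String text, String pattern, Locale defaultLocale) {
        // Try the caller's locale first, then English as the fallback.
        Locale[] candidates = {defaultLocale, Locale.ENGLISH};
        for (Locale locale : candidates) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
            // Assumption: fixed zone (UTC+8) so the result does not depend on the host.
            format.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));
            try {
                return format.parse(text).getTime() / 1000L;
            } catch (ParseException e) {
                // This locale could not parse the text; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Locale.CHINESE stands in for the default locale of a non-English machine.
        // The English month name fails there and is picked up by the fallback.
        System.out.println(
            toUnixTimestamp("16/Mar/2017:12:25:01", "dd/MMM/yyy:HH:mm:ss", Locale.CHINESE));
        // prints 1489638301
    }
}
```

Trying the default locale first keeps local-language input working, while the
English fallback covers the more common English month names.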
was:
Hei
The to_unix_timestamp function is implemented by GenericUDFToUnixTimeStamp. It
uses SimpleDateFormat to parse the time of the string type.
But SimpleDateFormat does not specify the Locale parameter, that is, the
default locale of the jvm machine will be used. This will cause some
non-English local machines to be unable to run similar sql like :
{code:java}
hive> select to_unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
OK
NULLhive> select unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
OK
NULL
{code}
At the same time, I found that in spark, to_unix_timestamp & unix_timestamp
also use SimpleDateFormat, and spark uses Locale.US by default, but this will
make it impossible to use local language syntax. For example, in the Chinese
environment, I can parse this result correctly in hive,
{code:java}
hive> select to_unix_timestamp('16/三月/2017:12:25:01', 'dd/MMMM/yyy:HH:mm:ss');
OK
1489638301
Time taken: 0.147 seconds, Fetched: 1 row(s)
OK
NULL
{code}
But spark will return Null.
Because English dates are more common dates, I think two SimpleDateFormats are
needed. The new SimpleDateFormat is initialized with the Locale.ENGLISH
parameter.
> to_unix_timestamp & unix_timestamp should support ENGLISH format by default
> ---------------------------------------------------------------------------
>
> Key: HIVE-25292
> URL: https://issues.apache.org/jira/browse/HIVE-25292
> Project: Hive
> Issue Type: Improvement
> Components: Clients
> Reporter: shezm
> Assignee: shezm
> Priority: Major
> Fix For: 3.2.0
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)