This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 0383d1e  [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form
0383d1e is described below

commit 0383d1efe7a7ada8a202fd411bf32b3ed80c9ce4
Author: Wenchen Fan <wenc...@databricks.com>
AuthorDate: Wed May 27 18:53:19 2020 +0000

    [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form
    
    If the datetime pattern string contains `LLL` or `qqq` and the JDK in use has the stand-alone form bug (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message.
    
    The change is needed to keep backward compatibility with Spark 2.4.
    
    Yes, this changes user-facing behavior:
    
    Spark 2.4
    ```
    scala> sql("select date_format('1990-1-1', 'LLL')").show
    +---------------------------------------------+
    |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
    +---------------------------------------------+
    |                                          Jan|
    +---------------------------------------------+
    ```
    
    Spark 3.0 with Java 11
    ```
    scala> sql("select date_format('1990-1-1', 'LLL')").show
    +---------------------------------------------+
    |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
    +---------------------------------------------+
    |                                          Jan|
    +---------------------------------------------+
    ```
    
    Spark 3.0 with Java 8
    ```
    // before this PR
    +---------------------------------------------+
    |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
    +---------------------------------------------+
    |                                            1|
    +---------------------------------------------+
    // after this PR
    scala> sql("select date_format('1990-1-1', 'LLL')").show
    org.apache.spark.SparkUpgradeException
    ```
    
    Tested manually with Java 8 and Java 11.
    
    Closes #28646 from cloud-fan/format.
    
    Authored-by: Wenchen Fan <wenc...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 docs/sql-ref-datetime-pattern.md                       |  7 ++++---
 .../sql/catalyst/util/DateTimeFormatterHelper.scala    | 18 +++++++++++++++++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/docs/sql-ref-datetime-pattern.md b/docs/sql-ref-datetime-pattern.md
index 0e00e7b..48e85b4 100644
--- a/docs/sql-ref-datetime-pattern.md
+++ b/docs/sql-ref-datetime-pattern.md
@@ -76,7 +76,8 @@ The count of pattern letters determines the format.
 
 - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is [...]
 
-- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. F [...]
+- Month: It follows the rule of Number/Text. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. These two forms are different only in some certain languages. For example, in Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard form. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Month from 1 to 9 are printed without padding.
     ```sql
     spark-sql> select date_format(date '1970-01-01', "M");
     1
@@ -106,8 +107,8 @@ The count of pattern letters determines the format.
     ```
   - `'MMMM'`: full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
     ```sql
-    spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
-    January 1970
+    spark-sql> select date_format(date '1970-01-01', "d MMMM");
+    1 January
     spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
     1 января
     ```
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
index 8289568..353c074 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
@@ -217,9 +217,19 @@ private object DateTimeFormatterHelper {
     toFormatter(builder, TimestampFormatter.defaultLocale)
   }
 
+  private final val bugInStandAloneForm = {
+    // Java 8 has a bug for stand-alone form. See https://bugs.openjdk.java.net/browse/JDK-8114833
+    // Note: we only check the US locale so that it's a static check. It can produce false-negative
+    // as some locales are not affected by the bug. Since `L`/`q` is rarely used, we choose to not
+    // complicate the check here.
+    // TODO: remove it when we drop Java 8 support.
+    val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
+    formatter.format(LocalDate.of(2000, 1, 1)) == "1 1"
+  }
   final val unsupportedLetters = Set('A', 'c', 'e', 'n', 'N', 'p')
   final val unsupportedNarrowTextStyle =
-    Set("GGGGG", "MMMMM", "LLLLL", "EEEEE", "uuuuu", "QQQQQ", "qqqqq", "uuuuu")
+    Seq("G", "M", "L", "E", "u", "Q", "q").map(_ * 5).toSet
+
   /**
    * In Spark 3.0, we switch to the Proleptic Gregorian calendar and use DateTimeFormatter for
    * parsing/formatting datetime values. The pattern string is incompatible with the one defined
@@ -243,6 +253,12 @@ private object DateTimeFormatterHelper {
           for (style <- unsupportedNarrowTextStyle if patternPart.contains(style)) {
             throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}")
           }
+          if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) {
+            throw new IllegalArgumentException("Java 8 has a bug to support stand-alone " +
+              "form (3 or more 'L' or 'q' in the pattern string). Please use 'M' or 'Q' instead, " +
+              "or upgrade your Java version. For more details, please read " +
+              "https://bugs.openjdk.java.net/browse/JDK-8114833")
+          }
           // The meaning of 'u' was day number of week in SimpleDateFormat, it was changed to year
           // in DateTimeFormatter. Substitute 'u' to 'e' and use DateTimeFormatter to parse the
           // string. If parsable, return the result; otherwise, fall back to 'u', and then use the

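The check added to `DateTimeFormatterHelper` in the diff above can be tried outside Spark. The following is a minimal sketch, not the Spark implementation: the object name `StandAloneFormCheck`, the `validate` method, its `bugPresent` parameter, and the shortened error text are all hypothetical and exist only for illustration. The detection expression and the narrow-style set are copied from the diff.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical standalone object for illustration; not part of Spark.
object StandAloneFormCheck {
  // True when the running JDK prints numbers for the text-style stand-alone
  // fields (the JDK-8114833 symptom). As in the diff, only the US locale is
  // probed, so the result is computed once.
  lazy val bugInStandAloneForm: Boolean = {
    val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
    formatter.format(LocalDate.of(2000, 1, 1)) == "1 1"
  }

  // Same construction as in the diff: each letter repeated five times.
  val unsupportedNarrowTextStyle: Set[String] =
    Seq("G", "M", "L", "E", "u", "Q", "q").map(_ * 5).toSet

  // Rejects a pattern part. `bugPresent` is an assumed parameter so that both
  // branches can be exercised regardless of the JDK running the code.
  def validate(patternPart: String, bugPresent: Boolean = bugInStandAloneForm): Unit = {
    for (style <- unsupportedNarrowTextStyle if patternPart.contains(style)) {
      throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}")
    }
    if (bugPresent && (patternPart.contains("LLL") || patternPart.contains("qqq"))) {
      throw new IllegalArgumentException(
        "Stand-alone form (3 or more 'L' or 'q') is not usable on this JDK")
    }
  }
}
```

Calling `StandAloneFormCheck.validate("d LLL yyyy")` with the default argument would throw only on a JDK where the probe detects the bug, while `validate("d MMM yyyy")` passes on any JDK. Passing `bugPresent` explicitly makes it possible to see the rejection path on a newer JDK as well.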
