[ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651436#comment-15651436 ]

ASF GitHub Bot commented on DRILL-4980:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/644#discussion_r87232002
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
    @@ -59,19 +59,24 @@
        */
       public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
       /**
    -   * All old parquet files (which haven't "is.date.correct=true" property in metadata) have
    -   * a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
    +   * All old parquet files (which haven't "is.date.correct=true" or "parquet-writer.version" properties
    +   * in metadata) have a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
        */
       public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
    -  // The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
    -  // This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
    -  // be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
    -  // On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
    -  // something like 10,000 years in the past.
       private static final Chronology UTC = org.joda.time.chrono.ISOChronology.getInstanceUTC();
    +  /**
    +   * The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
    +   * This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
    +   * be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
    +   * On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
    +   * something like 10,000 years in the past.
    +   */
       public static final int DATE_CORRUPTION_THRESHOLD =
           (int) (UTC.getDateTimeMillis(5000, 1, 1, 0) / DateTimeConstants.MILLIS_PER_DAY);
    -
    +  /**
    +   * The version of drill parquet writer with date values corruption fix
    --- End diff --
    
    Maybe explain this a bit better. Something like:
    
    Version 2 (and later) of the Drill Parquet writer uses the date format
    described (in the Parquet spec? URL?). Prior versions had dates formatted
    (copy description from above.)
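
For readers following the constants in the diff, the arithmetic they describe can be sketched roughly as below. This is only an illustration under stated assumptions, not Drill's actual implementation: the class name and the helpers `looksCorrupt` and `correctIfCorrupt` are hypothetical, and `java.time` is swapped in for Joda-Time so the example has no external dependencies. The constant values themselves are taken from the diff.

```java
import java.time.LocalDate;

// Hypothetical sketch class; only the constant names and values come from the diff.
public class DateCorruptionSketch {
    public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;

    // Corrupt files store dates shifted forward by twice the Julian day number of the Unix epoch.
    public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;

    // Days from the Unix epoch to 5000-01-01 (java.time used here instead of Joda-Time).
    public static final int DATE_CORRUPTION_THRESHOLD =
        (int) LocalDate.of(5000, 1, 1).toEpochDay();

    // Assumption: a value past the threshold is treated as corrupt when metadata cannot decide.
    public static boolean looksCorrupt(int daysSinceEpoch) {
        return daysSinceEpoch > DATE_CORRUPTION_THRESHOLD;
    }

    // Removes the shift only from values that look corrupt; other values pass through unchanged.
    public static int correctIfCorrupt(int daysSinceEpoch) {
        return looksCorrupt(daysSinceEpoch)
            ? (int) (daysSinceEpoch - CORRECT_CORRUPT_DATE_SHIFT)
            : daysSinceEpoch;
    }

    public static void main(String[] args) {
        int correct = (int) LocalDate.of(2016, 11, 9).toEpochDay();
        int corrupt = (int) (correct + CORRECT_CORRUPT_DATE_SHIFT);
        System.out.println("threshold days: " + DATE_CORRUPTION_THRESHOLD);
        System.out.println("recovered date: " + LocalDate.ofEpochDay(correctIfCorrupt(corrupt)));
    }
}
```

With these numbers, a correctly written date would only land below the threshold after being shifted if it were more than about 3.77 million days before the epoch, which is consistent with the "something like 10,000 years in the past" remark in the Javadoc.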


> Upgrading of the approach of parquet date correctness status detection
> ----------------------------------------------------------------------
>
>                 Key: DRILL-4980
>                 URL: https://issues.apache.org/jira/browse/DRILL-4980
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.9.0
>
>
> This JIRA is an addition to 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated Parquet files should be 
> upgraded. 


