[
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651438#comment-15651438
]
ASF GitHub Bot commented on DRILL-4980:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/644#discussion_r87232227
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -59,19 +59,24 @@
*/
public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
/**
- * All old parquet files (which haven't "is.date.correct=true" property in metadata) have
- * a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
+ * All old parquet files (which haven't "is.date.correct=true" or "parquet-writer.version" properties
+ * in metadata) have a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
*/
public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
- // The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
- // This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
- // be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
- // On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
- // something like 10,000 years in the past.
private static final Chronology UTC = org.joda.time.chrono.ISOChronology.getInstanceUTC();
+ /**
+ * The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
+ * This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
+ * be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
+ * On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
+ * something like 10,000 years in the past.
+ */
public static final int DATE_CORRUPTION_THRESHOLD =
(int) (UTC.getDateTimeMillis(5000, 1, 1, 0) / DateTimeConstants.MILLIS_PER_DAY);
-
+ /**
+ * The version of drill parquet writer with date values corruption fix
+ */
+ public static final int DRILL_WRITER_VERSION_WITHOUT_CORRUPTION = 2;
--- End diff --
Maybe call this DRILL_WRITER_VERSION_STD_DATE_FORMAT?
The old format was not "corrupted"; it just used a non-standard date
format.
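For context, here is a minimal sketch of how the two constants in the diff could work together. It is not the code in ParquetReaderUtility. It assumes the old writer stored each date as the correct day count plus CORRECT_CORRUPT_DATE_SHIFT, so that subtracting the shift restores it. The class name and the helpers looksShifted and correctIfShifted are hypothetical, and java.time stands in for Joda-Time so the example has no external dependencies.

```java
import java.time.LocalDate;

// Illustrative only: the class and helper names are hypothetical, not Drill APIs.
class DateShiftSketch {
    // Values taken from the diff above.
    static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588L;
    static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;

    // Days from the Unix epoch to 5000-01-01 (1106685, as stated in the diff's comment),
    // computed here with java.time instead of Joda-Time.
    static final int DATE_CORRUPTION_THRESHOLD =
        (int) LocalDate.of(5000, 1, 1).toEpochDay();

    // Assumption: a stored day count past the threshold was written in the old, shifted format.
    static boolean looksShifted(int daysSinceEpoch) {
        return daysSinceEpoch > DATE_CORRUPTION_THRESHOLD;
    }

    // Assumption: the shift is additive, so subtracting it recovers the intended date.
    static int correctIfShifted(int daysSinceEpoch) {
        return looksShifted(daysSinceEpoch)
            ? (int) (daysSinceEpoch - CORRECT_CORRUPT_DATE_SHIFT)
            : daysSinceEpoch;
    }

    public static void main(String[] args) {
        int intended = (int) LocalDate.of(2016, 11, 9).toEpochDay();
        int stored = (int) (intended + CORRECT_CORRUPT_DATE_SHIFT);
        // A shifted value is detected and restored; a standard value passes through unchanged.
        System.out.println(LocalDate.ofEpochDay(correctIfShifted(stored)));
        System.out.println(LocalDate.ofEpochDay(correctIfShifted(intended)));
    }
}
```

A threshold check like this is only a fallback for files whose metadata carries neither "is.date.correct=true" nor "parquet-writer.version". When a writer version is present, comparing it against the constant added in this diff would avoid guessing from the values.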
> Upgrading of the approach of parquet date correctness status detection
> ----------------------------------------------------------------------
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Affects Versions: 1.8.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Fix For: 1.9.0
>
>
> This JIRA is an addition to
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date-correctness label for newly generated Parquet files should be
> upgraded.