Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/644#discussion_r86387445
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
    @@ -189,19 +189,23 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
     
         String createdBy = footer.getFileMetaData().getCreatedBy();
         String drillVersion = 
footer.getFileMetaData().getKeyValueMetaData().get(ParquetRecordWriter.DRILL_VERSION_PROPERTY);
    -    String isDateCorrect = 
footer.getFileMetaData().getKeyValueMetaData().get(ParquetRecordWriter.IS_DATE_CORRECT_PROPERTY);
    -    if (drillVersion != null) {
    -      return Boolean.valueOf(isDateCorrect) ? 
DateCorruptionStatus.META_SHOWS_NO_CORRUPTION
    -          : DateCorruptionStatus.META_SHOWS_CORRUPTION;
    -    } else {
    -      // Possibly an old, un-migrated Drill file, check the column 
statistics to see if min/max values look corrupt
    -      // only applies if there is a date column selected
    -      if (createdBy.equals("parquet-mr")) {
    -        // loop through parquet column metadata to find date columns, 
check for corrupt values
    -        return checkForCorruptDateValuesInStatistics(footer, columns, 
autoCorrectCorruptDates);
    +    String writerVersion = 
footer.getFileMetaData().getKeyValueMetaData().get(ParquetRecordWriter.WRITER_VERSION_PROPERTY);
    +    // This flag can be present in parquet files which were generated with 
1.9.0-SNAPSHOT drill version
    +    final String isDateCorrectFlag = "is.date.correct";
    +    String isDateCorrect = 
footer.getFileMetaData().getKeyValueMetaData().get(isDateCorrectFlag);
    +    try {
    +      if (drillVersion != null) {
    +        return (writerVersion != null && Integer.parseInt(writerVersion) 
>= 2) || Boolean.valueOf(isDateCorrect)
    +            || isDrillVersionHasCorrectDates(drillVersion)
    --- End diff --
    
    Isn't the Drill version check redundant? We know that all Drill versions 
from 1.9.0 onwards will have a Drill Parquet writer version in the file. So, we 
only need check the writer version, if we have it. If we don't have it, then 
the only info we have is the Drill version.
    
    We might want to explain this logic in a comment.
    
    The purpose of adding the writer version is that all future format 
decisions can be made on the writer version independent of Drill version. For 
example, suppose we change something in Drill 1.10. Drill 1.10.0-SNAPSHOT will 
start with writer version 2. Later we'll make a change and 1.10.0-SNAPSHOT will 
writer files with writer version 3. The Drill version is ambiguous, the writer 
version is spot on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to