akashrn5 commented on a change in pull request #4177:
URL: https://github.com/apache/carbondata/pull/4177#discussion_r677322219



##########
File path: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java
##########
@@ -219,6 +219,10 @@ public void putComplexObject(List<Integer> offsetVector) {
     columnVector.putComplexObject(offsetVector);
   }
 
+  @Override
+  public void setCarbonDataFileWrittenVersion(String carbonDataFileWrittenVersion) {
+  }

Review comment:
Same as above.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/directread/ColumnarVectorWrapperDirectWithInvertedIndex.java
##########
@@ -104,6 +104,10 @@ public void putNotNull(int rowId) {
     // nothing to do
   }
 
+  @Override
+  public void setCarbonDataFileWrittenVersion(String carbonDataFileWrittenVersion) {
+  }

Review comment:
Same as the above comment.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
##########
@@ -500,6 +500,10 @@ public void putArray(int rowId, int offset, int length) {
     lengths[rowId] = length;
   }
 
+  @Override
+  public void setCarbonDataFileWrittenVersion(String carbonDataFileWrittenVersion) {
+  }

Review comment:
If there is no implementation here, please throw an UnsupportedOperationException instead of leaving the method body empty.
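The suggestion above, sketched as a standalone Java snippet. The class hierarchy here is illustrative, not CarbonData's actual one; only the pattern (fail fast rather than silently ignore the call) is the point:

```java
// Minimal sketch of the "throw instead of empty body" suggestion.
abstract class ColumnVectorSketch {
  public abstract void setCarbonDataFileWrittenVersion(String version);
}

class ColumnVectorImplSketch extends ColumnVectorSketch {
  @Override
  public void setCarbonDataFileWrittenVersion(String carbonDataFileWrittenVersion) {
    // Fail fast so callers notice the vector does not track this metadata.
    throw new UnsupportedOperationException(
        "setCarbonDataFileWrittenVersion is not supported by " + getClass().getSimpleName());
  }
}
```

This way a caller that wrongly relies on the setter gets an immediate, diagnosable failure instead of silently losing the version information.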

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -566,6 +569,47 @@ class CarbonScanRDD[T: ClassTag](
           }
           havePair = false
           val value = reader.getCurrentValue
+          if (CarbonProperties.getInstance()
+                .getProperty(CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3,
+                  CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3_DEFAULT)
+                .toBoolean &&

Review comment:
Move this to the line above.

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -566,6 +569,47 @@ class CarbonScanRDD[T: ClassTag](
           }
           havePair = false
           val value = reader.getCurrentValue
+          if (CarbonProperties.getInstance()
+                .getProperty(CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3,
+                  CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3_DEFAULT)
+                .toBoolean &&
+              model.getProjectionColumns.exists(_.getDataType == DataTypes.TIMESTAMP)) {
+            value match {
+              case row: GenericInternalRow if
+                reader.isInstanceOf[CarbonRecordReader[T]] &&
+                null !=
+                reader.asInstanceOf[CarbonRecordReader[T]].getCarbonDataFileWrittenVersion &&
+                // carbonDataFileWrittenVersion will be in the format x.x.x-SNAPSHOT
+                // (eg., 2.1.0-SNAPSHOT), get the version name and check if the data file is
+                // written before 2.2.0 version, then rebase timestamp value

Review comment:
You can add this comment after line 577 and then fix the style of the if-condition code.
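The version check quoted in this hunk strips the `-SNAPSHOT` suffix and compares the remaining prefix lexicographically against the 2.2.0 cutoff. A hedged standalone sketch of that logic; the constant values below are assumptions, since `CarbonCommonConstants` is not shown in the diff:

```java
class VersionCheckSketch {
  // Assumed values for CarbonCommonConstants.HYPHEN and CARBON_SPARK3_VERSION.
  static final String HYPHEN = "-";
  static final String CARBON_SPARK3_VERSION = "2.2.0";

  // carbonDataFileWrittenVersion looks like "2.1.0-SNAPSHOT": take the part
  // before the hyphen and compare it with String.compareTo, as the patch does.
  static boolean writtenBeforeSpark3Support(String writtenVersion) {
    String prefix = writtenVersion.split(HYPHEN)[0]; // e.g. "2.1.0"
    return prefix.compareTo(CARBON_SPARK3_VERSION) < 0;
  }
}
```

Note that `compareTo` is lexicographic, so a hypothetical "2.10.0" would compare as older than "2.2.0"; that caveat applies to the approach in the patch as quoted, not just to this sketch.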

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -439,11 +447,102 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
     }
   }
 
+  private static long createTimeInstant(String dimensionValue, String dateFormat) {

Review comment:
This method is very long. Can you add a detailed comment to the method explaining its logic, with an example? That will make it easier to review, and easier for any other developer to change in the future for fixes or improvements.
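To illustrate the kind of method-level comment being asked for, here is a self-contained sketch of a timestamp-parsing helper whose Javadoc carries a worked example. The real `createTimeInstant` body is not shown in this hunk, so the implementation below is only an assumption of what such a method might do, not CarbonData's actual logic:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimeInstantSketch {
  /**
   * Parses a timestamp string into microseconds since the Unix epoch.
   *
   * Example: createTimeInstant("1970-01-01 00:00:01", "yyyy-MM-dd HH:mm:ss")
   * returns 1_000_000, i.e. one second after the epoch in microseconds.
   *
   * @param dimensionValue the raw timestamp string, e.g. "2021-07-01 10:15:30"
   * @param dateFormat     the pattern used to parse dimensionValue
   * @return microseconds since 1970-01-01T00:00:00Z
   */
  static long createTimeInstant(String dimensionValue, String dateFormat) {
    LocalDateTime parsed = LocalDateTime.parse(
        dimensionValue.trim(), DateTimeFormatter.ofPattern(dateFormat));
    // Convert to epoch micros; UTC is assumed here for simplicity.
    return parsed.toInstant(ZoneOffset.UTC).toEpochMilli() * 1000L;
  }
}
```

A comment of this shape (input format, worked example, units of the return value) is what makes a long parsing method reviewable.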

##########
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##########
@@ -566,6 +569,47 @@ class CarbonScanRDD[T: ClassTag](
           }
           havePair = false
           val value = reader.getCurrentValue
+          if (CarbonProperties.getInstance()
+                .getProperty(CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3,
+                  CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3_DEFAULT)
+                .toBoolean &&
+              model.getProjectionColumns.exists(_.getDataType == DataTypes.TIMESTAMP)) {
+            value match {
+              case row: GenericInternalRow if
+                reader.isInstanceOf[CarbonRecordReader[T]] &&
+                null !=
+                reader.asInstanceOf[CarbonRecordReader[T]].getCarbonDataFileWrittenVersion &&
+                // carbonDataFileWrittenVersion will be in the format x.x.x-SNAPSHOT
+                // (eg., 2.1.0-SNAPSHOT), get the version name and check if the data file is
+                // written before 2.2.0 version, then rebase timestamp value
+                reader.asInstanceOf[CarbonRecordReader[T]].getCarbonDataFileWrittenVersion
+                  .split(CarbonCommonConstants.HYPHEN).head
+                  .compareTo(CarbonCommonConstants.CARBON_SPARK3_VERSION) < 0 =>
+                var i = 0
+                // rebase timestamp data by converting julian to Gregorian time
+                model.getProjectionColumns.foreach {
+                  projectionColumn =>
+                    var isComplexDimension = false
+                    // ignore Timestamp complex dimensions
+                    projectionColumn match {
+                      case dimension: CarbonDimension =>
+                        isComplexDimension = dimension.getComplexParentDimension != null
+                      case _ =>
+                    }
+                    if (!isComplexDimension &&
+                        projectionColumn.getDataType == DataTypes.TIMESTAMP) {
+                      val timeStampData = row.get(i,
+                        org.apache.spark.sql.types.DataTypes.TimestampType)
+                      if (null != timeStampData) {
+                        row.update(i,
+                          CarbonToSparkAdapter.rebaseTime(timeStampData.asInstanceOf[Long]))
+                      }
+                    }
+                    i = i + 1

Review comment:
Here, instead of iterating over every column with the `i` index and checking each one for the timestamp type, can we compute the indexes of all timestamp columns up front and update only those? That would be more efficient, I feel.
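The suggestion above, sketched as a standalone snippet: collect the timestamp column positions once, then per row touch only those positions instead of scanning every projection column. All names, the column-type tags, and the rebase arithmetic here are illustrative stand-ins, not CarbonData's actual API:

```java
import java.util.ArrayList;
import java.util.List;

class TimestampRebaseSketch {
  // Illustrative column-type tags standing in for DataTypes.
  static final int TIMESTAMP = 1;
  static final int OTHER = 0;

  // Done once per task: collect the ordinals of timestamp columns.
  static List<Integer> timestampOrdinals(int[] columnTypes) {
    List<Integer> ordinals = new ArrayList<>();
    for (int i = 0; i < columnTypes.length; i++) {
      if (columnTypes[i] == TIMESTAMP) {
        ordinals.add(i);
      }
    }
    return ordinals;
  }

  // Done per row: update only the precomputed timestamp positions.
  static void rebaseRow(long[] row, List<Integer> ordinals) {
    for (int ordinal : ordinals) {
      row[ordinal] = rebaseTime(row[ordinal]);
    }
  }

  // Stand-in for CarbonToSparkAdapter.rebaseTime; the real conversion
  // rebases Julian-calendar micros to the proleptic Gregorian calendar.
  static long rebaseTime(long micros) {
    return micros + 1L; // placeholder shift, for demonstration only
  }
}
```

The per-row cost then depends only on the number of timestamp columns, not on the full projection width, which is what the review comment is after.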




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

