akashrn5 commented on a change in pull request #4177:
URL: https://github.com/apache/carbondata/pull/4177#discussion_r676322785



##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java
##########
@@ -87,6 +87,8 @@
 
   private transient DataFileFooter dataFileFooter;
 
+  private String carbonWrittenVersion = null;

Review comment:
       Please add a comment here explaining exactly which version this field refers to.

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java
##########
@@ -38,6 +38,8 @@
    */
   protected int counter;
 
+  private String carbonWrittenVersion = null;

Review comment:
       Instead of this, can we rename it to `carbonDataFileWrittenVersion`? That will be clearer. We can replace it in all places in this PR.

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -122,6 +122,16 @@ private CarbonCommonConstants() {
    */
  public static final String CARBON_TIMESTAMP_MILLIS = "dd-MM-yyyy HH:mm:ss:SSS";
 
+  /**
+   * CARBON Default TIME format
+   */
+  public static final String CARBON_TIMESTAMP_DEFAULT_FORMAT_TIME = " HH:mm:ss";

Review comment:
       I think this constant name will be confused with the timestamp format one. Can you update it to a clearer name? Please also revisit the other new constant names in this class.

##########
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -2648,4 +2658,16 @@ private CarbonCommonConstants() {
 
  public static final String CARBON_SDK_EMPTY_METADATA_PATH = "emptyMetadataFolder";
 
+  /**
+   * Property to identify if the spark version is above 3.x version
+   */
+  public static final String CARBON_SPARK_VERSION_SPARK3 = "carbon.spark.version.spark3";

Review comment:
       Please add this property to the documentation and mention when it should be enabled.

##########
File path: integration/spark/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
##########
@@ -348,6 +351,24 @@ private boolean nextBatch() {
     if (iterator.hasNext()) {
       iterator.processNextBatch(carbonColumnarBatch);
       int actualSize = carbonColumnarBatch.getActualSize();
+      if (Boolean.parseBoolean(CarbonProperties.getInstance()
+          .getProperty(CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3,
+              CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3_DEFAULT))
+          && null != carbonColumnarBatch.getCarbonWrittenVersion() &&
+          carbonColumnarBatch.getCarbonWrittenVersion().split(CarbonCommonConstants.HYPHEN)[0]

Review comment:
       Please add a comment showing what the CarbonData version info looks like; that will explain why we need to split at the hyphen and select the 0th element.
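To illustrate the point, a standalone sketch of the hyphen split (not CarbonData code; the exact version-string format, e.g. a `-SNAPSHOT` build qualifier, is an assumption here):

```java
public class VersionSplitDemo {
  public static void main(String[] args) {
    String writtenVersion = "2.1.0-SNAPSHOT"; // assumed example of a written version
    String spark3Baseline = "2.2.0";          // assumed comparison baseline
    // splitting at "-" and taking index 0 drops any build qualifier,
    // leaving only the numeric part so the string comparison is meaningful
    String numericPart = writtenVersion.split("-")[0];
    System.out.println(numericPart);                               // prints 2.1.0
    System.out.println(numericPart.compareTo(spark3Baseline) < 0); // prints true
  }
}
```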

##########
File path: integration/spark/src/main/common2.3and2.4/org/apache/spark/sql/SparkVersionAdapter.scala
##########
@@ -83,6 +83,10 @@ trait SparkVersionAdapter {
     DateTimeUtils.timestampToString(timeStamp)
   }
 
+  def rebaseTime(timestamp: Long): Long = {

Review comment:
       Please add a method-level comment explaining why no rebase is needed here.
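For the comment, the underlying reason (as I understand it) is the calendar switch: Spark 3 interprets timestamps in the proleptic Gregorian calendar, while older writers used the hybrid Julian/Gregorian calendar, so values written before the 1582 cutover shift when read back, and Spark 2.3/2.4 can simply return the value unchanged. A standalone sketch of the two calendars disagreeing (names illustrative, not the PR's API):

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarDriftDemo {
  public static void main(String[] args) {
    // hybrid Julian/Gregorian calendar (what Spark <= 2.4 effectively used)
    GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    hybrid.clear();
    hybrid.set(1500, Calendar.JANUARY, 1);
    long hybridMillis = hybrid.getTimeInMillis();

    // proleptic Gregorian calendar (what Spark 3 / java.time uses)
    long prolepticMillis = LocalDate.of(1500, 1, 1)
        .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();

    // the same printed date maps to different instants; this gap is what
    // rebasing corrects, and it is zero for dates after the 1582 cutover
    System.out.println((hybridMillis - prolepticMillis) / 86400000L + " days");
  }
}
```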

##########
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##########
@@ -439,11 +447,102 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
     }
   }
 
+  private static long createTimeInstant(String dimensionValue, String dateFormat) {
+    if (null == dateFormat || dateFormat.trim().isEmpty()) {
+      dateFormat = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT;
+    } else if (!dateFormat.trim().contains(" ")) {
+      dateFormat += CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT_TIME;
+    }
+    String updatedDim = dimensionValue;
+    if (!dimensionValue.trim().contains(" ")) {
+      updatedDim += CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT_TIME_DATA;
+    }
+    // convert input data to proper format
+    List<String> dateFormatPattern = new ArrayList<>();
+    List<String> dimensionData = new ArrayList<>();
+    StringBuilder format = new StringBuilder();
+    // separate year, month, day,... to a list
+    for (int i = 0; i < dateFormat.length(); i++) {
+      char c = dateFormat.charAt(i);
+      if ((c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')) {
+        format.append(c);
+      } else {
+        String value = format.toString();
+        if (value.equals("hh")) {
+          value = "HH";
+        }
+        dateFormatPattern.add(value);
+        dateFormatPattern.add(Character.toString(c));
+        format = new StringBuilder();
+      }
+      if (i + 1 == dateFormat.length()) {
+        dateFormatPattern.add(format.toString());
+      }
+    }
+    format = new StringBuilder();
+    // separate data year, month, day,.. to a list
+    for (int i = 0; i < updatedDim.length(); i++) {
+      char c = updatedDim.charAt(i);
+      if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z') {
+        break;
+      }
+      if (c >= '0' && c <= '9') {
+        format.append(c);
+      } else {
+        dimensionData.add(format.toString());
+        dimensionData.add(Character.toString(c));
+        format = new StringBuilder();
+      }
+      if (i + 1 == updatedDim.length()) {
+        dimensionData.add(format.toString());
+      }
+    }
+    // add 0's to year/month/day.. if a data part is shorter than its format part
+    if (!dimensionData.isEmpty() && !(dimensionData.size() < dateFormatPattern.size())) {
+      int i;
+      for (i = 0; i < dateFormatPattern.size(); i++) {
+        String currentTimestampFormat = dateFormatPattern.get(i);
+        String currentDimData = dimensionData.get(i);
+        if (currentTimestampFormat.length() != currentDimData.length()) {
+          if (currentDimData.length() < currentTimestampFormat.length()) {
+            dimensionData.set(i, "0" + currentDimData);
+          }
+        }
+      }
+      if (dimensionData.size() > dateFormatPattern.size()) {
+        dimensionData.subList(i, dimensionData.size()).clear();
+      }
+      updatedDim = String.join("", dimensionData);
+      dateFormat = String.join("", dateFormatPattern);
+    }
+    Instant instant = Instant.from(ZonedDateTime
+        .of(LocalDateTime.parse(updatedDim, DateTimeFormatter.ofPattern(dateFormat)),
+            ZoneId.systemDefault()));
+    validateTimeStampRange(instant.getEpochSecond());
+    long us = Math.multiplyExact(instant.getEpochSecond(), 1000L);
+    int nano = instant.getNano();
+    if (nano != 0) {
+      while (nano % 10 == 0) {
+        nano /= 10;
+      }
+    }
+    return Math.addExact(us, nano);
+  }
+
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
     DateFormat dateFormatter = null;
     long timeValue;
     try {
+      if (Boolean.parseBoolean(CarbonProperties.getInstance()
+          .getProperty(CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3,
+              CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3_DEFAULT))) {
+        try {
+          return createTimeInstant(dimensionValue, dateFormat.trim());
+        } catch (DateTimeParseException e) {
+          throw  new NumberFormatException(e.getMessage());

Review comment:
       Please remove the extra space after `throw` here.
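Separately, one note for readers of `createTimeInstant` above: the zero-padding and the `hh` to `HH` normalization are needed because `java.time` parsing is strict where `SimpleDateFormat` was lenient (and `hh` is a 12-hour clock field that cannot resolve without an AM/PM marker). A small standalone demo of the strictness (not PR code):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class StrictParseDemo {
  public static void main(String[] args) {
    DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    // zero-padded input parses fine
    System.out.println(LocalDate.parse("2021-01-01", f)); // prints 2021-01-01
    try {
      // single-digit month/day is rejected by the fixed-width MM/dd fields,
      // which is why the method pads missing leading zeros before parsing
      LocalDate.parse("2021-1-1", f);
    } catch (DateTimeParseException e) {
      System.out.println("rejected: 2021-1-1");
    }
  }
}
```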

##########
File path: integration/spark/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
##########
@@ -348,6 +351,24 @@ private boolean nextBatch() {
     if (iterator.hasNext()) {
       iterator.processNextBatch(carbonColumnarBatch);
       int actualSize = carbonColumnarBatch.getActualSize();
+      if (Boolean.parseBoolean(CarbonProperties.getInstance()
+          .getProperty(CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3,
+              CarbonCommonConstants.CARBON_SPARK_VERSION_SPARK3_DEFAULT))
+          && null != carbonColumnarBatch.getCarbonWrittenVersion() &&
+          carbonColumnarBatch.getCarbonWrittenVersion().split(CarbonCommonConstants.HYPHEN)[0]
+              .compareTo(CarbonCommonConstants.CARBON_SPARK3_VERSION) < 0) {
+        // rebase Timestamp value
+        CarbonColumnVector[] columnVectors = carbonColumnarBatch.columnVectors;
+        for (CarbonColumnVector vector : columnVectors) {
+          if (DataTypes.TIMESTAMP == vector.getBlockDataType()) {
+            for (int i = 0; i < actualSize; i++) {
+              long timeStamp = vector.getLong(i);
+              long rebaseTime = CarbonToSparkAdapter.rebaseTime(timeStamp);

Review comment:
       Here we take the already filled vectors and then refill all the values with the rebased time; that second pass may hurt performance for the old stores that need the rebase. Please try to handle it such that, while filling the vector the first time itself, we fill the rebased timestamp value, to save that time and avoid the impact on old stores.
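Something like the sketch below, where the rebase function is applied once at fill time instead of in a second pass (all names here are stand-ins, not the actual CarbonData API):

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

public class FillTimeRebaseSketch {
  public static void main(String[] args) {
    long[] source = {1000L, 2000L, 3000L};   // raw timestamps read from the file
    long[] vector = new long[source.length]; // stands in for the column vector
    LongUnaryOperator rebase = t -> t + 7L;  // placeholder for CarbonToSparkAdapter.rebaseTime
    boolean oldStore = true;                 // written version below the Spark 3 baseline

    // single pass: rebase (only when needed) while filling; no refilling afterwards
    for (int i = 0; i < source.length; i++) {
      vector[i] = oldStore ? rebase.applyAsLong(source[i]) : source[i];
    }
    System.out.println(Arrays.toString(vector)); // prints [1007, 2007, 3007]
  }
}
```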

##########
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnVector.java
##########
@@ -46,6 +46,8 @@
 
   void putLong(int rowId, long value);
 
+  void putRebasedLong(long value, int rowId);

Review comment:
       Keep this consistent with the other APIs: `rowId` as the first parameter and `value` as the second.
   Also, I think there is no need to add a new method to the interface, since the implementation will be the same as `putLong()`; we can reuse that and avoid code changes. Only in the caller we can add a comment noting that we are putting the rebased value into the vector instead of the actual value. That will keep the code simple.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

