morningman commented on a change in pull request #4524:
URL: https://github.com/apache/incubator-doris/pull/4524#discussion_r483931347



##########
File path: fe/spark-dpp/src/main/java/org/apache/doris/load/loadv2/dpp/SparkDpp.java
##########
@@ -358,12 +362,44 @@ private void processRollupTree(RollupTreeNode rootNode,
         return Pair.of(keyMap.toArray(new Integer[keyMap.size()]), valueMap.toArray(new Integer[valueMap.size()]));
     }
 
-    // repartition dataframe by partitionid_bucketid
-    // so data in the same bucket will be consecutive.
-    private JavaPairRDD<List<Object>, Object[]> fillTupleWithPartitionColumn(SparkSession spark, Dataset<Row> dataframe,
+    /**
+     * Validate decimal and char/varchar values against the column definition.
+     */
+    private boolean validateData(Object srcValue, EtlJobConfig.EtlColumn etlColumn, ColumnParser columnParser, Row row) {
+
+        switch (etlColumn.columnType.toUpperCase()) {
+            case "DECIMALV2":
+                // TODO(wb): support decimal round; see be DecimalV2Value::round
+                DecimalParser decimalParser = (DecimalParser) columnParser;
+                BigDecimal srcBigDecimal = (BigDecimal) srcValue;
+                if (srcValue != null && (decimalParser.getMaxValue().compareTo(srcBigDecimal) < 0 || decimalParser.getMinValue().compareTo(srcBigDecimal) > 0)) {
+                    LOG.warn(String.format("decimal value is not valid for 
defination, column=%s, value=%s,precision=%s,scale=%s",
+                            etlColumn.columnName, srcValue.toString(), 
srcBigDecimal.precision(), srcBigDecimal.scale()));
+                    abnormalRowAcc.add(1);
+                    return false;
+                }
+                break;
+            case "CHAR":
+            case "VARCHAR":
+                // TODO(wb) padding char type
+                if (srcValue != null && srcValue.toString().length() > etlColumn.stringLength) {
+                    LOG.warn(String.format("the length of input is longer than the schema allows. column_name:%s, input_str[%s], schema length:%s, actual length:%s",
+                            etlColumn.columnName, row.toString(), etlColumn.stringLength, srcValue.toString().length()));
+                    return false;
+                }
+                break;

Review comment:
       Missing a `default` branch?
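   Something like the following is what I have in mind (just a sketch; whether the remaining column types need extra validation here is an open question):

       switch (etlColumn.columnType.toUpperCase()) {
           case "DECIMALV2":
               // range check as in the diff above
               break;
           case "CHAR":
           case "VARCHAR":
               // length check as in the diff above
               break;
           default:
               // assumption: all other column types are already covered by
               // their ColumnParser.parse(), so treat the value as valid
               break;
       }
       return true;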

##########
File path: fe/spark-dpp/src/main/java/org/apache/doris/load/loadv2/dpp/ColumnParser.java
##########
@@ -186,4 +192,61 @@ public boolean parse(String value) {
             throw new RuntimeException("string check failed ", e);
         }
     }
+}
+
+class DecimalParser extends ColumnParser {
+
+    public static int PRECISION = 27;
+    public static int SCALE = 9;
+
+    private BigDecimal maxValue;
+    private BigDecimal minValue;
+
+    public DecimalParser(EtlJobConfig.EtlColumn etlColumn) {
+        StringBuilder precisionStr = new StringBuilder();
+        for (int i = 0; i < etlColumn.precision; i++) {

Review comment:
       If the column definition is `k1 decimal(4,3)`, then the value range of `k1` should be `[-9.999, 9.999]`.
   
   And your code will give: `[-9999.999, 9999.999]`, which is not right.
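   
   Something like this would give the right bounds (a sketch only, assuming `java.math.BigInteger` is imported alongside `BigDecimal`):
   
       // decimal(precision, scale) allows `precision` significant digits in
       // total, `scale` of them after the point:
       //   max = (10^precision - 1) / 10^scale
       // e.g. decimal(4,3) -> (10^4 - 1) / 10^3 = 9999 / 1000 = 9.999
       BigInteger unscaledMax = BigInteger.TEN.pow(etlColumn.precision).subtract(BigInteger.ONE);
       maxValue = new BigDecimal(unscaledMax, etlColumn.scale);
       minValue = maxValue.negate();
   
   This replaces the string building entirely and yields `[-9.999, 9.999]` for `decimal(4,3)`.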



