wangbo commented on a change in pull request #4524:
URL: https://github.com/apache/incubator-doris/pull/4524#discussion_r484159876
##########
File path:
fe/spark-dpp/src/main/java/org/apache/doris/load/loadv2/dpp/ColumnParser.java
##########
@@ -186,4 +192,61 @@ public boolean parse(String value) {
throw new RuntimeException("string check failed ", e);
}
}
+}
+
+class DecimalParser extends ColumnParser {
+
+ public static int PRECISION = 27;
+ public static int SCALE = 9;
+
+ private BigDecimal maxValue;
+ private BigDecimal minValue;
+
+ public DecimalParser(EtlJobConfig.EtlColumn etlColumn) {
+ StringBuilder precisionStr = new StringBuilder();
+ for (int i = 0; i < etlColumn.precision; i++) {
+ precisionStr.append("9");
+ }
+ StringBuilder scaleStr = new StringBuilder();
+ for (int i = 0; i < etlColumn.scale; i++) {
+ scaleStr.append("9");
+ }
+ maxValue = new BigDecimal(precisionStr.toString() + "." + scaleStr.toString());
+ minValue = new BigDecimal("-" + precisionStr.toString() + "." + scaleStr.toString());
+ }
+
+ @Override
+ public boolean parse(String value) {
+ try {
+ BigDecimal bigDecimal = new BigDecimal(value);
+ return bigDecimal.precision() - bigDecimal.scale() <= PRECISION - SCALE && bigDecimal.scale() <= SCALE;
+ } catch (NumberFormatException e) {
+ return false;
+ } catch (Exception e) {
+ throw new RuntimeException("decimal parse failed ", e);
+ }
+ }
+
+ public BigDecimal getMaxValue() {
+ return maxValue;
+ }
+
+ public BigDecimal getMinValue() {
+ return minValue;
+ }
+}
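The DecimalParser above builds its bounds by concatenating `9` characters, so a `DECIMALV2(27, 9)` column gets the range ±999999999999999999.999999999, and `parse` accepts a value only if it has at most 18 integer digits and 9 fractional digits. A minimal standalone sketch of that check (class and method names here are illustrative, not the ones in the patch):

```java
import java.math.BigDecimal;

// Sketch of the DecimalParser range check with the DECIMALV2 limits
// PRECISION = 27 and SCALE = 9 used in the diff above.
public class DecimalRangeCheck {
    static final int PRECISION = 27;
    static final int SCALE = 9;

    // Accept at most (PRECISION - SCALE) = 18 integer digits
    // and at most SCALE = 9 fractional digits.
    static boolean fits(String value) {
        try {
            BigDecimal d = new BigDecimal(value);
            return d.precision() - d.scale() <= PRECISION - SCALE
                    && d.scale() <= SCALE;
        } catch (NumberFormatException e) {
            return false; // not a parsable number
        }
    }

    public static void main(String[] args) {
        System.out.println(fits("123456789012345678.123456789")); // 18 int + 9 frac digits: fits
        System.out.println(fits("1234567890123456789.0"));        // 19 integer digits: too wide
        System.out.println(fits("0.1234567890"));                 // 10 fractional digits: too precise
        System.out.println(fits("abc"));                          // not a number
    }
}
```

Note that `precision() - scale()` can be negative for values below 1 (e.g. `0.001` has precision 1, scale 3), which the `<=` comparison handles correctly.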
+
+class LargeIntParser extends ColumnParser {
+
+ @Override
+ public boolean parse(String value) {
+ try {
+ BigInteger bigInteger = new BigInteger(value);
Review comment:
> BigInteger constructors and operations throw {@code ArithmeticException} when
> the result is out of the supported range of
> -2^{@code Integer.MAX_VALUE} (exclusive) to
> +2^{@code Integer.MAX_VALUE} (exclusive)

So I think adding a catch for ArithmeticException here is enough.
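Worth noting: `new BigInteger(String)` itself only throws `NumberFormatException` for malformed input; the `ArithmeticException` the javadoc describes comes from operations whose result exceeds 2^Integer.MAX_VALUE in magnitude. Since a Doris LARGEINT is a signed 128-bit integer, an explicit bound check against ±2^127 is another option. A hedged sketch (names illustrative, not the patch's code):

```java
import java.math.BigInteger;

// Sketch of a LARGEINT validity check using explicit signed 128-bit bounds,
// an assumed alternative to relying on BigInteger's own range exceptions.
public class LargeIntCheck {
    static final BigInteger MAX = BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE); //  2^127 - 1
    static final BigInteger MIN = BigInteger.ONE.shiftLeft(127).negate();                 // -2^127

    static boolean parse(String value) {
        try {
            BigInteger v = new BigInteger(value);
            // Reject values outside the signed 128-bit range.
            return v.compareTo(MIN) >= 0 && v.compareTo(MAX) <= 0;
        } catch (NumberFormatException e) {
            return false; // malformed input string
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("170141183460469231731687303715884105727"));  // 2^127 - 1: fits
        System.out.println(parse("170141183460469231731687303715884105728"));  // 2^127: out of range
        System.out.println(parse("xyz"));                                      // malformed
    }
}
```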
##########
File path:
fe/spark-dpp/src/main/java/org/apache/doris/load/loadv2/dpp/SparkDpp.java
##########
@@ -358,12 +362,44 @@ private void processRollupTree(RollupTreeNode rootNode,
return Pair.of(keyMap.toArray(new Integer[keyMap.size()]), valueMap.toArray(new Integer[valueMap.size()]));
}
- // repartition dataframe by partitionid_bucketid
- // so data in the same bucket will be consecutive.
- private JavaPairRDD<List<Object>, Object[]> fillTupleWithPartitionColumn(SparkSession spark, Dataset<Row> dataframe,
+ /**
+ * check decimal,char/varchar
+ */
+ private boolean validateData(Object srcValue, EtlJobConfig.EtlColumn etlColumn, ColumnParser columnParser, Row row) {
+
+ switch (etlColumn.columnType.toUpperCase()) {
+ case "DECIMALV2":
+ // TODO(wb): support decimal round; see be DecimalV2Value::round
+ DecimalParser decimalParser = (DecimalParser) columnParser;
+ BigDecimal srcBigDecimal = (BigDecimal) srcValue;
+ if (srcValue != null && (decimalParser.getMaxValue().compareTo(srcBigDecimal) < 0 || decimalParser.getMinValue().compareTo(srcBigDecimal) > 0)) {
+ LOG.warn(String.format("decimal value is not valid for definition, column=%s, value=%s, precision=%s, scale=%s",
+ etlColumn.columnName, srcValue.toString(), srcBigDecimal.precision(), srcBigDecimal.scale()));
+ abnormalRowAcc.add(1);
+ return false;
+ }
+ break;
+ case "CHAR":
+ case "VARCHAR":
+ // TODO(wb) padding char type
+ if (srcValue != null && srcValue.toString().length() > etlColumn.stringLength) {
Review comment:
👌
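The validateData flow above dispatches on the column type: a DECIMALV2 value is range-checked against the parser's precomputed bounds, and a CHAR/VARCHAR value against its declared length. A minimal self-contained sketch of that dispatch (the parameters here are simplified stand-ins for `EtlJobConfig.EtlColumn` and the accumulator/logging are omitted):

```java
import java.math.BigDecimal;

// Sketch of the validateData switch from the diff: returns false for rows
// that violate the column definition, true otherwise.
public class ValidateSketch {
    static boolean validate(Object srcValue, String columnType,
                            BigDecimal maxValue, BigDecimal minValue, int stringLength) {
        switch (columnType.toUpperCase()) {
            case "DECIMALV2":
                BigDecimal d = (BigDecimal) srcValue;
                if (srcValue != null && (maxValue.compareTo(d) < 0 || minValue.compareTo(d) > 0)) {
                    return false; // outside the column's declared range
                }
                break;
            case "CHAR":
            case "VARCHAR":
                if (srcValue != null && srcValue.toString().length() > stringLength) {
                    return false; // longer than the declared length
                }
                break;
            default:
                break; // other types pass through unchecked here
        }
        return true;
    }

    public static void main(String[] args) {
        BigDecimal max = new BigDecimal("999999999999999999.999999999");
        BigDecimal min = max.negate();
        System.out.println(validate(new BigDecimal("1.5"), "DECIMALV2", max, min, 0));  // in range
        System.out.println(validate(new BigDecimal("1E+20"), "DECIMALV2", max, min, 0)); // too large
        System.out.println(validate("hello", "VARCHAR", null, null, 3));                 // too long
    }
}
```

One caveat worth flagging for the patch: `String.length()` counts UTF-16 code units, so for multi-byte characters it may differ from a byte-based length limit.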
----------------------------------------------------------------
This is an automated message from the Apache Git Service.