openinx commented on a change in pull request #1271:
URL: https://github.com/apache/iceberg/pull/1271#discussion_r466203001
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -195,7 +197,15 @@ public Long nonNullRead(ColumnVector vector, int row) {
     @Override
     public Decimal nonNullRead(ColumnVector vector, int row) {
       HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];
-      return new Decimal().set(value.serialize64(value.scale()), value.precision(), value.scale());
+
+      // The scale of a decimal read from a Hive ORC file may not equal the expected scale. For the data type
+      // decimal(10,3) and the value 10.100, the Hive ORC writer removes the trailing zeros and stores it
+      // as 101 * 10^(-1), so its scale is adjusted from 3 to 1. Therefore we cannot assert that value.scale() == scale;
+      // we also need to convert the Hive ORC decimal to a decimal with the expected precision and scale.
+      Preconditions.checkArgument(value.precision() <= precision,
+          "Cannot read value as decimal(%s,%s), too large: %s", precision, scale, value);
Review comment:
It is necessary to do this check: we need to make sure there is no bug when a decimal is written into ORC. For example, if for the decimal(3, 0) data type we encounter a Hive decimal `10000` (whose precision is 5), something must be wrong. Throwing an exception is the correct behavior in that case.
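   For anyone reading along, here is a minimal, self-contained sketch of both behaviors discussed above: Hive trimming trailing zeros so the stored scale differs from the declared one, and the precision check rejecting a value that is too large for the declared type. The class name `DecimalPrecisionCheckExample` and the plain Guava `Preconditions` import are illustrative assumptions, not the exact Iceberg code (which may use a relocated Guava package).

   import com.google.common.base.Preconditions;
   import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;

   public class DecimalPrecisionCheckExample {
     public static void main(String[] args) {
       // Declared type in the scenario from the code comment: decimal(10, 3).
       int precision = 10;
       int scale = 3;

       // Hive trims trailing zeros, so 10.100 is stored as unscaled 101 with scale 1,
       // not scale 3 -- which is why the reader cannot assert value.scale() == scale.
       HiveDecimalWritable stored = new HiveDecimalWritable("10.100");
       System.out.println(stored.scale());      // prints 1, not 3
       System.out.println(stored.precision());  // prints 3

       // The failure case from the review comment: a decimal(3, 0) column should never
       // contain 10000 (precision 5); the check throws instead of reading a bad value.
       int badPrecision = 3;
       int badScale = 0;
       HiveDecimalWritable tooLarge = new HiveDecimalWritable(10000L);
       Preconditions.checkArgument(tooLarge.precision() <= badPrecision,
           "Cannot read value as decimal(%s,%s), too large: %s", badPrecision, badScale, tooLarge);
     }
   }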