openinx commented on a change in pull request #1271:
URL: https://github.com/apache/iceberg/pull/1271#discussion_r466203001
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -195,7 +197,15 @@ public Long nonNullRead(ColumnVector vector, int row) {
     @Override
     public Decimal nonNullRead(ColumnVector vector, int row) {
       HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];
-      return new Decimal().set(value.serialize64(value.scale()), value.precision(), value.scale());
+
+      // The scale of a decimal read from a Hive ORC file may not equal the expected scale. For the data type
+      // decimal(10,3) and the value 10.100, the Hive ORC writer removes the trailing zeros and stores it
+      // as 101 * 10^(-1), so its scale is adjusted from 3 to 1. Therefore we cannot assert that value.scale() == scale;
+      // we also need to convert the Hive ORC decimal to a decimal with the expected precision and scale.
+      Preconditions.checkArgument(value.precision() <= precision,
+          "Cannot read value as decimal(%s,%s), too large: %s", precision, scale, value);
Review comment:
It is necessary to do this check: we need to make sure there is no bug when a decimal is written into ORC. For example, if for the decimal(3, 0) data type we encounter a Hive decimal `10000` (whose precision is 5), something must be wrong. Throwing an exception is the correct behavior in that case.
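   For anyone reading along, here is a minimal, self-contained sketch of both behaviors discussed above: Hive trimming trailing zeros so the stored scale differs from the declared one, and the precision check rejecting a value that is too large for the declared type. The class name `DecimalPrecisionCheckExample` and the plain Guava `Preconditions` import are illustrative assumptions, not the exact Iceberg code (which may use a relocated Guava package).

   import com.google.common.base.Preconditions;
   import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;

   public class DecimalPrecisionCheckExample {
     public static void main(String[] args) {
       // Declared type in the scenario from the code comment: decimal(10, 3).
       int precision = 10;
       int scale = 3;

       // Hive trims trailing zeros, so 10.100 is stored as unscaled 101 with scale 1,
       // not scale 3 -- which is why the reader cannot assert value.scale() == scale.
       HiveDecimalWritable stored = new HiveDecimalWritable("10.100");
       System.out.println(stored.scale());      // prints 1, not 3
       System.out.println(stored.precision());  // prints 3

       // The failure case from the review comment: a decimal(3, 0) column should never
       // contain 10000 (precision 5); the check throws instead of reading a bad value.
       int badPrecision = 3;
       int badScale = 0;
       HiveDecimalWritable tooLarge = new HiveDecimalWritable(10000L);
       Preconditions.checkArgument(tooLarge.precision() <= badPrecision,
           "Cannot read value as decimal(%s,%s), too large: %s", badPrecision, badScale, tooLarge);
     }
   }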