shardulm94 commented on a change in pull request #1271:
URL: https://github.com/apache/iceberg/pull/1271#discussion_r464729558
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -288,8 +289,10 @@ public void nonNullWrite(int rowId, LocalDate data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, OffsetDateTime data, ColumnVector output) {
TimestampColumnVector cv = (TimestampColumnVector) output;
- cv.time[rowId] = data.toInstant().toEpochMilli(); // millis
- cv.nanos[rowId] = (data.getNano() / 1_000) * 1_000; // truncate nanos to only keep microsecond precision
+ // millis
+ cv.time[rowId] = data.toInstant().toEpochMilli();
+ // truncate nanos to only keep microsecond precision
+ cv.nanos[rowId] = Math.floorDiv(data.getNano(), 1_000) * 1_000;
Review comment:
`data.getNano()` always returns a non-negative integer, so is this change required?
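For context, `java.time` guarantees `getNano()` is in `[0, 999_999_999]`, and for non-negative inputs the two forms are equivalent. A quick sketch with a hypothetical value:

```java
int nanos = 123_456_789;                      // hypothetical getNano() value
int a = (nanos / 1_000) * 1_000;              // truncating division -> 123_456_000
int b = Math.floorDiv(nanos, 1_000) * 1_000;  // floor division      -> 123_456_000
System.out.println(a == b);                   // true for any nanos >= 0
```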
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -195,7 +196,12 @@ public Long nonNullRead(ColumnVector vector, int row) {
@Override
public Decimal nonNullRead(ColumnVector vector, int row) {
HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];
- return new Decimal().set(value.serialize64(value.scale()), value.precision(), value.scale());
+ BigDecimal decimal = new BigDecimal(BigInteger.valueOf(value.serialize64(value.scale())), value.scale());
Review comment:
`value.serialize64()` takes an expected scale as a parameter, so I think the only change required to the original code is to pass our expected reader scale into `value.serialize64()` instead of passing `value.scale()`.
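Roughly what I have in mind (sketch only; assumes this reader keeps the expected `precision` and `scale` as fields, like the surrounding readers do):

```java
// Serialize the unscaled value at the reader's expected scale, then build the
// Spark Decimal with the expected precision/scale.
return new Decimal().set(value.serialize64(scale), precision, scale);
```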
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -324,14 +329,24 @@ public void nonNullWrite(int rowId, LocalDateTime data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, BigDecimal data, ColumnVector output) {
- // TODO: validate precision and scale from schema
+ Preconditions.checkArgument(data.scale() == scale,
+     "Cannot write value as decimal(%s,%s), wrong scale: %s", precision, scale, data);
+ Preconditions.checkArgument(data.precision() <= precision && precision <= 18,
Review comment:
Nit: `precision <= 18` check can be moved into the constructor
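Something along these lines (sketch only; the constructor shape and field names are assumed):

```java
// Validate the declared precision once at construction time instead of on
// every written value.
Decimal18Writer(int precision, int scale) {
  Preconditions.checkArgument(precision <= 18,
      "Cannot write decimal(%s,%s) as a long-backed decimal: precision > 18", precision, scale);
  this.precision = precision;
  this.scale = scale;
}
```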
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcWriter.java
##########
@@ -237,9 +239,14 @@ public void addValue(int rowId, int column, SpecializedGetters data,
output.noNulls = false;
output.isNull[rowId] = true;
} else {
+ Decimal decimal = data.getDecimal(column, precision, scale);
+ Preconditions.checkArgument(scale == decimal.scale(),
+     "Cannot write value as decimal(%s,%s), wrong scale: %s", precision, scale, decimal);
+ Preconditions.checkArgument(decimal.precision() <= precision && precision <= 18,
+     "Cannot write value as decimal(%s,%s), invalid precision: %s", decimal);
Review comment:
This check seems redundant to me. If we are already passing our expected precision and scale to `data.getDecimal()`, won't the scale and precision of the returned decimal always match?
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -288,8 +289,10 @@ public void nonNullWrite(int rowId, LocalDate data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, OffsetDateTime data, ColumnVector output) {
TimestampColumnVector cv = (TimestampColumnVector) output;
- cv.time[rowId] = data.toInstant().toEpochMilli(); // millis
- cv.nanos[rowId] = (data.getNano() / 1_000) * 1_000; // truncate nanos to only keep microsecond precision
+ // millis
+ cv.time[rowId] = data.toInstant().toEpochMilli();
+ // truncate nanos to only keep microsecond precision
+ cv.nanos[rowId] = Math.floorDiv(data.getNano(), 1_000) * 1_000;
Review comment:
If it is indeed needed, we should also update `TimestampWriter`.
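For reference, the corresponding `nonNullWrite(int, LocalDateTime, ColumnVector)` in `TimestampWriter` would presumably get the same one-line change (sketch only; `cv` assumed to be the same `TimestampColumnVector` local):

```java
// Same truncation to microsecond precision; only matters if getNano() could
// ever be negative, which java.time does not allow.
cv.nanos[rowId] = Math.floorDiv(data.getNano(), 1_000) * 1_000;
```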
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcWriter.java
##########
@@ -261,9 +268,14 @@ public void addValue(int rowId, int column, SpecializedGetters data,
output.isNull[rowId] = true;
} else {
output.isNull[rowId] = false;
- ((DecimalColumnVector) output).vector[rowId].set(
- HiveDecimal.create(data.getDecimal(column, precision, scale)
- .toJavaBigDecimal()));
+
+ Decimal decimal = data.getDecimal(column, precision, scale);
+ Preconditions.checkArgument(scale == decimal.scale(),
+     "Cannot write value as decimal(%s,%s), wrong scale: %s", precision, scale, decimal);
+ Preconditions.checkArgument(decimal.precision() <= precision && precision <= 38,
+     "Cannot write value as decimal(%s,%s), invalid precision: %s", precision, scale, decimal);
Review comment:
This check seems redundant to me. If we are already passing our expected precision and scale to `data.getDecimal()`, won't the scale and precision of the returned decimal always match?
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -212,6 +218,10 @@ public Decimal nonNullRead(ColumnVector vector, int row) {
public Decimal nonNullRead(ColumnVector vector, int row) {
BigDecimal value = ((DecimalColumnVector) vector).vector[row]
.getHiveDecimal().bigDecimalValue();
+
+ Preconditions.checkArgument(value.precision() <= precision && precision <= 38,
Review comment:
Nit: `precision <= 38` check can be moved into the constructor
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -340,7 +355,11 @@ public void nonNullWrite(int rowId, BigDecimal data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, BigDecimal data, ColumnVector output) {
- // TODO: validate precision and scale from schema
+ Preconditions.checkArgument(data.scale() == scale,
+     "Cannot write value as decimal(%s,%s), wrong scale: %s", precision, scale, data);
+ Preconditions.checkArgument(data.precision() <= precision && precision <= 38,
Review comment:
Nit: `precision <= 38` check can be moved into the constructor
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -195,7 +196,12 @@ public Long nonNullRead(ColumnVector vector, int row) {
@Override
public Decimal nonNullRead(ColumnVector vector, int row) {
HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];
- return new Decimal().set(value.serialize64(value.scale()), value.precision(), value.scale());
+ BigDecimal decimal = new BigDecimal(BigInteger.valueOf(value.serialize64(value.scale())), value.scale());
+
+ Preconditions.checkArgument(value.precision() <= precision && precision <= 18,
Review comment:
Nit: `precision <= 18` check can be moved into the constructor