openinx commented on a change in pull request #1271:
URL: https://github.com/apache/iceberg/pull/1271#discussion_r464835177
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -195,7 +196,12 @@ public Long nonNullRead(ColumnVector vector, int row) {
@Override
public Decimal nonNullRead(ColumnVector vector, int row) {
HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];
- return new Decimal().set(value.serialize64(value.scale()), value.precision(), value.scale());
+ BigDecimal decimal = new BigDecimal(BigInteger.valueOf(value.serialize64(value.scale())), value.scale());
+
+ Preconditions.checkArgument(value.precision() <= precision && precision <= 18,
Review comment:
ditto.
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -288,8 +289,10 @@ public void nonNullWrite(int rowId, LocalDate data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, OffsetDateTime data, ColumnVector output) {
TimestampColumnVector cv = (TimestampColumnVector) output;
- cv.time[rowId] = data.toInstant().toEpochMilli(); // millis
- cv.nanos[rowId] = (data.getNano() / 1_000) * 1_000; // truncate nanos to only keep microsecond precision
+ // millis
+ cv.time[rowId] = data.toInstant().toEpochMilli();
+ // truncate nanos to only keep microsecond precision
+ cv.nanos[rowId] = Math.floorDiv(data.getNano(), 1_000) * 1_000;
Review comment:
OK, I saw that the javadoc says the nano-of-second is from 0 to 999,999,999. You're right, we don't need the `floorDiv` here.
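
For illustration, a minimal standalone sketch (class and variable names invented for the example) of why plain integer division is enough here: `OffsetDateTime.getNano()` is documented to return a value in the range 0 to 999,999,999, so the operand is never negative and `Math.floorDiv` gives the same result as `/`.

```java
import java.time.OffsetDateTime;

public class NanoTruncationSketch {
  public static void main(String[] args) {
    OffsetDateTime ts = OffsetDateTime.now();

    // getNano() is always in [0, 999_999_999] per the javadoc, so truncating
    // integer division matches floorDiv for every possible input.
    int nanos = ts.getNano();
    int truncatedToMicros = (nanos / 1_000) * 1_000;  // keep only microsecond precision

    assert truncatedToMicros == Math.floorDiv(nanos, 1_000) * 1_000;
    System.out.println(nanos + " -> " + truncatedToMicros);
  }
}
```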
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -324,14 +329,24 @@ public void nonNullWrite(int rowId, LocalDateTime data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, BigDecimal data, ColumnVector output) {
- // TODO: validate precision and scale from schema
+ Preconditions.checkArgument(data.scale() == scale,
+ "Cannot write value as decimal(%s,%s), wrong scale: %s", precision,
scale, data);
+ Preconditions.checkArgument(data.precision() <= precision && precision
<= 18,
Review comment:
The `precision <= 18` can be removed now, because we've checked it [here](https://github.com/apache/iceberg/pull/1271/files#diff-b1b07b15f036000a3f2bed76fdd9f961R108).
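
For illustration, a small self-contained sketch of that reasoning (the `DecimalBoundsCheck` class and its method names are invented for this example and are not Iceberg code): once `precision <= 18` is validated a single time when the writer is constructed, the per-value check only needs to compare the value's scale and precision against the declared ones.

```java
import java.math.BigDecimal;

// Invented for illustration; not the Iceberg writer itself.
public class DecimalBoundsCheck {
  private final int precision;  // declared precision, bounded once in the constructor
  private final int scale;      // declared scale

  public DecimalBoundsCheck(int precision, int scale) {
    // One-time bound: a long-backed decimal can hold at most 18 digits.
    if (precision > 18) {
      throw new IllegalArgumentException("Invalid precision for long-backed decimal: " + precision);
    }
    this.precision = precision;
    this.scale = scale;
  }

  public void check(BigDecimal data) {
    // Per-value checks: only scale equality and the precision upper bound remain.
    if (data.scale() != scale) {
      throw new IllegalArgumentException(String.format(
          "Cannot write value as decimal(%d,%d), wrong scale: %s", precision, scale, data));
    }
    if (data.precision() > precision) {
      throw new IllegalArgumentException(String.format(
          "Cannot write value as decimal(%d,%d), too large: %s", precision, scale, data));
    }
  }

  public static void main(String[] args) {
    DecimalBoundsCheck check = new DecimalBoundsCheck(10, 3);
    check.check(new BigDecimal("1234567.890"));      // ok: precision 10, scale 3
    // check.check(new BigDecimal("12345678.9012")); // would fail: wrong scale
  }
}
```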
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -212,6 +218,10 @@ public Decimal nonNullRead(ColumnVector vector, int row) {
public Decimal nonNullRead(ColumnVector vector, int row) {
BigDecimal value = ((DecimalColumnVector) vector).vector[row]
.getHiveDecimal().bigDecimalValue();
+
+ Preconditions.checkArgument(value.precision() <= precision && precision <= 38,
Review comment:
ditto.
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcValueReaders.java
##########
@@ -195,7 +196,12 @@ public Long nonNullRead(ColumnVector vector, int row) {
@Override
public Decimal nonNullRead(ColumnVector vector, int row) {
HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];
- return new Decimal().set(value.serialize64(value.scale()), value.precision(), value.scale());
+ BigDecimal decimal = new BigDecimal(BigInteger.valueOf(value.serialize64(value.scale())), value.scale());
Review comment:
Sounds great. The essential purpose here is to construct a `Decimal` with the correct `precision` and `scale` (instead of the `value.precision()` and `value.scale()`).
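
For illustration, a hedged sketch of that intent, reusing only identifiers visible in this hunk (`precision` and `scale` are assumed to be the reader's own fields taken from the Iceberg schema; this is a sketch, not necessarily the merged code):

```java
// Sketch only: stamp the Spark Decimal with the schema's precision/scale
// instead of whatever precision/scale the stored ORC value happens to carry.
HiveDecimalWritable value = ((DecimalColumnVector) vector).vector[row];

// serialize64(scale) yields the unscaled long of the value rescaled to the
// requested scale, so the result matches decimal(precision, scale).
return new Decimal().set(value.serialize64(scale), precision, scale);
```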
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -340,7 +355,11 @@ public void nonNullWrite(int rowId, BigDecimal data, ColumnVector output) {
@Override
public void nonNullWrite(int rowId, BigDecimal data, ColumnVector output) {
- // TODO: validate precision and scale from schema
+ Preconditions.checkArgument(data.scale() == scale,
+ "Cannot write value as decimal(%s,%s), wrong scale: %s", precision,
scale, data);
+ Preconditions.checkArgument(data.precision() <= precision && precision
<= 38,
Review comment:
ditto.
##########
File path: spark/src/main/java/org/apache/iceberg/spark/data/SparkOrcWriter.java
##########
@@ -237,9 +239,14 @@ public void addValue(int rowId, int column, SpecializedGetters data,
output.noNulls = false;
output.isNull[rowId] = true;
} else {
+ Decimal decimal = data.getDecimal(column, precision, scale);
+ Preconditions.checkArgument(scale == decimal.scale(),
+ "Cannot write value as decimal(%s,%s), wrong scale: %s",
precision, scale, decimal);
+ Preconditions.checkArgument(decimal.precision() <= precision &&
precision <= 18,
+ "Cannot write value as decimal(%s,%s), invalid precision: %s",
decimal);
Review comment:
Makes sense. They could be removed now.
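
For illustration, a hedged sketch of the simplified else-branch once those two checks are dropped (identifiers reused from the hunk above; `setFromLongAndScale` is the `HiveDecimalWritable` setter on the `DecimalColumnVector.vector` entries, and the exact final shape is an assumption):

```java
} else {
  // Sketch only: request the decimal at the schema's precision/scale and
  // write its unscaled long directly, without re-validating bounds here.
  Decimal decimal = data.getDecimal(column, precision, scale);
  ((DecimalColumnVector) output).vector[rowId]
      .setFromLongAndScale(decimal.toUnscaledLong(), scale);
}
```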