[GitHub] [iceberg] pvary commented on a change in pull request #2126: Hive: Add write test for all supported types.

GitBox Thu, 21 Jan 2021 21:38:57 -0800


pvary commented on a change in pull request #2126:
URL: https://github.com/apache/iceberg/pull/2126#discussion_r561786578




##########
File path: hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspectorHive3.java
##########
@@ -69,4 +69,12 @@ public Object copyObject(Object o) {
     }
   }
 
+  @Override
+  public LocalDate convert(Object o) {
+    if (o == null) {
+      return null;
+    }
+    Date date = (Date) o;

Review comment:
       nit of the nit: add a blank line after the if block 😄 

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDecimalObjectInspector.java
##########
@@ -80,6 +80,12 @@ public Object copyObject(Object o) {
 
   @Override
   public BigDecimal convert(Object o) {
-    return o == null ? null : ((HiveDecimal) o).bigDecimalValue();
+    if (o == null) {
+      return null;
+    }
+
+    BigDecimal result = ((HiveDecimal) o).bigDecimalValue();
+    result = result.setScale(scale());

Review comment:
       Can we add a comment here explaining why the scale is set?
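
   For context on what such a comment could document: `BigDecimal.setScale(int)` called without a rounding mode only pads trailing zeros (or strips them) and throws `ArithmeticException` if rounding would be required. The sketch below is a minimal illustration, assuming the intent is to bring the value up to the column's declared scale. The class and method names are hypothetical and not part of the PR.

```java
import java.math.BigDecimal;

class DecimalScaleSketch {
  // Hypothetical helper (not from the PR): brings a value to the declared scale.
  // Assumes the value never has more significant fractional digits than `scale`;
  // otherwise setScale(int) throws ArithmeticException because no rounding mode is given.
  static BigDecimal toDeclaredScale(BigDecimal value, int scale) {
    return value.setScale(scale);
  }
}
```

   With this sketch, `toDeclaredScale(new BigDecimal("1.5"), 3)` yields `1.500`, while a value such as `1.2345` with scale 2 fails loudly instead of being rounded silently.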

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -296,6 +299,46 @@ public void testInsert() throws IOException {
     HiveIcebergTestUtils.validateData(table, new ArrayList<>(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS), 0);
   }
 
+  @Test
+  public void testInsertSupportedTypes() throws IOException {
+    Assume.assumeTrue("Tez write is not implemented yet", executionEngine.equals("mr"));
+    for (int i = 0; i < SUPPORTED_TYPES.size(); i++) {
+      Type type = SUPPORTED_TYPES.get(i);
+      // TODO: remove this filter when issue #1881 is resolved
+      if (type == Types.UUIDType.get() && fileFormat == FileFormat.PARQUET) {

Review comment:
       Can we use Assume here too?

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -296,6 +299,46 @@ public void testInsert() throws IOException {
     HiveIcebergTestUtils.validateData(table, new ArrayList<>(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS), 0);
   }
 
+  @Test
+  public void testInsertSupportedTypes() throws IOException {
+    Assume.assumeTrue("Tez write is not implemented yet", executionEngine.equals("mr"));
+    for (int i = 0; i < SUPPORTED_TYPES.size(); i++) {
+      Type type = SUPPORTED_TYPES.get(i);
+      // TODO: remove this filter when issue #1881 is resolved
+      if (type == Types.UUIDType.get() && fileFormat == FileFormat.PARQUET) {
+        continue;
+      }
+      // TODO: remove this filter when we figure out how we could test binary types
+      if (type.equals(Types.BinaryType.get()) || type.equals(Types.FixedType.ofLength(5))) {
+        continue;
+      }
+      String tableName = type.typeId().toString().toLowerCase() + "_table_" + i;
+      String columnName = type.typeId().toString().toLowerCase() + "_column";
+
+      Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, columnName, type));
+      List<Record> expected = TestHelper.generateRandomRecords(schema, 5, 0L);
+      List<Record> records = new ArrayList<>(expected.size());
+      if (type == Types.TimestampType.withoutZone()) {
+        expected.forEach(r -> records.add(r.copy()));
+        records.forEach(r -> r.set(1, Timestamp.valueOf((LocalDateTime) r.get(1))));
+      } else if (type == Types.TimestampType.withZone()) {
+        expected.forEach(r -> records.add(r.copy()));
+        records.forEach(r -> r.set(1, Timestamp.from(((OffsetDateTime) r.get(1)).toInstant())));
+      } else {
+        records.addAll(expected);
+      }

Review comment:
       Do I understand correctly that this conversion creates timestamps whose toString output matches what Hive expects? Is it OK to set a Timestamp on a Record field whose type is Types.TimestampType?
   
   Maybe it would be better to have a `Map<Long, String> forValuesClause`, move every transformation into it (Timestamp, Boolean, and maybe later Fixed/Binary), and later just concatenate it for the query?
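
   One way the `forValuesClause` idea could look is sketched below. This is only an illustration under assumptions, not the PR's code: the class `ValuesClauseSketch` and both helper methods are hypothetical, the map is assumed to go from the record id to the already-rendered literal, and Boolean values are left unquoted following the hive2 note in the diff. Fixed/Binary handling is deliberately left out.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.util.Map;
import java.util.stream.Collectors;

class ValuesClauseSketch {
  // Hypothetical helper: renders one column value as the literal text for the VALUES clause.
  static String toLiteral(Object value) {
    if (value instanceof Boolean) {
      // Left unquoted: per the diff, hive2 treats every quoted boolean as true.
      return value.toString();
    } else if (value instanceof LocalDateTime) {
      return "'" + Timestamp.valueOf((LocalDateTime) value) + "'";
    } else if (value instanceof OffsetDateTime) {
      // The rendered text depends on the JVM default time zone.
      return "'" + Timestamp.from(((OffsetDateTime) value).toInstant()) + "'";
    }
    return "'" + value + "'";
  }

  // Concatenates the id -> literal map into "(id,literal),(id,literal),...".
  static String valuesClause(Map<Long, String> forValuesClause) {
    return forValuesClause.entrySet().stream()
        .map(e -> "(" + e.getKey() + "," + e.getValue() + ")")
        .collect(Collectors.joining(","));
  }
}
```

   Filling a `LinkedHashMap` keeps the rows in insertion order, so the generated query stays deterministic, and the expected Iceberg records never need to be mutated.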

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -296,6 +299,46 @@ public void testInsert() throws IOException {
     HiveIcebergTestUtils.validateData(table, new ArrayList<>(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS), 0);
   }
 
+  @Test
+  public void testInsertSupportedTypes() throws IOException {
+    Assume.assumeTrue("Tez write is not implemented yet", executionEngine.equals("mr"));
+    for (int i = 0; i < SUPPORTED_TYPES.size(); i++) {
+      Type type = SUPPORTED_TYPES.get(i);
+      // TODO: remove this filter when issue #1881 is resolved
+      if (type == Types.UUIDType.get() && fileFormat == FileFormat.PARQUET) {
+        continue;
+      }
+      // TODO: remove this filter when we figure out how we could test binary types
+      if (type.equals(Types.BinaryType.get()) || type.equals(Types.FixedType.ofLength(5))) {
+        continue;
+      }
+      String tableName = type.typeId().toString().toLowerCase() + "_table_" + i;
+      String columnName = type.typeId().toString().toLowerCase() + "_column";
+
+      Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, columnName, type));
+      List<Record> expected = TestHelper.generateRandomRecords(schema, 5, 0L);
+      List<Record> records = new ArrayList<>(expected.size());

Review comment:
       In my recent reviews I have learned that in Iceberg we should not create ArrayLists directly but should use the Guava factory methods instead.
   
   nit: `Lists.newArrayListWithCapacity(expected.size())`

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -296,6 +299,37 @@ public void testInsert() throws IOException {
     HiveIcebergTestUtils.validateData(table, new ArrayList<>(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS), 0);
   }
 
+  @Test
+  public void testInsertSupportedTypes() throws IOException {
+    Assume.assumeTrue("Tez write is not implemented yet", executionEngine.equals("mr"));
+    for (int i = 0; i < SUPPORTED_TYPES.size(); i++) {
+      Type type = SUPPORTED_TYPES.get(i);
+      // TODO: remove this filter when issue #1881 is resolved
+      if (type == Types.UUIDType.get() && fileFormat == FileFormat.PARQUET) {
+        continue;
+      }
+      // TODO: remove this filter when we figure out how we could test binary types
+      if (type.equals(Types.BinaryType.get()) || type.equals(Types.FixedType.ofLength(5))) {
+        continue;
+      }
+      String tableName = type.typeId().toString().toLowerCase() + "_table_" + i;
+      String columnName = type.typeId().toString().toLowerCase() + "_column";
+
+      Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, columnName, type));
+      List<Record> expected = TestHelper.generateRandomRecords(schema, 5, 0L);
+
+      Table table = testTables.createTable(shell, tableName, schema, fileFormat, ImmutableList.of());
+      StringBuilder query = new StringBuilder("INSERT INTO ").append(tableName).append(" VALUES")
+              .append(expected.stream()
+                      // in hive2 every boolean value in apostrophes is translated to true
+                      .map(r -> String.format(type == Types.BooleanType.get() ? "(%s,%s)" : "(%s,'%s')", r.get(0),

Review comment:
       Would it make sense to move the quotation marks into the `getStringValueForInsert` method as well?
   That would encapsulate the type-related logic entirely and might make the test code easier to understand.
   
   What do you think?



