[ 
https://issues.apache.org/jira/browse/HIVE-25264?focusedWorklogId=613862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613862
 ]

ASF GitHub Bot logged work on HIVE-25264:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jun/21 07:46
            Start Date: 23/Jun/21 07:46
    Worklog Time Spent: 10m 
      Work Description: kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656843384



##########
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws 
IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and 
last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", 
Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", 
Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first 
name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), 
optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", 
"customers"));
+    List<Record> newCustomerWithAge = 
TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, 
fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the 
result.
+    // It should be null for the old data and should be filled for the data 
added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = 
TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, 
"Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM 
default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, 
HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can 
be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = 
TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, 
null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM 
default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null 
age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), 
(6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", 
"Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, 
HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM 
default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and 
last_name, without any initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    
icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", 
Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", 
Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first 
name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), 
(1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the 
result.
+    List<Record> customersWithAge = 
TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 
33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM 
default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, 
HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // TODO: add a test step that inserts a NULL value into the new required 
column. At the moment it
+    // works inconsistently for different file types, so leave it for later 
when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), 
required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, 
"last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns id and person (a struct of 
first_name and last_name) with some initial data.

Review comment:
       Oh, thanks, fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 613862)
    Time Spent: 3h 10m  (was: 3h)

> Add tests to verify Hive can read/write after schema change on Iceberg table
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-25264
>                 URL: https://issues.apache.org/jira/browse/HIVE-25264
>             Project: Hive
>          Issue Type: Test
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We should verify that Hive can properly read/write Iceberg tables after their 
> schema has been modified through the Iceberg API (as if another engine, such 
> as Spark, had modified the schema). 
> Unit tests should be added for the following operations offered by the 
> UpdateSchema interface in the Iceberg API:
> - adding new top level column
> - adding new nested column
> - adding required column
> - adding required nested column
> - renaming a column
> - updating a column
> - making a column required
> - deleting a column
> - changing the order of the columns in the schema



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to