karuppayya commented on a change in pull request #3273:
URL: https://github.com/apache/iceberg/pull/3273#discussion_r728428098
##########
File path:
spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -106,6 +117,55 @@ public void addDataUnpartitionedOrc() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}
+ @Test
+ public void addDataUnpartitionedAvroFile() throws Exception {
Review comment:
nit: rename this method; there is already another method with almost the
same name in this test class.
##########
File path:
spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -106,6 +117,55 @@ public void addDataUnpartitionedOrc() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}
+ @Test
+ public void addDataUnpartitionedAvroFile() throws Exception {
+ Schema schema = new Schema(
+ Types.NestedField.required(1, "id", Types.LongType.get()),
+ Types.NestedField.optional(2, "data", Types.StringType.get()));
+
+ GenericRecord baseRecord = GenericRecord.create(schema);
+
+ ImmutableList.Builder<Record> builder = ImmutableList.builder();
+ builder.add(baseRecord.copy(ImmutableMap.of("id", 1L, "data", "a")));
+ builder.add(baseRecord.copy(ImmutableMap.of("id", 2L, "data", "b")));
+ List<Record> records = builder.build();
+
+ OutputFile file = Files.localOutput(temp.newFile());
+
+ DataWriter<Record> dataWriter = Avro.writeData(file)
+ .schema(schema)
+ .createWriterFunc(org.apache.iceberg.data.avro.DataWriter::create)
+ .overwrite()
+ .withSpec(PartitionSpec.unpartitioned())
+ .build();
+
+ try {
+ for (Record record : records) {
+ dataWriter.add(record);
+ }
+ } finally {
+ dataWriter.close();
+ }
+
+ String path = dataWriter.toDataFile().path().toString();
+
+ String createIceberg =
+ "CREATE TABLE %s (id Long, data String) USING iceberg";
+ sql(createIceberg, tableName);
+
+ Object result = scalarSql("CALL %s.system.add_files('%s', '`avro`.`%s`')",
+ catalogName, tableName, path);
+ Assert.assertEquals(1L, result);
+
+ List<Object[]> expected = Lists.newArrayList(
+ new Object[]{1L, "a"},
+ new Object[]{2L, "b"}
+ );
+ assertEquals("Iceberg table contains correct data",
+ expected,
+ sql("SELECT * FROM %s ORDER BY id", tableName));
Review comment:
Not directly related to this change:
For a `COUNT` query we currently do not rely on the record count metrics,
since Spark does not push down the count expression. Once Spark supports
count pushdown, using -1 as a literal record count would produce incorrect
results.
Can we also add a `COUNT` assertion to this test?
Also, since users can read the metrics from the manifests and compute the
count themselves, it might be a good idea to document what a -1 record
count metric means.
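A minimal sketch of the suggested assertion (hypothetical, not part of this
PR), assuming the two records written above and the test base class's
assertEquals(String, List<Object[]>, List<Object[]>) helper; the trailing
comment shows how users could derive the same count from file metrics via
the table's `files` metadata table:

    // Hypothetical COUNT assertion; assumes the two records added above.
    assertEquals("Row count should match the records in the added Avro file",
        ImmutableList.of(new Object[]{2L}),
        sql("SELECT count(*) FROM %s", tableName));

    // Users could compute the same count from file metrics, which is where
    // a -1 record count would surface:
    //   SELECT sum(record_count) FROM <table>.files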
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]