karuppayya commented on a change in pull request #3273:
URL: https://github.com/apache/iceberg/pull/3273#discussion_r728428098
##########
File path:
spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -106,6 +117,55 @@ public void addDataUnpartitionedOrc() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}
+ @Test
+ public void addDataUnpartitionedAvroFile() throws Exception {
Review comment:
nit: rename this method; there is already another method with almost the
same name in this test class.
##########
File path:
spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -106,6 +117,55 @@ public void addDataUnpartitionedOrc() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}
+ @Test
+ public void addDataUnpartitionedAvroFile() throws Exception {
+ Schema schema = new Schema(
+ Types.NestedField.required(1, "id", Types.LongType.get()),
+ Types.NestedField.optional(2, "data", Types.StringType.get()));
+
+ GenericRecord baseRecord = GenericRecord.create(schema);
+
+ ImmutableList.Builder<Record> builder = ImmutableList.builder();
+ builder.add(baseRecord.copy(ImmutableMap.of("id", 1L, "data", "a")));
+ builder.add(baseRecord.copy(ImmutableMap.of("id", 2L, "data", "b")));
+ List<Record> records = builder.build();
+
+ OutputFile file = Files.localOutput(temp.newFile());
+
+ DataWriter<Record> dataWriter = Avro.writeData(file)
+ .schema(schema)
+ .createWriterFunc(org.apache.iceberg.data.avro.DataWriter::create)
+ .overwrite()
+ .withSpec(PartitionSpec.unpartitioned())
+ .build();
+
+ try {
+ for (Record record : records) {
+ dataWriter.add(record);
+ }
+ } finally {
+ dataWriter.close();
+ }
+
+ String path = dataWriter.toDataFile().path().toString();
+
+ String createIceberg =
+ "CREATE TABLE %s (id Long, data String) USING iceberg";
+ sql(createIceberg, tableName);
+
+ Object result = scalarSql("CALL %s.system.add_files('%s', '`avro`.`%s`')",
+ catalogName, tableName, path);
+ Assert.assertEquals(1L, result);
+
+ List<Object[]> expected = Lists.newArrayList(
+ new Object[]{1L, "a"},
+ new Object[]{2L, "b"}
+ );
+ assertEquals("Iceberg table contains correct data",
+ expected,
+ sql("SELECT * FROM %s ORDER BY id", tableName));
Review comment:
Not directly related to this change:
For a `COUNT` query we currently do not rely on the record count metrics,
since Spark does not push down the count expression. Once Spark supports
count pushdown, using -1 as a literal record count would produce incorrect
results.
Can we also add a `COUNT` assertion to this test?
Also, since users can read the metrics from the manifests and compute the
count themselves, it might be a good idea to document what a -1 record
count metric means.
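A minimal sketch of the suggested assertion (hypothetical, not part of this
PR), assuming the two records written above and the test base class's
assertEquals(String, List<Object[]>, List<Object[]>) helper; the trailing
comment shows how users could derive the same count from file metrics via
the table's `files` metadata table:

    // Hypothetical COUNT assertion; assumes the two records added above.
    assertEquals("Row count should match the records in the added Avro file",
        ImmutableList.of(new Object[]{2L}),
        sql("SELECT count(*) FROM %s", tableName));

    // Users could compute the same count from file metrics, which is where
    // a -1 record count would surface:
    //   SELECT sum(record_count) FROM <table>.files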
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]