Hello!
Recently, I’ve been researching Iceberg with the goal of writing some simple
code that demonstrates how to use the Iceberg Java API. We planned to share this
internally with developers, along with what we’ve learned about Iceberg, to
start discussions on whether we could use Iceberg in our systems. After reviewing
the documentation and code, we thought this could be useful for anyone
interested in learning more about Iceberg, so we would like to open source it.
We noticed that Iceberg has a folder for examples
(https://github.com/apache/incubator-iceberg/tree/master/examples). There
isn’t much there right now, but it could be a good location for our examples and
documentation.
Our project is currently structured as many small JUnit tests, each targeting a
different piece of Iceberg functionality (such as reading and writing
partitioned and unpartitioned tables, schema evolution, time travel, etc.). We
took this approach so the tests could serve as a sort of quickstart guide to
using Iceberg for different use cases.
The code we have currently focuses mainly on using HadoopTables with Spark (in
Java) and contains tests that follow this sort of pattern:
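@Test
public void writeToTableFromFile() {
    // Read the sample data and append two of its columns to the Iceberg table
    Dataset<Row> df = spark.read().json(dataLocation + "/employees.json");
    df.select("name", "salary").write()
        .format("iceberg")
        .mode("append")
        .save(tableLocation.toString());
    // Reload the table metadata so this handle sees the new snapshot
    table.refresh();

    // Read back from the Iceberg table (not the source JSON) to verify the write
    Dataset<Row> result = spark.read()
        .format("iceberg")
        .load(tableLocation.toString());
    result.createOrReplaceTempView("employees");
    Dataset<Row> sqlDF = spark.sql("SELECT * FROM employees");
    // JUnit's assertEquals takes (expected, actual)
    assertEquals(10, sqlDF.count());
}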
Could the developers on the project let us know whether they think this would
be a useful contribution and, if so, what the next steps would be? We’re happy
to answer any questions and provide more information.
Thank you and all the best,
Christine Mathiesen
Software Development Intern
BDP – Hotels.com
Expedia Group