Hello!

Recently, I’ve been researching Iceberg with the goal of developing some simple 
code exemplifying how to use the Iceberg Java API. The goal was to share this 
internally with developers along with information we’ve gained about Iceberg to 
start discussions on whether we could use Iceberg in our systems. On reviewing 
the documentation and code, we thought this could be useful for anyone 
interested in learning more about Iceberg, so we would like to open source it. 
We noticed that Iceberg has a folder for examples 
(https://github.com/apache/incubator-iceberg/tree/master/examples). There 
isn’t much there right now, but it could be a good location for our examples and 
documentation.

Our project is currently structured as many small JUnit tests, each targeting a 
different piece of Iceberg functionality (such as reading and writing 
partitioned and unpartitioned tables, schema evolution, and time travel). We 
went for this approach so we could use it as a sort of quickstart guide to 
using Iceberg with different use cases in mind.

The code we have currently focuses mainly on using HadoopTables with Spark (in 
Java) and contains tests that follow this sort of pattern:

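  @Test
  public void writeToTableFromFile() {
    // Load the sample data and append two of its columns to the Iceberg table.
    Dataset<Row> df = spark.read().json(dataLocation + "/employees.json");

    df.select("name", "salary").write()
      .format("iceberg")
      .mode("append")
      .save(tableLocation.toString());

    // Pick up the snapshot committed by the write above.
    table.refresh();

    // Read back from the Iceberg table itself, so the assertion checks what
    // was written rather than the source DataFrame.
    Dataset<Row> results = spark.read()
      .format("iceberg")
      .load(tableLocation.toString());
    results.createOrReplaceTempView("employees");

    Dataset<Row> sqlDF = spark.sql("select * from employees");
    // JUnit's assertEquals takes the expected value first.
    assertEquals(10, sqlDF.count());
  }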

Could the developers on the project let us know if they think the above would 
be a useful contribution and, if so, what the next steps would be? We’re happy 
to answer any questions and provide more information.

Thank you and all the best,

Christine Mathiesen
Software Development Intern
BDP – Hotels.com
Expedia Group
