Hi Iceberg devs, We are starting work on a somewhat similar project. The idea is that users can ask for incremental data since the last snapshot they processed, i.e the delta that was added since the last snapshot. Do you guys think that whether this can be a general feature that can we beneficial to Iceberg?
-R On Wed, Jul 17, 2019 at 10:16 AM Ryan Blue <rb...@netflix.com.invalid> wrote: > You can do this using time-travel. First, read the table at each snapshot. > This creates a temporary table for both snapshots: > > // create temp tables for each snapshot > spark.read.format("iceberg").option("snapshot-id", > 8924558786060583479L).load("db.table").createOrReplaceTempTable("s1") > spark.read.format("iceberg").option("snapshot-id", > 6536733823181975045L).load("db.table").createOrReplaceTempTable("s2") > > Next, use a left-outer join and a filter to find the rows that are only in > the second snapshot: > > SELECT s2.* FROM (SELECT * FROM s2 LEFT OUTER JOIN s1 ON s1.id = s2.id) WHERE > s1.id IS NULL > > Note that this is an expensive operation if you don’t also filter the > snapshots. You probably want to do this with a small table, or you want to > use filters to cut down on the data loaded that might match. For example, > if you’re writing events as they come in from Kafka, you can filter each > snapshot using the last day of data, then run the left outer join. > > On Wed, Jul 17, 2019 at 9:50 AM aa bb <abcd1...@yahoo.com.invalid> wrote: > >> Hi, >> >> Could you please advise how we can get delta data changes (diff) between >> 2 Snapshots? >> >> Is there any way providing 2 Snapshot Ids (8924558786060583479, >> 6536733823181975045) and get records that added after 8924558786060583479 >> ? >> >> +-------------------------+---------------------+---------------------+---------------------+ >> | made_current_at | snapshot_id | parent_id | >> is_current_ancestor | >> +-------------------------+---------------------+---------------------+---------------------+ >> | 2019-02-09 19:42:03.919 | 8924558786060583479 | 2999875608062437330 | true >> | >> | 2019-02-09 19:49:16.343 | 6536733823181975045 | 8924558786060583479 | true >> | >> +-------------------------+---------------------+---------------------+---------------------+ >> >> >> >> >> Thanks, >> > > > -- > Ryan Blue > Software Engineer > Netflix >