Owen, something similar has come up in a roadmap discussion of mine. I have
a question about the solution you mentioned.

> The requirements would be that there is a 1:1 mapping between rows in the
> matching files and stripes.
>

Were you thinking that there would really be a 1:1 mapping and that the
rows would just line up in the right order? That seems fragile to me. I
would have thought that there would need to be a common key that the rows
were identified by (which is more in line with HBase column families, which
you referenced; so maybe this was what you meant but didn't illustrate
explicitly).

With that in mind, I might have written:

file1.orc: struct<id:int,name:string,email:string>
file2.orc: struct<id:int,lastAccess:timestamp>
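
To make the difference concrete, here is a minimal sketch in plain Python. It does not use ORC or Iceberg APIs. The sample rows and the helper names (merge_by_position, merge_by_key) are made up for illustration, and small in-memory lists stand in for the two files. It contrasts a merge that assumes rows line up in order with one that joins on the shared id column.

```python
# Hypothetical in-memory stand-ins for the rows of file1.orc and file2.orc.
file1_rows = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]
# file2 is deliberately in a different order, e.g. after being rewritten.
file2_rows = [
    {"id": 2, "lastAccess": "2018-11-28T13:14:00"},
    {"id": 1, "lastAccess": "2018-11-27T12:26:00"},
]


def merge_by_position(left, right):
    """Positional merge: correct only if both inputs keep a strict 1:1 row order."""
    return [{**l, **r} for l, r in zip(left, right)]


def merge_by_key(left, right, key="id"):
    """Keyed merge: find the matching row through a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key.get(l[key], {})} for l in left]


positional = merge_by_position(file1_rows, file2_rows)
keyed = merge_by_key(file1_rows, file2_rows)

# The positional merge silently pairs Ann with the row written for id 2;
# the keyed merge still pairs her with her own lastAccess value.
print(positional[0]["name"], positional[0]["lastAccess"])
print(keyed[0]["name"], keyed[0]["lastAccess"])
```

With a key, the second file can be rewritten or reordered without corrupting the merge. The cost is a lookup per row instead of a straight zip.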

On Wed, Nov 28, 2018 at 1:14 PM Owen O'Malley <owen.omal...@gmail.com>
wrote:

> I’m not sure what use case Erik is looking for, but I’ve had users that
> want to do the equivalent of HBase’s column families. They want some of the
> columns to be stored separately and then merged together on read. The
> requirements would be that there is a 1:1 mapping between rows in the
> matching files and stripes.
>
> It would look like:
>
> file1.orc: struct<name:string,email:string>
> file2.orc: struct<lastAccess:timestamp>
>
> It would let them leave the stable information and only re-write the
> second column family when the information in the mutable column family
> changes. It would also support use cases where you add data enrichment
> columns after the data has been ingested.
>
> From there it is easy to imagine having a replace operator where file2’s
> version of a column replaces file1’s version.
>
> .. Owen
>
> > On Nov 28, 2018, at 9:44 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> >
> > What do you mean by merge on read?
> >
> > A few people I've talked to are interested in building delete and upsert
> > features. Those would create files that track the changes, which would be
> > merged at read time to apply them. Is that what you mean?
> >
> > rb
> >
> > On Tue, Nov 27, 2018 at 12:26 PM Erik Wright
> > <erik.wri...@shopify.com.invalid> wrote:
> >
> >> Has any consideration been given to the possibility of eventual
> >> merge-on-read support in the Iceberg table spec?
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>
