Yeah, that's what I suspected. Thanks for the pointers.

As an aside, the Orc website takes some liberties. If ACID is a feature of
Hive and isn't supported by Orc, it probably shouldn't be the first claim
on the website.

On Tue, Jan 29, 2019 at 9:40 AM Alan Gates <alanfga...@gmail.com> wrote:

> To answer the original question, it's split between the two.  The storage
> requires a new column that records transaction id, row id, and some other
> information.  To read ACID data, integration with the Hive metastore is
> required so that the reader understands which records are valid and which
> are not.  Writers also need to access the metastore to open and commit
> transactions for any new records they write.
>
> Shant's comment that the work is mostly in Hive at this point is true.  I
> started work on porting the storage piece into Orc in
> https://issues.apache.org/jira/projects/ORC/issues/ORC-255  You can see the
> progress I made at https://github.com/alanfgates/orc/tree/orc255
> The patch is a year out of date so probably needs some help.  In particular
> it needs to be in sync with what Hive is doing.  And I was only focusing on
> the vector batch interface not the row-by-row one, which may or may not be
> what interests you.  I suspect Hive will continue to want to go under the
> covers and access things directly in ORC, but some kind of interface or
> contract needs to be worked out to keep ORC readers and the Hive reader in
> sync.
>
> Alan.
>
> On Mon, Jan 28, 2019 at 8:37 PM Shant Hovsepian <sh...@arcadiadata.com>
> wrote:
>
> > ORC ACID is more of a Hive feature than an ORC feature.
> >
> > Regretfully it's not defined in an engine-agnostic way. Would be great to
> > make the ACID layout part of the file format definition or as a generic
> > container definition or an extension to the Hive table format, so it would
> > be easier to use across tools. It's especially troubling that ACID is on by
> > default in HDP 3.X for Hive 3.1. Makes it very hard to read Hive-generated
> > ORC files unless the table is created as an external table instead of a
> > managed table.
> >
> > -Shant
> >
> > On Mon, Jan 28, 2019 at 11:06 PM Jacques Nadeau <jacq...@apache.org>
> > wrote:
> >
> > > How much of the Acid functionality of Orc is actually in the Orc project?
> > > The website seems to suggest it is core to Orc but a quick glance at the
> > > code and it seems like really the code is mostly elsewhere?
> > >
> > > Thanks
> > > Jacques
> > >
> >
>
