Yeah, that's what I suspected. Thanks for the pointers. As an aside, the Orc website takes some liberties. If ACID is a feature of Hive and isn't supported by Orc, it probably shouldn't be the first claim on the website.

On Tue, Jan 29, 2019 at 9:40 AM Alan Gates <alanfga...@gmail.com> wrote:

> To answer the original question, it's split between the two. The storage
> requires a new column that records transaction id, row id, and some other
> information. To read ACID data integration with the Hive metastore is
> required so that the reader understands which records are valid and which
> are not. Writers also need to access the metastore to open and commit
> transactions for any new records they write.
>
> Shant's comment that the work is mostly in Hive at this point is true. I
> started work on porting the storage piece into Orc in
> https://issues.apache.org/jira/projects/ORC/issues/ORC-255 You can see the
> progress I made at https://github.com/alanfgates/orc/tree/orc255
> The patch is a year out of date so probably needs some help. In particular
> it needs to be in sync with what Hive is doing. And I was only focusing on
> the vector batch interface not the row-by-row one, which may or may not be
> what interests you. I suspect Hive will continue to want to go under the
> covers and access things directly in ORC, but some kind of interface or
> contract needs to be worked out to keep ORC readers and the Hive reader in
> sync.
>
> Alan.
>
> On Mon, Jan 28, 2019 at 8:37 PM Shant Hovsepian <sh...@arcadiadata.com>
> wrote:
>
> > ORC ACID is more of a Hive feature than an ORC feature.
> >
> > Regretfully it's not defined in a engine agnostic way. Would be great to
> > make the ACID layout part of the file format definition or as a generic
> > container definition or an extension to the Hive table format, so it would
> > be easier to use across tools. It's especially troubling that ACID is on by
> > default in HDP 3.X for Hive 3.1. Makes it very hard to read Hive generated
> > ORC files unless the table is created as an external table instead of a
> > managed table.
> >
> > -Shant
> >
> > On Mon, Jan 28, 2019 at 11:06 PM Jacques Nadeau <jacq...@apache.org>
> > wrote:
> >
> > > How much of the Acid functionality of Orc is actually in the Orc project?
> > > The website seems to suggest it is core to Orc but a quick glance at the
> > > code and it seems like really the code is mostly elsewhere?
> > >
> > > Thanks
> > > Jacques