It is expected but, like most of Hive's ACID layout, it is badly documented. The code is in OrcAcidUtils <https://github.com/apache/orc/blob/1c5a020382059b9fea3344ffe428b1f8986b0a12/java/core/src/java/org/apache/orc/impl/OrcAcidUtils.java#L42>.
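
For anyone following along, here is a rough sketch of the two recovery paths discussed in this thread: reading the last 8-byte offset from the side file, and falling back to a backward search for "\003ORC". It is not the OrcAcidUtils code. The class and method names are hypothetical, it uses plain JDK file I/O instead of the Hadoop FileSystem API, and it assumes the side file holds big-endian longs. It also assumes the ORC layout in which the postscript ends with the magic string and is followed by a single postscript-length byte.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class AcidTailSketch {
    // "\003ORC": a length prefix of 3 followed by the magic string.
    private static final byte[] TAIL_MAGIC = {3, 'O', 'R', 'C'};

    /**
     * Returns the last complete 8-byte offset recorded in the side file,
     * or -1 if the file is missing or holds no complete entry.
     * A truncated trailing entry is ignored.
     */
    static long lastFlushLength(Path sideFile) throws IOException {
        if (!Files.exists(sideFile)) {
            return -1;
        }
        long last = -1;
        try (DataInputStream in = new DataInputStream(Files.newInputStream(sideFile))) {
            while (true) {
                last = in.readLong(); // big-endian, 8 bytes per entry
            }
        } catch (EOFException e) {
            // Reached the end of the file, possibly in the middle of an entry.
        }
        return last;
    }

    /**
     * Searches backward for "\003ORC" and returns the candidate logical
     * file length: the offset just past the magic plus one byte for the
     * assumed postscript-length byte. Returns -1 if nothing matches.
     */
    static long findTailEnd(byte[] data) {
        for (int i = data.length - TAIL_MAGIC.length - 1; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < TAIL_MAGIC.length; j++) {
                if (data[i + j] != TAIL_MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i + TAIL_MAGIC.length + 1;
            }
        }
        return -1;
    }
}
```

A reader would try lastFlushLength first and treat the result as the end of the file, and only fall back to findTailEnd when the side file is absent. A match from the search is only a candidate, since the same bytes can appear in user data, so the tail found there should still be parsed and validated.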
.. Owen

On Sat, Jun 15, 2019 at 12:25 PM Dain Sundstrom <[email protected]> wrote:

> Is this expected behavior of ORC acid writers? If so, is it documented
> somewhere?
>
> -dain
>
> ----
> Dain Sundstrom
> Co-founder @ Presto Software Foundation, Co-creator of Presto
> (https://prestosql.io)
>
> > On Jun 14, 2019, at 6:17 PM, Owen O'Malley <[email protected]> wrote:
> >
> > The hive acid format uses a side file that provides a sequence of the
> > 8 byte file offsets for completed file footers. If the file is there,
> > it passes the last offset to the reader and it will treat that as the
> > end of the file.
> >
> > In the case where you don't have that, searching for the string
> > “\003ORC” works really well for finding the tails. In the corrupted
> > files I've seen I've never needed more than that.
> >
> > .. Owen
> >
> >> On Jun 14, 2019, at 09:52, Xiening Dai <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> In the ORC appending scenario, the append operation (including
> >> writing the additional data and the new footer) needs to be atomic.
> >> Otherwise, if it fails partway through, the file tail would be
> >> unrecognizable. Unfortunately, not all file systems can guarantee
> >> atomic writes. When a failure does happen, in order to recover the
> >> data from before the append, we would need to locate the previous
> >> file footer by searching backward. And the only way to search for
> >> the footer is by looking for the “ORC” magic string. But the current
> >> magic string only has three characters, and it's likely that the same
> >> string appears in user data, which will result in parsing a wrong
> >> footer, and the behavior is undefined.
> >>
> >> So I am thinking that we could change the magic string into some
> >> 16-byte UUID. This way we can safely use it to locate the footer. The
> >> idea is very similar to the sync marker in Avro.
> >>
> >> Thanks.
