Hi all,

In the ORC append scenario, the append operation (writing the additional data 
and the new footer) needs to be atomic. Otherwise, if it fails partway through, 
the tail of the file becomes unrecognizable. Unfortunately, not all file systems 
can guarantee atomic writes. When a failure does happen, recovering the data 
written before the append requires locating the previous file footer by 
searching backward, and the only way to find that footer is to look for the 
“ORC” magic string. But the current magic string is only three characters long, 
so the same bytes may well appear in user data. In that case we would parse the 
wrong footer, and the behavior is undefined.

So I am wondering whether we could change the magic string to a 16-byte UUID. 
That way we could safely use it to locate the footer. The idea is very similar 
to the sync marker in Avro.
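
To make the concern concrete, here is a minimal Python sketch of the backward 
search after a torn append. It uses a toy layout (data, then marker, then 
footer), not the real ORC file layout, and the names LONG_MAGIC, 
last_marker_offset and simulate_torn_append, as well as the UUID value itself, 
are made up for illustration.

```python
import uuid

SHORT_MAGIC = b"ORC"
# Hypothetical 16-byte marker; how a real marker would be chosen and stored
# is an open design question, not something this sketch decides.
LONG_MAGIC = uuid.UUID("3f2b8c1e-9d4a-4e6f-a1b2-c3d4e5f60718").bytes


def last_marker_offset(data: bytes, marker: bytes) -> int:
    """Search backward: offset of the last occurrence of marker, or -1."""
    return data.rfind(marker)


def simulate_torn_append(marker: bytes):
    """Return (true_footer_offset, offset_found_by_backward_search)."""
    # Toy committed file: data, then the marker, then the footer.
    committed = b"rows-v1" + marker + b"footer-v1"
    true_offset = len(b"rows-v1")
    # The append wrote new user data but failed before writing a new footer.
    torn_tail = b"rows-v2 mentioning ORC files"
    damaged = committed + torn_tail
    return true_offset, last_marker_offset(damaged, marker)


short_true, short_found = simulate_torn_append(SHORT_MAGIC)
long_true, long_found = simulate_torn_append(LONG_MAGIC)

print(short_found == short_true)  # False: "ORC" inside user data is matched
print(long_found == long_true)    # True: the 16-byte marker is found
```

With the three-byte magic, the search lands on the "ORC" inside the torn user 
data rather than on the committed footer. A 16-byte marker does not make a 
collision impossible, only extremely unlikely for a randomly chosen value.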

Thanks.
