On Sat, Dec 9, 2017 at 3:38 AM, Atri Sharma <[email protected]> wrote:
> Thanks for the specification.
>
> A couple of questions:
>
> 1) what does this to parquet and not to any underlying store?
> 2) If above is not true, can we expose an interface to install any
> underlying file format?
> 3) if we are defining snapshots, can we allow MVCC on top of the snapshots?
>
> To elaborate on 3) I would like to see a full
> set transactional file Format present which allows us to be generic and
> performant.
>
> I would be interested in doing a specification for update in this format.
> Can you please share the repository link and some internal documents to
> Understand a bit more?
>

The format uses a form of MVCC to ensure readers always use a consistent
snapshot of the table without blocking writers. New versions show up
atomically.

However, this doesn't use a "transactional file format". It uses an atomic
operation to replace a table's current metadata, and all of the files that
metadata refers to are immutable. This is necessary for compatibility with
file systems like HDFS and S3 where the files must be stored.

I'm not sure what you mean by questions 1 or 2. Parquet is a data file
format used in Iceberg tables. Avro is also allowed so that the tables
support a write-optimized format (Avro) and a read-optimized format
(Parquet). File formats are tracked on a per-file basis.

rb
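
To make the atomic-swap idea above concrete, here is a minimal in-memory sketch. It is not Iceberg's code: the names `DataFile`, `TableMetadata`, `Table`, and `commit` are hypothetical, and a `threading.Lock` stands in for whatever atomic operation the underlying store provides. It only illustrates how immutable metadata plus a compare-and-swap gives readers a consistent snapshot without blocking writers, with the file format recorded per file.

```python
import threading
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical record for one immutable data file."""
    path: str
    file_format: str  # tracked per file, e.g. "avro" or "parquet"


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical immutable table version: never modified after creation."""
    version: int
    files: Tuple[DataFile, ...]


class Table:
    def __init__(self) -> None:
        # Assumption: the lock models the store's atomic swap primitive.
        self._lock = threading.Lock()
        self._current = TableMetadata(0, ())

    def current(self) -> TableMetadata:
        # Readers take a reference to an immutable version and keep using it.
        return self._current

    def commit(self, base: TableMetadata, new_files: Iterable[DataFile]) -> bool:
        # Build the new version off to the side, then swap it in only if
        # nobody else committed since `base` was read (compare-and-swap).
        new = TableMetadata(base.version + 1, base.files + tuple(new_files))
        with self._lock:
            if self._current is not base:
                return False  # stale base: caller should re-read and retry
            self._current = new
            return True


table = Table()
snapshot = table.current()  # a reader's (and writer's) view at version 0
ok = table.commit(snapshot, [DataFile("a.avro", "avro")])
stale = table.commit(snapshot, [DataFile("b.parquet", "parquet")])
```

In this sketch the first commit succeeds and the second, built on the now-stale version 0, is rejected and would have to retry against the new current metadata. The `snapshot` held by the reader is unchanged throughout.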
