the mailing list: [email protected]
On Tue, Mar 12, 2013 at 10:40 AM, Todd Lipcon <[email protected]> wrote:
> Hey Jacques,
>
> Feel free to ping us with any questions. Despite some of the _users_ of
> Parquet competing with each other (e.g. query engines), we hope the file
> format itself can be easily implemented by everyone and become ubiquitous.
>
> There are a few changes still in flight that we're working on, so you may
> want to join the parquet dev mailing list as well to follow along.
>
> Thanks
> -Todd
>
> On Tue, Mar 12, 2013 at 10:29 AM, Jacques Nadeau <[email protected]> wrote:
>
>> When you said soon, you meant very soon. This looks like great work.
>> Thanks for sharing it with the world. Will come back after spending some
>> time with it.
>>
>> thanks again,
>> Jacques
>>
>> On Tue, Mar 12, 2013 at 9:50 AM, Julien Le Dem <[email protected]> wrote:
>>
>> > The repo is now available: http://parquet.github.com/
>> > Let me know if you have questions
>> >
>> > On Mon, Mar 11, 2013 at 11:31 AM, Jacques Nadeau <[email protected]> wrote:
>> > > There definitely seem to be some new kids on the block. I really hope
>> > > that Drill can adopt either ORC or Parquet as a closely related
>> > > "native" format. At the moment, I'm actually more focused on the
>> > > in-memory execution format and the right abstraction to support
>> > > compressed columnar execution and vectorization. Historically, the
>> > > biggest gaps I'd worry about are java-centricity and expectation of
>> > > early materialization & decompression. Once we get some execution
>> > > stuff working, let's see how each fits in. Rather than start a third
>> > > competing format (or fourth if you count Trevni), let's either use or
>> > > extend/contribute back on one of the existing new kids.
>> > >
>> > > Julien, do you think more will be shared about Parquet before the
>> > > Hadoop Summit so we can start toying with using it inside of Drill?
>> > >
>> > > J
>> > >
>> > > On Mon, Mar 11, 2013 at 11:02 AM, Ken Krugler <[email protected]> wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I've been trying to track down status/comparisons of various columnar
>> > >> formats, and just heard about Parquet.
>> > >>
>> > >> I don't have any direct experience with Parquet, but Really Smart Guy
>> > >> said:
>> > >>
>> > >> > From what I hear there are two key features that differentiate it
>> > >> > from ORC and Trevni: 1) columns can be optionally split into
>> > >> > separate files, and 2) the mechanism for shredding nested fields
>> > >> > into columns is taken almost verbatim from Dremel. Feature (1)
>> > >> > won't be practical to use until Hadoop introduces support for a
>> > >> > file group locality feature, but once it does this feature should
>> > >> > enable more efficient use of the buffer cache for predicate
>> > >> > pushdown operations.
>> > >>
>> > >> -- Ken
>> > >>
>> > >> On Mar 11, 2013, at 10:56am, Julien Le Dem wrote:
>> > >>
>> > >> > Parquet is actually implementing the algorithm described in the
>> > >> > "Nested Columnar Storage" section of the Dremel paper[1].
>> > >> >
>> > >> > [1] http://research.google.com/pubs/pub36632.html
>> > >> >
>> > >> > On Mon, Mar 11, 2013 at 10:41 AM, Timothy Chen <[email protected]> wrote:
>> > >> >> Just saw this:
>> > >> >>
>> > >> >> http://t.co/ES1dGDZlKA
>> > >> >>
>> > >> >> I know Trevni is another Dremel-inspired columnar format as well.
>> > >> >> Has anyone seen much info on Parquet and how it's different?
>> > >> >>
>> > >> >> Tim
>> > >>
>> > >> --------------------------
>> > >> Ken Krugler
>> > >> +1 530-210-6378
>> > >> http://www.scaleunlimited.com
>> > >> custom big data solutions & training
>> > >> Hadoop, Cascading, Cassandra & Solr
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
