When you said soon, you meant very soon. This looks like great work. Thanks for sharing it with the world. Will come back after spending some time with it.
thanks again,
Jacques

On Tue, Mar 12, 2013 at 9:50 AM, Julien Le Dem <[email protected]> wrote:
> The repo is now available: http://parquet.github.com/
> Let me know if you have questions
>
> On Mon, Mar 11, 2013 at 11:31 AM, Jacques Nadeau <[email protected]> wrote:
> > There definitely seem to be some new kids on the block. I really hope that
> > Drill can adopt either ORC or Parquet as a closely related "native" format.
> > At the moment, I'm actually more focused on the in-memory execution
> > format and the right abstraction to support compressed columnar execution
> > and vectorization. Historically, the biggest gaps I'd worry about are
> > java-centricity and expectation of early materialization & decompression.
> > Once we get some execution stuff working, lets see how each fits in.
> > Rather than start a third competing format (or fourth if you count
> > Trevni), let's either use or extend/contribute back on one of the existing
> > new kids.
> >
> > Julien, do you think more will be shared about Parquet before the Hadoop
> > Summit so we can start toying with using it inside of Drill?
> >
> > J
> >
> > On Mon, Mar 11, 2013 at 11:02 AM, Ken Krugler <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I've been trying to track down status/comparisons of various columnar
> >> formats, and just heard about Parquet.
> >>
> >> I don't have any direct experience with Parquet, but Really Smart Guy said:
> >>
> >> > From what I hear there are two key features that
> >> > differentiate it from ORC and Trevni: 1) columns can be optionally split
> >> > into separate files, and 2) the mechanism for shredding nested fields into
> >> > columns is taken almost verbatim from Dremel. Feature (1) won't be practical
> >> > to use until Hadoop introduces support for a file group locality feature,
> >> > but once it does this feature should enable more efficient use of the
> >> > buffer cache for predicate pushdown operations.
> >>
> >> -- Ken
> >>
> >>
> >> On Mar 11, 2013, at 10:56am, Julien Le Dem wrote:
> >>
> >> > Parquet is actually implementing the algorithm described in the
> >> > "Nested Columnar Storage" section of the Dremel paper[1].
> >> >
> >> > [1] http://research.google.com/pubs/pub36632.html
> >> >
> >> > On Mon, Mar 11, 2013 at 10:41 AM, Timothy Chen <[email protected]> wrote:
> >> >> Just saw this:
> >> >>
> >> >> http://t.co/ES1dGDZlKA
> >> >>
> >> >> I know Trevni is another Dremel inspired Columnar format as well, anyone
> >> >> saw much info Parquet and how it's different?
> >> >>
> >> >> Tim
> >>
> >> --------------------------
> >> Ken Krugler
> >> +1 530-210-6378
> >> http://www.scaleunlimited.com
> >> custom big data solutions & training
> >> Hadoop, Cascading, Cassandra & Solr
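
Since the Dremel-style shredding of nested fields mentioned above is the key piece Parquet inherits, here is a minimal Java sketch of the idea of repetition and definition levels for one leaf column. The schema, class names, and method are invented for illustration only (not Parquet's actual API); it assumes a toy schema "message Doc { repeated group names { optional string name; } }", where the leaf names.name has max repetition level 1 and max definition level 2.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShredExample {

    // One stored column value: the data plus its repetition/definition levels.
    record ColumnEntry(String value, int repetitionLevel, int definitionLevel) {}

    // Shreds one record into column entries for names.name.
    // Each element of 'names' is the optional name of one group;
    // null means the group is present but the name is missing.
    static List<ColumnEntry> shredRecord(List<String> names) {
        List<ColumnEntry> out = new ArrayList<>();
        if (names.isEmpty()) {
            // The repeated group is absent entirely: one placeholder entry,
            // r = 0 (start of a new record), d = 0 (nothing on the path is defined).
            out.add(new ColumnEntry(null, 0, 0));
            return out;
        }
        for (int i = 0; i < names.size(); i++) {
            // r = 0 only for the first value of the record; subsequent values
            // repeat at the 'names' level, so r = 1.
            int r = (i == 0) ? 0 : 1;
            String name = names.get(i);
            // d = 2 if the optional leaf is present, d = 1 if only the
            // enclosing repeated group is present.
            int d = (name != null) ? 2 : 1;
            out.add(new ColumnEntry(name, r, d));
        }
        return out;
    }

    public static void main(String[] args) {
        // Record 1: {names: [{name:"a"}, {}, {name:"b"}]}
        System.out.println(shredRecord(Arrays.asList("a", null, "b")));
        // Record 2: {names: []}
        System.out.println(shredRecord(List.of()));
    }
}

Running this would print three entries for the first record, (a, r=0, d=2), (null, r=1, d=1), (b, r=1, d=2), and a single (null, r=0, d=0) placeholder for the second, which is the flat level stream the "Nested Columnar Storage" section of the Dremel paper describes for reassembling nested records from columns.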
