Re: [PROPOSAL] Parquet

Henry Saputra Sat, 17 May 2014 12:25:28 -0700

Cool, thanks! Looking forward to the VOTE thread :)

On Saturday, May 17, 2014, Chris Aniszczyk <[email protected]> wrote:


> Your request about the user list seems fine; no need to have multiple lists
> atm IMHO.
>
> The proposal has been updated accordingly, thanks!
> https://wiki.apache.org/incubator/ParquetProposal?action=recall&rev=21
>
>
>
> On Sat, May 17, 2014 at 9:33 AM, Henry Saputra <[email protected]> wrote:
>
> > Chris, could you please address my concern about the user@ list?
> >
> > - Henry
> >
> > On Fri, May 16, 2014 at 4:43 PM, Chris Aniszczyk <[email protected]> wrote:
> > > SGTM Roman, thanks for volunteering!
> > >
> > > I'll start the vote on Sunday barring any issues.
> > >
> > >
> > > On Fri, May 16, 2014 at 11:56 AM, Roman Shaposhnik <[email protected]> wrote:
> > >
> > >> Hi!
> > >>
> > >> The proposal looks good to me and I am very much looking
> > >> forward to a voting thread.
> > >>
> > >> One small request: since I plan to spend a fair amount
> > >> of time on Parquet anyway, would you guys be OK
> > >> with adding me as an extra mentor so I can help
> > >> with that aspect of the project as well?
> > >>
> > >> Thanks,
> > >> Roman.
> > >>
> > >> P.S. Plus it has the added benefit of increasing the diversity
> > >> of affiliations from the get-go.
> > >>
> > >> On Mon, May 12, 2014 at 10:02 AM, Chris Aniszczyk <[email protected]> wrote:
> > >> > We would like to propose Parquet as an Apache Incubator project.
> > >> > https://wiki.apache.org/incubator/ParquetProposal
> > >> >
> > >> > Feel free to comment; we'll go for a vote in a week or two, or
> > >> > whenever consensus has been reached on the proposal.
> > >> >
> > >> > I've posted the text of the proposal below:
> > >> >
> > >> > == Abstract ==
> > >> > Parquet is a columnar storage format for Hadoop.
> > >> >
> > >> > == Proposal ==
> > >> >
> > >> > We created Parquet to make the advantages of compressed, efficient
> > >> > columnar data representation available to any project in the Hadoop
> > >> > ecosystem, regardless of the choice of data processing framework,
> > >> > data model, or programming language.
> > >> >
> > >> > == Background ==
> > >> >
> > >> > Parquet is built from the ground up with complex nested data
> > >> > structures in mind, and uses the repetition/definition level approach
> > >> > to encoding such data structures, as popularized by Google Dremel
> > >> > (https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We
> > >> > believe this approach is superior to simple flattening of nested
> > >> > namespaces.
> > >> >
> > >> > Parquet is built to support very efficient compression and encoding
> > >> > schemes. Parquet allows compression schemes to be specified on a
> > >> > per-column level, and is future-proofed to allow adding more
> > >> > encodings as they are invented and implemented. We separate the
> > >> > concepts of encoding and compression, allowing Parquet consumers to
> > >> > implement operators that work directly on encoded data without
> > >> > paying the decompression and decoding penalty when possible.
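The idea of operators working directly on encoded data can be shown with a minimal sketch, assuming a plain run-length encoding over an in-memory Python list. Parquet's actual encodings (for example its RLE/bit-packing hybrid) use a different on-disk layout, and the function names here are hypothetical. A predicate such as "count rows equal to X" can be answered from the runs alone.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run length)


def rle_encode(values: List[int]) -> List[Run]:
    """Run-length encode a column: consecutive equal values become one run."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand runs back into the plain column values."""
    return [v for v, n in runs for _ in range(n)]


def count_equal_encoded(runs: List[Run], target: int) -> int:
    """Count occurrences of `target` by summing run lengths, without decoding."""
    return sum(n for v, n in runs if v == target)
```

For the column `[7, 7, 7, 7, 3, 3, 7, 7]`, `rle_encode` produces `[(7, 4), (3, 2), (7, 2)]`, and `count_equal_encoded(runs, 7)` returns 6 after inspecting three runs instead of eight values. Keeping a general-purpose compressor as a separate layer on top of such encodings is what lets this kind of operator skip decoding.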
> > >> >
> > >> > == Rationale ==
> > >> >
> > >> > Parquet is built to be used by anyone. We believe that an efficient,
> > >> > well-implemented columnar storage substrate should be useful to all
> > >> > frameworks without the cost of extensive, difficult-to-set-up
> > >> > dependencies.
> > >> >
> > >> > Furthermore, the rapid growth of the Parquet community is empowered
> > >> > by open source. We believe the Apache foundation is a great fit as
> > >> > the long-term home for Parquet, as it provides an established process
> > >> > for community-driven development and decision making by consensus.
> > >> > This is exactly the model we want for future Parquet development.
> > >> >
> > >> > == Initial Goa
