Cool, thanks! Looking forward to the VOTE thread :)

On Saturday, May 17, 2014, Chris Aniszczyk <[email protected]> wrote:
> Your request about the user list seems fine, no need to have multiple lists
> atm IMHO.
>
> The proposal has been updated accordingly, thanks!
> https://wiki.apache.org/incubator/ParquetProposal?action=recall&rev=21
>
> On Sat, May 17, 2014 at 9:33 AM, Henry Saputra <[email protected]> wrote:
>
> > Chris, could you please address my concern about the user@ list?
> >
> > - Henry
> >
> > On Fri, May 16, 2014 at 4:43 PM, Chris Aniszczyk <[email protected]> wrote:
> > > SGTM Roman, thanks for volunteering!
> > >
> > > I'll start the vote on Sunday barring any issues.
> > >
> > > On Fri, May 16, 2014 at 11:56 AM, Roman Shaposhnik <[email protected]> wrote:
> > >
> > >> Hi!
> > >>
> > >> proposal looks good to me and I am very much looking
> > >> forward to a voting thread.
> > >>
> > >> One small request, since I plan to spend a fair amount
> > >> of time on Parquet anyway, would you guys be ok
> > >> with adding me as an extra mentor so I can help
> > >> with that aspect of the project as well?
> > >>
> > >> Thanks,
> > >> Roman.
> > >>
> > >> P.S. Plus it has an added benefit of increasing diversity
> > >> of affiliations from the get-go.
> > >>
> > >> On Mon, May 12, 2014 at 10:02 AM, Chris Aniszczyk <[email protected]> wrote:
> > >> > We would like to propose Parquet as an Apache Incubator project.
> > >> > https://wiki.apache.org/incubator/ParquetProposal
> > >> >
> > >> > Feel free to comment, we'll go for a vote in a week or two or whenever
> > >> > consensus has been reached on the proposal.
> > >> >
> > >> > I've posted the text of the proposal below:
> > >> >
> > >> > == Abstract ==
> > >> > Parquet is a columnar storage format for Hadoop.
> > >> >
> > >> > == Proposal ==
> > >> >
> > >> > We created Parquet to make the advantages of compressed, efficient columnar
> > >> > data representation available to any project in the Hadoop ecosystem,
> > >> > regardless of the choice of data processing framework, data model, or
> > >> > programming language.
> > >> >
> > >> > == Background ==
> > >> >
> > >> > Parquet is built from the ground up with complex nested data structures in
> > >> > mind, and uses the repetition/definition level approach to encoding such
> > >> > data structures, as popularized by Google Dremel (
> > >> > https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
> > >> > this approach is superior to simple flattening of nested name spaces.
> > >> >
> > >> > Parquet is built to support very efficient compression and encoding
> > >> > schemes. Parquet allows compression schemes to be specified on a per-column
> > >> > level, and is future-proofed to allow adding more encodings as they are
> > >> > invented and implemented. We separate the concepts of encoding and
> > >> > compression, allowing Parquet consumers to implement operators that work
> > >> > directly on encoded data without paying a decompression and decoding penalty
> > >> > when possible.
> > >> >
> > >> > == Rationale ==
> > >> >
> > >> > Parquet is built to be used by anyone. We believe that an efficient,
> > >> > well-implemented columnar storage substrate should be useful to all
> > >> > frameworks without the cost of extensive and difficult to set up
> > >> > dependencies.
> > >> >
> > >> > Furthermore, the rapid growth of the Parquet community is empowered by open
> > >> > source. We believe the Apache foundation is a great fit as the long-term
> > >> > home for Parquet, as it provides an established process for
> > >> > community-driven development and decision making by consensus. This is
> > >> > exactly the model we want for future Parquet development.
> > >> >
> > >> > == Initial Goa
