Re: PRIORITY - Fwd: SD Times story on Apache Parquet graduating to TLP

Sally Khudairi Mon, 27 Apr 2015 19:14:13 -0700

Thanks, Julien --I'll augment my response to Rob.

Warm regards,
Sally


[From the mobile; please excuse top-posting, spelling/spacing errors, and 
brevity]

----- Reply message -----
From: "Julien Le Dem" <[email protected]>
To: <[email protected]>, "Sally Khudairi" <[email protected]>
Cc: "Ryan Blue" <[email protected]>, "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>
Subject: PRIORITY - Fwd: SD Times story on Apache Parquet graduating to TLP
Date: Mon, Apr 27, 2015 22:10

Hey, 
sorry I missed your email. Thanks Ryan for answering.
I expanded on Ryan's answers below.

On Apr 27, 2015, at 6:03 PM, Sally Khudairi <[email protected]> 
wrote:

> Thanks so much, Ryan --I appreciate this!
> 
> Warmly,
> Sally
> 
> 
> ----- Original Message -----
> From: Ryan Blue <[email protected]>
> To: Sally Khudairi <[email protected]>; [email protected]
> Cc: [email protected]; Parquet Dev List <[email protected]>; 
> [email protected]; [email protected]; [email protected]; 
> [email protected]; [email protected]
> Sent: Monday, 27 April 2015, 20:36
> Subject: Re: PRIORITY - Fwd: SD Times story on Apache Parquet graduating to 
> TLP
> 
> My answers are inline. Feel free to edit or add to them!
> 
> rb
> 
> On 04/27/2015 01:09 PM, Sally Khudairi wrote:
>> Hello Julien and Parquet PMC --per below, the SD Times is looking to
>> cover Parquet for a story tomorrow morning and needs the following
>> questions answered.
>> 
>> If you can please forward your responses, I'll be happy to coordinate
>> with Rob.
>> 
>> Thanks in advance,
>>  Sally
>> 
>> 
>> [From the mobile; please excuse top-posting, spelling/spacing errors,
>> and brevity]
>> 
>> ----- Forwarded message -----
>> From: "Rob Marvin" <[email protected]>
>> To: "Sally Khudairi" <[email protected]>
>> Subject: SD Times story on Apache Parquet graduating to TLP
>> Date: Mon, Apr 27, 2015 15:41
>> 
>> Hi Sally,
>> 
>> I hope you're well. I'm reaching out because I'm putting together a
>> brief SD Times story on Apache Parquet's elevation to Top-Level Project,
>> and I'd like to get an original quote or two to accompany the story. Can
>> you ping Julien Le Dem or another ASF member on the Apache Parquet team
>> for a brief comment or two?
>> 
>> We're looking to run the story by tomorrow morning at the latest. Here
>> are a couple questions to guide the comments:
>> 
>> -What is it that makes Apache Parquet unique in what the columnar
>> storage format brings to the Hadoop ecosystem and the many companies
>> using the project in production?
> 
> Bring your own object model: Lots of applications are based on existing 
> row-oriented formats, like Avro and Thrift, that come with objects to 
> represent the data. A great feature of Parquet is that it is built to 
> work natively with those existing classes, so you don't have to change 
> the application to go from a row-oriented to a column-oriented format. 
> Parquet can read directly to Avro records, Spark data frames, Hive's 
> internal writables, and others.

The open object model and language-agnostic format definition lead to 
integration with most of the ecosystem.
This makes it easy to experiment with the many query engines and frameworks 
that exist today.
You don't need to import data into a proprietary storage system to analyze it. 
When there's a lot of data, that's really important.


> 
>> -What does Parquet's elevation to TLP signify for its development going
>> forward, and what can developers expect in terms of the future growth
>> and evolution of the project?
> 
> Graduation from the Incubator to become a TLP shows that the Parquet 
> project has a healthy Apache community. I think that's one of the best 
> votes of confidence you could have in an open source project: people 
> care about it, put time into it, and know how to work together.
> 
> That's an asset to future growth and we can see it in the on-going 
> development efforts. For example, experts on Drill, Presto, and Hive 
> projects are collaborating on a vectorized API for accessing Parquet 
> data. It's great that we can work together on Parquet as a community 
> standard across those projects.
> 
> In more practical terms, we've finished a lot of the migration work to 
> become part of the Apache Software Foundation and we're looking forward 
> to a more regular release cadence again.

This is the last step in making Parquet a community-driven standard. Parquet is 
not controlled by a single company but is the result of collaboration among 
many contributors, including teams developing the query engines listed above (I 
would add Impala and Spark SQL), companies that have adopted it and invest in 
open source (Twitter, Netflix, Stripe, Criteo), and of course the many other 
contributors.
We're looking to those query engines to continue integrating deeply with the 
columnar format to reap all the benefits. We're also improving the APIs so 
that the benefits of Parquet apply as well to custom applications developed on 
Hadoop (for example, using optimized filters in MapReduce).

> 
>> That's it! Quick and easy. Let me know if you have any questions and
>> when I can expect the quotes.
>> 
>> Thanks in advance for your help!
>> 
>> Best,
>> Rob
>> 
>> --
>> Rob Marvin <http://sdt.bz/about/RobMarvin>
>> Online & Social Media Editor
>> BZ Media LLC, SD Times
>> O: (631) 421-4158 x131
>> C: (516) 987-9926
>> [email protected] <mailto:[email protected]>
>> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
