Hi Brian,

Anecdotal evidence suggests that Parquet has more market share than
ORC, but I have heard that ORC has been gaining some adoption lately
due to its ACID support in Hive
(https://orc.apache.org/docs/acid.html). Parquet and ORC are the only
two open source columnar storage solutions out there AFAIK. Now that
Cloudera (one of the Parquet creators) and Hortonworks (one of the ORC
creators) have merged, it will be interesting to see where engineering
time is invested going forward.

- Wes

On Thu, May 9, 2019 at 2:21 PM Uwe L. Korn <[email protected]> wrote:
>
> Hello,
>
> Be aware that Avro and Protobuf are general serialization formats, not 
> columnar ones such as Parquet or ORC. They are good for RPC or row-wise 
> streaming whereas the latter two are perfect for analytics.
>
> Uwe
>
> > On 09.05.2019 at 20:33, David Mollitor <[email protected]> wrote:
> >
> > I'm sure there are many different opinions on the matter, but with regard to
> > Avro, I would say it is becoming more and more of a niche player.
> >
> > Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
> > for analytic workloads.
> >
> >> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <[email protected]> wrote:
> >>
> >> All,
> >>
> >> Is it fair to say that Parquet is fast becoming the dominant open source
> >> columnar storage format? How do those of you with long-term Hadoop
> >> experience see this? For example, is Parquet overtaking ORC and Avro?
> >>
> >> Thanks,
> >>
> >> Brian
> >>
>
