Hi,

Regarding available open-source columnar formats, I have also come across
https://carbondata.apache.org/ but do not really know anything about it
other than it exists.

Br,

Zoltan

On Thu, May 16, 2019 at 11:27 PM Wes McKinney <[email protected]> wrote:

> hi Brian,
>
> Anecdotal evidence suggests that Parquet has more market share than
> ORC, but I have heard that ORC has been gaining some adoption lately
> due to its ACID support in Hive
> (https://orc.apache.org/docs/acid.html). Parquet and ORC are the only
> two open source columnar storage solutions out there AFAIK. Now that
> Cloudera (one of the Parquet creators) and Hortonworks (one of the ORC
> creators) have merged, it will be interesting to see where engineering
> time is invested going forward.
>
> - Wes
>
> On Thu, May 9, 2019 at 2:21 PM Uwe L. Korn <[email protected]> wrote:
> >
> > Hello,
> >
> > Be aware that Avro and Protobuf are general serialization formats, not
> > columnar ones such as Parquet or ORC. They are good for RPC or row-wise
> > streaming whereas the latter two are perfect for analytics.
> >
> > Uwe
> >
> > > On 09.05.2019 at 20:33, David Mollitor <[email protected]> wrote:
> > >
> > > I'm sure there are many different opinions on the matter, but in
> > > regards to Avro, I would say it is becoming more and more of a niche
> > > player.
> > >
> > > Many folks are choosing to go with Google Protobufs for RPC and
> > > Parquet/ORC for analytic workloads.
> > >
> > >> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <[email protected]> wrote:
> > >>
> > >> All,
> > >>
> > >> Is it fair to say that Parquet is fast becoming the dominant open
> > >> source columnar storage format? How do those of you with long-term
> > >> Hadoop experience see this? For example, is Parquet overtaking ORC
> > >> and Avro?
> > >>
> > >> Thanks,
> > >>
> > >> Brian
> > >>
> >
>
