Hi,

Regarding available open-source columnar formats, I have also come across Apache CarbonData (https://carbondata.apache.org/), but I know nothing about it beyond the fact that it exists.

Br,
Zoltan

On Thu, May 16, 2019 at 11:27 PM Wes McKinney <[email protected]> wrote:

> hi Brian,
>
> Anecdotal evidence suggests that Parquet has more market share than
> ORC, but I have heard that ORC has been gaining some adoption lately
> due to its ACID support in Hive
> (https://orc.apache.org/docs/acid.html). Parquet and ORC are the only
> two open source columnar storage solutions out there AFAIK. Now that
> Cloudera (one of the Parquet creators) and Hortonworks (one of the ORC
> creators) have merged, it will be interesting to see where engineering
> time is invested going forward.
>
> - Wes
>
> On Thu, May 9, 2019 at 2:21 PM Uwe L. Korn <[email protected]> wrote:
> >
> > Hello,
> >
> > Be aware that Avro and Protobuf are general serialization formats, not
> > columnar ones such as Parquet or ORC. They are good for RPC or row-wise
> > streaming whereas the latter two are perfect for analytics.
> >
> > Uwe
> >
> > > On 09.05.2019 at 20:33, David Mollitor <[email protected]> wrote:
> > >
> > > I'm sure there are many different opinions on the matter, but in
> > > regards to Avro, I would say it is becoming more and more of a
> > > niche player.
> > >
> > > Many folks are choosing to go with Google Protobufs for RPC and
> > > Parquet/ORC for analytic workloads.
> > >
> > >> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <[email protected]> wrote:
> > >>
> > >> All,
> > >>
> > >> Is it fair to say that Parquet is fast becoming the dominant open
> > >> source columnar storage format? How do those of you with long-term
> > >> Hadoop experience see this? For example, is Parquet overtaking ORC
> > >> and Avro?
> > >>
> > >> Thanks,
> > >>
> > >> Brian
