Hi Brian,

Anecdotal evidence suggests that Parquet has more market share than ORC, but I have heard that ORC has been gaining some adoption lately due to its ACID support in Hive (https://orc.apache.org/docs/acid.html). Parquet and ORC are the only two open source columnar storage solutions out there AFAIK. Now that Cloudera (one of the Parquet creators) and Hortonworks (one of the ORC creators) have merged, it will be interesting to see where engineering time is invested going forward.
- Wes

On Thu, May 9, 2019 at 2:21 PM Uwe L. Korn <[email protected]> wrote:
>
> Hello,
>
> Be aware that Avro and Protobuf are general serialization formats, not
> columnar ones such as Parquet or ORC. They are good for RPC or row-wise
> streaming, whereas the latter two are perfect for analytics.
>
> Uwe
>
> > On 09.05.2019 at 20:33, David Mollitor <[email protected]> wrote:
> >
> > I'm sure there are many different opinions on the matter, but in regards to
> > Avro, I would say it is becoming more and more of a niche player.
> >
> > Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
> > for analytic workloads.
> >
> >> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <[email protected]> wrote:
> >>
> >> All,
> >>
> >> Is it fair to say that Parquet is fast becoming the dominant open source
> >> columnar storage format? How do those of you with long-term Hadoop
> >> experience see this? For example, is Parquet overtaking ORC and Avro?
> >>
> >> Thanks,
> >>
> >> Brian
> >>
>
