Support orc format

Ming Li Fri, 17 Jun 2016 03:03:07 -0700

Hi Guys,

ORC (Optimized Row Columnar) is a very popular open source format adopted
by several major components in the Hadoop ecosystem, and it is also used by
many users. The advantages of supporting ORC storage in HAWQ are twofold:
first, it makes HAWQ more Hadoop native, so it interacts with other
components more easily; second, ORC stores some meta info for query
optimization, so it might potentially outperform the two native formats
(i.e., AO and Parquet) if it is available.


Since there are lots of popular formats available in the HDFS community, and
more advanced formats are emerging frequently, it is a good option for HAWQ
to design a general framework that supports pluggable C/C++ formats such as
ORC, as well as native formats such as AO and Parquet. In designing this
framework, we also need to support data stored in different file systems:
HDFS, local disk, Amazon S3, etc. Thus, it is better to offer a framework
that supports both pluggable formats and pluggable file systems.
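
To make the idea of two independent plug-in points concrete, here is a
minimal sketch. It is not taken from the design spec in the JIRA: every name
in it (FileSystem, FormatReader, register_format, scan, the "mem" scheme, the
toy CSV reader) is hypothetical, and it is written in Python only for
brevity, whereas the real framework would target C/C++ plug-ins. It only
shows that a scan can pick a file system by URL scheme and a format by name
without either side knowing about the other.

```python
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class FileSystem(ABC):
    """Hypothetical storage plug-in (HDFS, local disk, S3, ...)."""

    @abstractmethod
    def open(self, path: str) -> io.BufferedIOBase:
        """Return a readable byte stream for the given path."""


class FormatReader(ABC):
    """Hypothetical format plug-in (ORC, AO, Parquet, ...)."""

    @abstractmethod
    def read_rows(self, stream: io.BufferedIOBase) -> Iterator[List[str]]:
        """Decode a byte stream into rows."""


class InMemoryFileSystem(FileSystem):
    """Stand-in file system so the sketch runs without HDFS or S3."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def open(self, path: str) -> io.BufferedIOBase:
        return io.BytesIO(self._files[path])


class CsvReader(FormatReader):
    """Toy format used only for illustration; a real plug-in would wrap ORC."""

    def read_rows(self, stream: io.BufferedIOBase) -> Iterator[List[str]]:
        for line in stream.read().decode("utf-8").splitlines():
            yield line.split(",")


# Two separate registries: one keyed by URL scheme, one by format name.
FILE_SYSTEMS: Dict[str, FileSystem] = {}
FORMATS: Dict[str, FormatReader] = {}


def register_file_system(scheme: str, fs: FileSystem) -> None:
    FILE_SYSTEMS[scheme] = fs


def register_format(name: str, reader: FormatReader) -> None:
    FORMATS[name] = reader


def scan(url: str, format_name: str) -> List[List[str]]:
    """Resolve the file system from the URL scheme and the reader by name."""
    scheme, _, path = url.partition("://")
    fs = FILE_SYSTEMS[scheme]
    reader = FORMATS[format_name]
    with fs.open(path) as stream:
        return list(reader.read_rows(stream))


register_file_system("mem", InMemoryFileSystem({"t/part-0": b"1,a\n2,b\n"}))
register_format("csv", CsvReader())
rows = scan("mem://t/part-0", "csv")
```

Under these assumptions, adding a new format or a new file system means
registering one more object; the scan path itself does not change.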

We are proposing ORC support in JIRA (
https://issues.apache.org/jira/browse/HAWQ-786). Please see the design spec
in the JIRA.

Your comments are appreciated!

Thanks
Ming Li
