Hi Guys,
ORC (Optimized Row Columnar) is a very popular open source format that has been adopted by several major components in the Hadoop ecosystem and is widely used. Supporting ORC storage in HAWQ has two advantages: first, it makes HAWQ more Hadoop native, so it interacts with other components more easily; second, ORC stores metadata for query optimization, so it might outperform the two native formats (i.e., AO and Parquet) if it is available.
Since many popular formats are available in the HDFS community and more advanced formats emerge frequently, it is a good option for HAWQ to design a general framework that supports pluggable C/C++ formats such as ORC, as well as native formats such as AO and Parquet. In designing this framework, we also need to support data stored in different file systems: HDFS, local disk, Amazon S3, etc. Thus, it is better to offer a framework that supports both pluggable formats and pluggable file systems.
We are proposing to support ORC in JIRA (https://issues.apache.org/jira/browse/HAWQ-786). Please see the design spec in the JIRA. Your comments are appreciated!
Thanks
Ming Li
