Hi Gang and Xiening,

We at Vertica have been actively contributing to and using the ORC C++ project as well. A C++ writer will be a great addition to this project, and we look forward to working with you to merge your contributions.

Thanks.
On Wed, Apr 26, 2017 at 2:13 AM, Gang Wu wrote:

> Hi,
>
> This is Gang from Alibaba, working on Alibaba's big data platform,
> MaxCompute. We have developed our own columnar storage format within
> MaxCompute to support MapReduce and other batch-processing workloads. But as
> Apache Orc is getting popular in the industry, we are actively looking at
> integrating the Orc format into MaxCompute.
>
> In the past few months, Xiening (cc'ed) and I have been working on
> enhancing Orc C++ to provide a full-featured C++ reader and writer. Our work
> mainly involves adding a C++ writer that supports all data types and stats,
> and adding index support to both the reader and the writer. As of today, we
> have finished development and testing and plan to contribute this work back
> to the Apache Orc project. We have communicated with Owen via email and have
> created an umbrella JIRA, ORC-179, for the plan. In brief, we plan to do the
> following:
>
> 1. Refactor common classes for writer and reader
>    -- extract common classes and functions for the writer and reader to share
> 2. OutputStream interface for writer
>    -- implement several output streams for writing to memory, file, etc.
>    -- implement ByteRleEncoder, RleEncoder, BooleanRleEncoder, etc.
>    -- support zlib compression
> 3. ORC Writer
>    -- write the ORC file header, file footer, postscript, etc.
>    -- write columns of all types
>    -- write column statistics
>    -- write index streams in the writer, and let the reader seek to a row
>       based on index information
> 4. Other
>    -- some minor bug fixes to the current code base
>
> Should you have any questions, please feel free to contact us. Any
> feedback and suggestions are welcome. Thanks!
>
> Gang Wu
> Senior Engineer
> Alibaba Group

--
regards,
Deepak Majeti, Software Engineer at Vertica
