Hi folks,

Spurred by the discussion and bugfix for PARQUET-799, I'd like to do something about the IO interfaces that we currently have implemented in parquet-cpp.

For C++ at least, the Parquet project is not an ideal place to maintain cross-platform IO and memory management code. There are portability and concurrent-access issues we will eventually need to deal with to make parquet-cpp work well in diverse production environments.

In parallel, we've been developing a general, low-overhead IO subsystem inside Apache Arrow:

https://github.com/apache/arrow/tree/master/cpp/src/arrow/io

Since Arrow is about in-memory columnar data structures and efficient IO / RPC / IPC, it is a much more appropriate place to maintain such code (in the absence of a sort of "Apache C++ Commons" library). There, we currently have more mature implementations of:

- Operating system files (which also work on Windows)
- Memory-mapped files
- HDFS (using either libhdfs or libhdfs3, of your choosing)

Additionally, the "Buffer" abstraction (which handles memory lifetime and provides a general-purpose way to pass around a block of memory that may or may not be owned by the application) is currently implemented in both Parquet [1] and Arrow [2].

Since, fundamentally, parquet-cpp is a library for encoding and decoding the Parquet file format rather than for providing general-purpose IO / file-like interfaces, I propose that we excise this code from the library and make Arrow a hard dependency of libparquet. I believe our respective developer communities would benefit from hardening the IO and memory interfaces being developed in Arrow, and that doing so will lead to better-quality software and less fragmentation.

I wanted to bring this up because we are on the cusp of making the first ASF release of parquet-cpp. While this work might not make the cut for 0.1, if we agree it's a good idea, it would be good to do it sooner rather than later.

Thanks and happy holidays / best wishes for 2017,
Wes

[1]: https://github.com/apache/parquet-cpp/blob/master/src/parquet/util/buffer.h
[2]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.h
