Re: parquet-cpp development

Wes McKinney Wed, 27 Jul 2016 23:26:49 -0700

hi Ryan,

comments inline

On Wed, Jul 27, 2016 at 1:36 PM, Ryan Lewis <[email protected]> wrote:
> Hi,
>
> I am interested in contributing to parquet-cpp. I would like to introduce
> myself to the developer community, and as per suggestion from a private
> conversation with Wes McKinney, briefly discuss where I think I can add
> some value. What I have noticed is that the library is not leveraging the
> C++ standard library, and instead is recreating standard behavior. Two
> concrete examples are the InputStream class and its children, and the
> Buffer objects. These are perhaps modeled by the input iterator concept:
> http://en.cppreference.com/w/cpp/concept/InputIterator (or ideally some
> refinement of it) and the allocator concept
> http://en.cppreference.com/w/cpp/concept/Allocator. Components of the
> standard library which implements these concepts are listed in the previous
> links.
>

Can you describe or give some concrete examples of how places where
the current parquet-cpp MemoryAllocator concept could be altered to
use a stateful allocator implementation conforming to std::allocator /
std::allocator_traits? It is not clear to me that we are recreating
standard behavior; we have a working approach to user-defined memory
allocation and tracking that is distinct from the STL approach, even
if there is semantic functional overlap. We have discussed creating a
conforming STL allocator containing a pointer to a particular
parquet::MemoryAllocator in its state, and I think this is a good idea
independently (we are using the default allocator in a number of
places, which is bad because these allocations may not be known to the
user's custom allocator).

The intent of these classes is to provide users a way to define custom
implementations / subclasses of concepts that are not known to the
library at compile time (and instead using dynamic dispatch / virtual
functions to invoke the correct symbols). The risk I see is that if
you consistently use the STL template-based approach to defining
custom allocators, then this will cascade throughout much of the
library (since any current class which utilizes
parquet::MemoryAllocator will now need to know the type of the user
allocator at compile time). If parquet-cpp were a self-contained
system (rather than intended as a pluggable component), it would be a
different story. Maybe I'm thinking about it wrong, this is not my
area of expertise, having only browsed the STL documentation on custom
allocators.

Here is an example of a more advanced memory management subsystem in
Apache Kudu (see the abstract interface in BufferAllocator):

https://github.com/apache/incubator-kudu/blob/master/src/kudu/util/memory/memory.h

See also the STL-compliant allocator that hooks into Kudu's memory arenas here:

https://github.com/apache/incubator-kudu/blob/master/src/kudu/util/memory/arena.h
Usage: 
https://github.com/apache/incubator-kudu/blob/master/src/kudu/tablet/tablet_metrics.cc#L275

(note that Kudu is a self-contained system so not apples-to-apples)

The parquet::Buffer seems slightly orthogonal to memory allocators --
this was intended to make the semantics of memory ownership and
deallocation more flexible where needed.

> There is a decent amount of code in parquet-cpp and so its a bit
> challenging for me to provide a constructive thought on how to perform
> refactors (which I would happily do), to have the library use the standard
> library more, however, I also don't want to do something if it is not a
> welcome change. Therefore feedback from the developer community would be
> beneficial.

I would like to understand the argument a bit better and see some code
specifics. Would be great to get other C++ developers in the ASF
community to weigh in once we've identified the central pros / cons.

Thanks
Wes

>
> Best,
> -rhl

Re: parquet-cpp development

Reply via email to