Dear all,

I created a publicly available document where we can organize the
parquet-cpp roadmap and outstanding JIRAs. I tried to organize all of the
open JIRAs by functional component. Since there are about 40 open JIRAs now
(and this will continue to balloon as we make progress) this seems like a
good way to stay on the same page.

https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#

Please request edit access and I will add you -- anyone can view (but not
edit) the document.

I stress that it is going to be extremely difficult for us to move forward
in parallel without stopping to invest in unit test infrastructure and to
design every component so that it can be tested in isolation. I've
begun doing this for the primitive column readers in
https://github.com/apache/parquet-cpp/pull/32, but it's a bare-minimum
effort to be able to write tests for the work that's been done over the last
two weeks.
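
To make "tested in isolation" concrete, here is a rough sketch of the kind of
seam I have in mind. Every name in it (Page, PageSource, InMemoryPageSource,
Int32ColumnReader, ReadAll) is hypothetical and made up for illustration; it
is not the code in the pull request and not the parquet-cpp API. The reader
only knows how to ask for its next page, so a test can feed it pages from
memory instead of from a real Parquet file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// All names below are hypothetical; this is not the parquet-cpp API.

// A data page reduced to already-decoded values, for illustration only.
struct Page {
  std::vector<int32_t> values;
};

// The only thing a column reader needs: a way to ask for the next page.
class PageSource {
 public:
  virtual ~PageSource() = default;
  // Returns nullptr when no pages remain.
  virtual std::unique_ptr<Page> NextPage() = 0;
};

// Test fixture: serves pages from memory, so no Parquet file is required.
class InMemoryPageSource : public PageSource {
 public:
  explicit InMemoryPageSource(std::vector<Page> pages)
      : pages_(std::move(pages)) {}

  std::unique_ptr<Page> NextPage() override {
    if (next_ >= pages_.size()) return nullptr;
    return std::unique_ptr<Page>(new Page(pages_[next_++]));
  }

 private:
  std::vector<Page> pages_;
  std::size_t next_ = 0;
};

// A toy column reader that depends only on the PageSource interface.
class Int32ColumnReader {
 public:
  explicit Int32ColumnReader(std::unique_ptr<PageSource> source)
      : source_(std::move(source)) {}

  // Writes the next value to *out; returns false once the column is exhausted.
  bool Next(int32_t* out) {
    // Fetch pages until one has unread values (this also skips empty pages).
    while (!page_ || pos_ >= page_->values.size()) {
      page_ = source_->NextPage();
      pos_ = 0;
      if (!page_) return false;
    }
    *out = page_->values[pos_++];
    return true;
  }

 private:
  std::unique_ptr<PageSource> source_;
  std::unique_ptr<Page> page_;
  std::size_t pos_ = 0;
};

// Drains a reader into a vector; convenient for assertions in tests.
std::vector<int32_t> ReadAll(Int32ColumnReader* reader) {
  std::vector<int32_t> out;
  int32_t v;
  while (reader->Next(&v)) out.push_back(v);
  return out;
}
```

A test would build an InMemoryPageSource from a few hand-written pages, hand
it to the reader, and check that ReadAll returns the concatenated values. A
file-backed PageSource could be swapped in later without touching the reader.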

Thank you,
Wes

On Fri, Jan 29, 2016 at 10:48 AM, Julien Le Dem <[email protected]> wrote:

> Sounds good to me.
> At some point (later) we'll have to do some cross-compatibility testing
> with parquet-mr as well to make sure everything is on the same page.
> CC'ing some folks who should probably chime in.
>
>
> On Fri, Jan 29, 2016 at 10:21 AM, Wes McKinney <[email protected]> wrote:
>
> > hi folks,
> >
> > Since there are so many moving pieces in creating a full-featured Parquet
> > reader-writer, I propose we start planning how to create test
> > fixtures and tools to enable us to develop faster.
> >
> > Specifically, we need to achieve maximum decoupling between functional
> > components. Every unit of functionality should be testable without having
> > to create actual valid Parquet test data files. Smoke tests on real data
> > will help, but they are a band-aid solution compared with approaching the
> > problem from a rigorous test-driven perspective.
> >
> > To assist with the discussion, let's address the different parts of the
> > testing process:
> >
> > - Functional unit testing of decoupled components. We need to make a
> > diagram of all those boxes and the interfaces between them. For
> > example: a column decoder only needs to know how to ask for its next data
> > page, but not where the data page is located physically.
> >
> > - Integration / macro-level testing, i.e. the "everything works together"
> > part of the problem.
> >
> > I don't think investing in much top-down / integration testing of the
> > library will help us (and it may actively hurt us) until we organize
> > the functional components of the library so that everything can be
> > tested easily in isolation.
> >
> > I propose that we use a Google document to help with this design process
> > and we can learn from parquet-mr and other implementations of Parquet to
> > help move things along. In doing this we can cross-reference existing and
> > new JIRAs so that it's clear exactly what needs to be done for each part of
> > the system.
> >
> > Let me know your thoughts.
> >
> > thanks,
> > Wes
> >
>
>
>
> --
> Julien
>
