Sounds good to me.
At some point (later) we'll have to do some cross-compatibility testing
with parquet-mr as well to make sure everything is on the same page.
CC'ing some folks who should probably chime in.


On Fri, Jan 29, 2016 at 10:21 AM, Wes McKinney <[email protected]> wrote:

> hi folks,
>
> Since there are so many moving pieces in creating a full-featured Parquet
> reader-writer, I propose we start planning test fixtures and tools that
> enable us to develop faster.
>
> Specifically, we need to achieve maximum decoupling between functional
> components. Every unit of functionality should be testable without having
> to create actual valid Parquet test data files. Smoke tests on real data
> will help, but it's a band-aid solution vs approaching the problem from a
> rigorous test-driven perspective.
>
> To assist with the discussion, let's address the different parts of the
> testing process:
>
> - Functional unit testing of decoupled components. We need to make a
> diagram of all those boxes and how they interface with each other. For
> example: a column decoder only needs to know how to ask for its next data
> page, but not where the data page is located physically.
>
> - Integration / macro-level testing, i.e. the "everything works together"
> part of the problem.
>
> I don't think investing in much top-down / integration testing of the
> library will help us (and it may actively hurt us) until we organize
> the functional components of the library in a way that everything can be
> tested easily in isolation.
>
> I propose that we use a Google document to help with this design process
> and we can learn from parquet-mr and other implementations of Parquet to
> help move things along. In doing this we can cross-reference existing and
> new JIRAs so that it's clear exactly what needs to be done for each part of
> the system.
>
> Let me know your thoughts.
>
> thanks,
> Wes
>
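
To make the column decoder example above concrete, here is a minimal sketch
in Python of what that decoupling could look like. All names here
(PageSource, InMemoryPageSource, ColumnDecoder) are hypothetical and are not
taken from parquet-mr or any existing Parquet library; a "page" is modeled
as a plain list of integers purely for illustration.

```python
from typing import Iterator, List, Optional, Protocol


class PageSource(Protocol):
    """Hypothetical interface: anything that can hand out the next data page."""

    def next_page(self) -> Optional[List[int]]:
        """Return the next page of values, or None when no pages remain."""
        ...


class InMemoryPageSource:
    """Test fixture: serves pages from memory, so no Parquet file is needed."""

    def __init__(self, pages: List[List[int]]) -> None:
        self._pages = iter(pages)

    def next_page(self) -> Optional[List[int]]:
        return next(self._pages, None)


class ColumnDecoder:
    """Knows how to ask for its next page, not where that page lives."""

    def __init__(self, source: PageSource) -> None:
        self._source = source

    def values(self) -> Iterator[int]:
        while True:
            page = self._source.next_page()
            if page is None:
                return
            yield from page


# Unit test style usage: three in-memory pages, one of them empty.
decoded = list(ColumnDecoder(InMemoryPageSource([[1, 2], [], [3]])).values())
```

A file-backed page source could implement the same next_page method later,
and the decoder and its unit tests would not need to change.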



-- 
Julien
