Sounds good to me. At some point later we'll also have to do some cross-compatibility testing with parquet-mr to make sure everything is on the same page. CC'ing some folks who should probably chime in.
On Fri, Jan 29, 2016 at 10:21 AM, Wes McKinney <[email protected]> wrote:

> hi folks,
>
> Since there's so many moving pieces with creating a full-featured Parquet
> reader-writer, I propose we start planning out a plan to create test
> fixtures and tools to enable us to develop faster.
>
> Specifically, we need to achieve maximum decoupling between functional
> components. Every unit of functionality should be testable without having
> to create actual valid Parquet test data files. Smoke tests on real data
> will help, but it's a band-aid solution vs approaching the problem from a
> rigorous test-driven perspective.
>
> To assist with the discussion, let's address the different parts of the
> testing process
>
> - Functional unit testing of decoupled components. We need to make a
> diagram of all those boxes and what is their interface with each other. For
> example: a column decoder only needs to know how to ask for its next data
> page, but not where the data page is located physically.
>
> - Integration / macro-level testing, i.e. the "everything works together"
> part of the problem.
>
> I don't think investing in much top-down / integration testing of the
> library will help us (and may actually actively hurt us) until we organize
> the functional components of the library in a way that everything can be
> tested easily in isolation.
>
> I propose that we use a Google document to help with this design process
> and we can learn from parquet-mr and other implementations of Parquet to
> help move things along. In doing this we can cross-reference existing and
> new JIRAs so that it's clear exactly what needs to be done for each part of
> the system.
>
> Let me know your thoughts.
>
> thanks,
> Wes
>

--
Julien
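
For illustration only, here is a rough sketch of the column decoder example from the quoted message: a decoder that asks a page source for its next data page without knowing where that page physically lives. Every name here (PageSource, InMemoryPageSource, Int32ColumnDecoder, make_page) is hypothetical and not taken from any Parquet implementation. The page layout is also an assumption made for brevity: each page is treated as a bare run of little-endian int32 values, with no page headers or repetition/definition levels.

```python
import struct
from typing import Iterator, List, Optional, Protocol


class PageSource(Protocol):
    """Hypothetical interface: hands out raw data pages one at a time."""

    def next_page(self) -> Optional[bytes]:
        ...


class InMemoryPageSource:
    """Test fixture: serves pages from a list, so no Parquet file is needed."""

    def __init__(self, pages: List[bytes]) -> None:
        self._pages = iter(pages)

    def next_page(self) -> Optional[bytes]:
        # Return None once the pages are exhausted.
        return next(self._pages, None)


class Int32ColumnDecoder:
    """Decodes little-endian int32 values; knows nothing about page location."""

    def __init__(self, source: PageSource) -> None:
        self._source = source

    def values(self) -> Iterator[int]:
        # Keep asking the source for pages until it reports there are none left.
        while (page := self._source.next_page()) is not None:
            count = len(page) // 4
            yield from struct.unpack(f"<{count}i", page[: count * 4])


def make_page(values: List[int]) -> bytes:
    """Build a fake page (assumed layout: packed little-endian int32 values)."""
    return struct.pack(f"<{len(values)}i", *values)


# The decoder is exercised against in-memory pages, including an empty one.
source = InMemoryPageSource([make_page([1, 2, 3]), make_page([]), make_page([4])])
decoded = list(Int32ColumnDecoder(source).values())
print(decoded)  # [1, 2, 3, 4]
```

A file-backed source could implement the same next_page method later, and the decoder tests would not need to change.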
