Hi Wes,

Thank you for setting up the doc. This is a great idea and a better setup than discussing this via JIRA. Can you please give me edit access?
Aliaksei.

On 01/31/2016 03:11 PM, Wes McKinney wrote:
> Dear all,
>
> I created a publicly available document where we can organize the
> parquet-cpp roadmap and outstanding JIRAs. I tried to organize all of the
> open JIRAs by functional component. Since there are about 40 open JIRAs now
> (and this will continue to balloon as we make progress) this seems like a
> good way to stay on the same page.
>
> https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#
>
> Please request edit access and I will add you -- anyone can view (but not
> edit) the document.
>
> I stress that it is going to be extremely difficult for us to move forward
> in parallel without stopping to invest in unit test infrastructure and
> designing every component in a way that it can be tested in isolation. I've
> begun doing this for the primitive column readers in
> https://github.com/apache/parquet-cpp/pull/32, but it's a bare minimum
> effort to be able to write tests for the work that's been done in the last
> two weeks.
>
> Thank you,
> Wes
>
> On Fri, Jan 29, 2016 at 10:48 AM, Julien Le Dem wrote:
>
>> Sounds good to me.
>> At some point (later) we'll have to do some cross-compatibility testing
>> with parquet-mr as well to make sure everything is on the same page.
>> CC'ing some folks who should probably chime in.
>>
>>
>> On Fri, Jan 29, 2016 at 10:21 AM, Wes McKinney wrote:
>>
>>> hi folks,
>>>
>>> Since there are so many moving pieces in creating a full-featured Parquet
>>> reader-writer, I propose we start putting together a plan to create test
>>> fixtures and tools to enable us to develop faster.
>>>
>>> Specifically, we need to achieve maximum decoupling between functional
>>> components. Every unit of functionality should be testable without having
>>> to create actual valid Parquet test data files. Smoke tests on real data
>>> will help, but they are a band-aid solution compared to approaching the
>>> problem from a rigorous test-driven perspective.
>>>
>>> To assist with the discussion, let's address the different parts of the
>>> testing process:
>>>
>>> - Functional unit testing of decoupled components. We need to make a
>>> diagram of all those boxes and their interfaces with each other. For
>>> example: a column decoder only needs to know how to ask for its next data
>>> page, but not where the data page is located physically.
>>>
>>> - Integration / macro-level testing, i.e. the "everything works together"
>>> part of the problem.
>>>
>>> I don't think investing in much top-down / integration testing of the
>>> library will help us (and it may actively hurt us) until we organize
>>> the functional components of the library in a way that everything can be
>>> tested easily in isolation.
>>>
>>> I propose that we use a Google document to help with this design process
>>> and that we learn from parquet-mr and other implementations of Parquet to
>>> help move things along. In doing this we can cross-reference existing and
>>> new JIRAs so that it's clear exactly what needs to be done for each part of
>>> the system.
>>>
>>> Let me know your thoughts.
>>>
>>> thanks,
>>> Wes
>>>
>>
>>
>> --
>> Julien
>>
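A minimal C++ sketch of the decoupling Wes describes (a column decoder that only asks for its next data page, never where that page lives). All names here (`DataPage`, `PageSource`, `InMemoryPageSource`, `Int32ColumnDecoder`, `MakePage`) are hypothetical illustrations, not parquet-cpp's actual API. The decoder assumes PLAIN-encoded int32 values and ignores page headers, definition/repetition levels, and compression.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical data page: raw value bytes plus a value count. Real Parquet
// pages also carry headers, levels, and encoding info; omitted here.
struct DataPage {
  std::vector<uint8_t> buffer;
  int32_t num_values = 0;
};

// Hypothetical interface: the only thing the decoder knows about its input.
// Whether pages come from a file, memory, or the network is hidden behind it.
class PageSource {
 public:
  virtual ~PageSource() = default;
  // Returns the next data page, or std::nullopt when the column is exhausted.
  virtual std::optional<DataPage> NextPage() = 0;
};

// Test double: serves pre-built pages from memory, so no valid Parquet file
// is needed to exercise the decoder.
class InMemoryPageSource : public PageSource {
 public:
  explicit InMemoryPageSource(std::vector<DataPage> pages)
      : pages_(std::move(pages)) {}

  std::optional<DataPage> NextPage() override {
    if (next_ >= pages_.size()) return std::nullopt;
    return pages_[next_++];
  }

 private:
  std::vector<DataPage> pages_;
  std::size_t next_ = 0;
};

// Decodes 4-byte int32 values (assumed host byte order, which matches
// Parquet's little-endian PLAIN encoding on little-endian machines).
class Int32ColumnDecoder {
 public:
  explicit Int32ColumnDecoder(PageSource* source) : source_(source) {}

  std::vector<int32_t> ReadAll() {
    std::vector<int32_t> out;
    while (auto page = source_->NextPage()) {
      // Never read past the end of the buffer, even if num_values lies.
      std::size_t available = page->buffer.size() / sizeof(int32_t);
      std::size_t count =
          std::min(static_cast<std::size_t>(page->num_values), available);
      for (std::size_t i = 0; i < count; ++i) {
        int32_t v;
        std::memcpy(&v, page->buffer.data() + i * sizeof(int32_t), sizeof(v));
        out.push_back(v);
      }
    }
    return out;
  }

 private:
  PageSource* source_;
};

// Test helper: packs values into a page using the same byte layout.
DataPage MakePage(const std::vector<int32_t>& values) {
  DataPage page;
  page.num_values = static_cast<int32_t>(values.size());
  page.buffer.resize(values.size() * sizeof(int32_t));
  if (!values.empty()) {
    std::memcpy(page.buffer.data(), values.data(), page.buffer.size());
  }
  return page;
}

// A unit test that exercises the decoder with no Parquet file on disk.
void TestInt32DecoderInIsolation() {
  InMemoryPageSource source({MakePage({1, 2, 3}), MakePage({}), MakePage({42})});
  Int32ColumnDecoder decoder(&source);
  assert((decoder.ReadAll() == std::vector<int32_t>{1, 2, 3, 42}));
}
```

Because the decoder depends only on the `PageSource` interface, a file-backed source could be added later without touching the decoder or its tests.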
