Hi Wes,

Thank you for setting up the doc. This is a great idea and a better
setup than discussing this via JIRA. Can you please give me edit access?

Aliaksei.


On 01/31/2016 03:11 PM, Wes McKinney wrote:
> Dear all,
>
> I created a publicly available document where we can organize the
> parquet-cpp roadmap and outstanding JIRAs. I tried to organize all of the
> open JIRAs by functional component. Since there are about 40 open JIRAs now
> (and this will continue to balloon as we make progress) this seems like a
> good way to stay on the same page.
>
> https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#
>
> Please request edit access and I will add you -- anyone can view (but not
> edit) the document.
>
> I stress that it is going to be extremely difficult for us to move forward
> in parallel without stopping to invest in unit test infrastructure and
> designing every component in a way that it can be tested in isolation. I've
> begun doing this for the primitive column readers in
> https://github.com/apache/parquet-cpp/pull/32, but it's a bare minimum
> effort to be able to write tests for the work that's been done over the
> last two weeks.
>
> Thank you,
> Wes
>
> On Fri, Jan 29, 2016 at 10:48 AM, Julien Le Dem <[email protected]> wrote:
>
>> Sounds good to me.
>> At some point (later) we'll have to do some cross-compatibility testing
>> with parquet-mr as well to make sure everything is on the same page.
>> CC'ing some folks who should probably chime in.
>>
>>
>> On Fri, Jan 29, 2016 at 10:21 AM, Wes McKinney <[email protected]> wrote:
>>
>>> hi folks,
>>>
>>> Since there are so many moving pieces in creating a full-featured Parquet
>>> reader-writer, I propose we start planning how to create test
>>> fixtures and tools to enable us to develop faster.
>>>
>>> Specifically, we need to achieve maximum decoupling between functional
>>> components. Every unit of functionality should be testable without having
>>> to create actual valid Parquet test data files. Smoke tests on real data
>>> will help, but it's a band-aid solution vs approaching the problem from a
>>> rigorous test-driven perspective.
>>>
>>> To assist with the discussion, let's address the different parts of the
>>> testing process:
>>>
>>> - Functional unit testing of decoupled components. We need to make a
>>> diagram of all those boxes and how they interface with each other. For
>>> example: a column decoder only needs to know how to ask for its next data
>>> page, but not where the data page is located physically.
>>>
>>> - Integration / macro-level testing, i.e. the "everything works together"
>>> part of the problem.
>>>
>>> I don't think investing in much top-down / integration testing of the
>>> library will help us (and may actually actively hurt us) until we organize
>>> the functional components of the library in a way that everything can be
>>> tested easily in isolation.
>>>
>>> I propose that we use a Google document to help with this design process
>>> and we can learn from parquet-mr and other implementations of Parquet to
>>> help move things along. In doing this we can cross-reference existing and
>>> new JIRAs so that it's clear exactly what needs to be done for each part of
>>> the system.
>>>
>>> Let me know your thoughts.
>>>
>>> thanks,
>>> Wes
>>>
>>
>>
>> --
>> Julien
>>
