Hi dev@,

After a week off, this week we have an excerpt from our internal data
interoperability test suite, which verifies compatibility between Hive,
Spark and Impala over Avro and Parquet. This test case is tailor-made to
exercise specific file layouts so that files written using parquet-mr can
be read by any of the above-mentioned components. We have also covered
fault-injection cases.
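
The contract these tests validate (described in the earlier mail below:
every value falls within its page's min/max, and the declared boundary
order holds across pages) can be sketched in plain Java. This is only an
illustrative model, not the parquet-mr API; the Page record and method
names are made up for the example:

```java
import java.util.List;

public class ColumnIndexContractSketch {
    // Illustrative stand-in for one page's column-index entry:
    // the recorded min/max statistics plus the page's actual values.
    record Page(long min, long max, List<Long> values) {}

    // Contract 1: every value must fall within the page's [min, max] bounds.
    static boolean valuesWithinBounds(Page p) {
        return p.values().stream().allMatch(v -> v >= p.min() && v <= p.max());
    }

    // Contract 2: if an ascending boundary order is declared, the page
    // minima must be non-decreasing from one page to the next.
    static boolean ascendingBoundaryOrder(List<Page> pages) {
        for (int i = 1; i < pages.size(); i++) {
            if (pages.get(i).min() < pages.get(i - 1).min()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page(1, 5, List.of(1L, 3L, 5L)),
            new Page(6, 9, List.of(6L, 9L)));
        System.out.println(pages.stream()
            .allMatch(ColumnIndexContractSketch::valuesWithinBounds));
        System.out.println(ascendingBoundaryOrder(pages));
    }
}
```

A file violating either check (a value outside its page's recorded bounds,
or minima that regress despite an ascending boundary order) would fail the
integration test.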

The test suite is currently private; however, we have made the test
classes corresponding to the following document public:
https://docs.google.com/document/d/1mHYQGXE4oM1zgg83MMc4ho1gmoJMeZcq9MWG99WgL3A

Please find the test cases and their results here:
https://github.com/zivanfi/column-indexes-data-interop-tests-excerpts

Best,
Anna



On Mon, Feb 11, 2019 at 4:57 PM Anna Szonyi <szo...@cloudera.com> wrote:

> Hi dev@,
>
> Last week we had a twofer: an e2e tool and an integration test validating
> the contract of column indexes (whether all values fall between min and
> max and, if a boundary order is set, whether it is correct). There are
> some takeaways and corrections to be made to the former (like the
> max->min typo) - thanks for the feedback on that!
>
> The next installment is also an integration test, which exercises the
> filtering logic on files covering both simple and special cases
> (user-defined functions, complex filtering, no filtering, etc.):
>
>
> https://github.com/apache/parquet-mr/blob/e7db9e20f52c925a207ea62d6dda6dc4e870294e/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestColumnIndexFiltering.java
>
> Please let me know if you have any questions/comments.
>
> Best,
> Anna
>
