[
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jesus Camacho Rodriguez updated HIVE-12316:
-------------------------------------------
Target Version/s: 2.2.0 (was: 2.1.0)
> Improved integration test for Hive
> ----------------------------------
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
> Issue Type: New Feature
> Components: Testing Infrastructure
> Affects Versions: 2.0.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up
> with tests that cover the same functionality with different permutations of
> these features.
> * The Hive integration tests (i.e., qfiles) cannot be run on a cluster. This
> means we cannot run any of those tests at scale. The HBase community by
> contrast uses the same test suite locally and on a cluster, and has found
> that this helps them greatly in testing.
> * Golden files are a grievous evil. Test writers are forced to eyeball
> results the first time they run a test and decide whether they look
> reasonable, which is error-prone and makes testing at scale impossible. And
> changes to one part of Hive often end up changing the plan (and the output of
> explain), thus breaking many tests that are not related. This is particularly
> an issue for people working on the optimizer.
> * The lack of ability to run on a cluster means that when people test Hive at
> scale, they are forced to develop custom frameworks which can't then benefit
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
>
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark,
> orc/parquet/text/rcfile, secure/non-secure, etc.). This doesn't mean it would
> run every permutation every time, but that the tester could choose which
> permutation to run.
> * The same tests should run locally and on a cluster. The tests should
> support scaling of input data from kilobytes to terabytes.
> * Expected results should be auto-generated whenever possible, and this
> should work with the scaling of inputs. The dev should be able to provide
> expected results or custom expected result generation in cases where
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that
> golden files of explain output are not required.
> * This should run in Maven, JUnit, and Java so that developers do not need to
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and
> statistics) and incorporate user queries, so that tests drawn from user
> scenarios can be added quickly.
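
The first requirement above (one test, many permutations, with the tester choosing which ones to run) could be sketched roughly as follows. This is only an illustration under assumptions: the class, enum, and method names (`PermutationSketch`, `Engine`, `FileFormat`, `Security`, `all`, `select`) are hypothetical and are not taken from the attached patches or from any existing Hive API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not taken from the HIVE-12316 patches.
public class PermutationSketch {
    enum Engine { MR, TEZ, SPARK }
    enum FileFormat { ORC, PARQUET, TEXT, RCFILE }
    enum Security { SECURE, NON_SECURE }

    // One concrete combination of knobs that a single test could run under.
    static final class Permutation {
        final Engine engine;
        final FileFormat format;
        final Security security;

        Permutation(Engine engine, FileFormat format, Security security) {
            this.engine = engine;
            this.format = format;
            this.security = security;
        }

        @Override
        public String toString() {
            return engine + "/" + format + "/" + security;
        }
    }

    // Enumerate every combination of the three knobs.
    static List<Permutation> all() {
        List<Permutation> result = new ArrayList<>();
        for (Engine e : Engine.values()) {
            for (FileFormat f : FileFormat.values()) {
                for (Security s : Security.values()) {
                    result.add(new Permutation(e, f, s));
                }
            }
        }
        return result;
    }

    // Keep only the permutations the tester chose for this run.
    static List<Permutation> select(Predicate<Permutation> chosen) {
        List<Permutation> result = new ArrayList<>();
        for (Permutation p : all()) {
            if (chosen.test(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example choice: run only on Tez with ORC, in both security modes.
        List<Permutation> run = select(
            p -> p.engine == Engine.TEZ && p.format == FileFormat.ORC);
        for (Permutation p : run) {
            System.out.println(p);
        }
    }
}
```

In a real harness the selected permutations would presumably feed a parameterized JUnit test rather than a `main` method; that wiring is left out here to keep the sketch self-contained.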