Alan Gates created HIVE-12316:
---------------------------------

             Summary: Improved integration test for Hive
                 Key: HIVE-12316
                 URL: https://issues.apache.org/jira/browse/HIVE-12316
             Project: Hive
          Issue Type: New Feature
          Components: Testing Infrastructure
    Affects Versions: 2.0.0
            Reporter: Alan Gates
            Assignee: Alan Gates


In working with Hive testing I have found several issues that cause problems 
for developers, testers, and users:
* Because Hive has many tunable knobs (file format, security, etc.) we end up 
with tests that cover the same functionality with different permutations of 
these features.
* The Hive integration tests (i.e., qfiles) cannot be run on a cluster.  This 
means we cannot run any of those tests at scale.  The HBase community, by 
contrast, uses the same test suite locally and on a cluster, and has found that 
this helps them greatly in testing.
* Golden files are a grievous evil.  Test writers are forced to eyeball results 
the first time they run a test and decide whether they look reasonable, which 
is error-prone and makes testing at scale impossible.  In addition, changes to 
one part of Hive often end up changing the plan (and the output of explain), 
thus breaking many unrelated tests.  This is particularly an issue for people 
working on the optimizer.
* The lack of ability to run on a cluster means that when people test Hive at 
scale, they are forced to develop custom frameworks which can't then benefit 
the community.
* There is no easy mechanism to bring user queries into the test suite.

I propose we build a new testing capability with the following requirements:
* One test should be able to run all reasonable permutations (mr/tez/spark, 
orc/parquet/text/rcfile, secure/non-secure, etc.).  This doesn't mean it would 
run every permutation every time, but that the tester could choose which 
permutations to run.
* The same tests should run locally and on a cluster.  The tests should support 
scaling of input data from kilobytes to terabytes.
* Expected results should be auto-generated whenever possible, and this should 
work with the scaling of inputs.  The dev should be able to provide expected 
results or custom expected result generation in cases where auto-generation 
doesn't make sense.
* Access to the query plan should be available as an API in the tests so that 
golden files of explain output are not required.
* This should run in Maven, JUnit, and Java so that developers do not need to 
manage yet another framework.
* It should be possible to simulate user data (based on schema and statistics) 
and to bring in user queries quickly, so that tests based on user scenarios 
are easy to add.
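
As a rough illustration of the first requirement above, the Java sketch below 
shows one way a tester could choose which permutations a single test runs 
against.  Every name in it (TestPermutations, Engine, FileFormat, Security, 
Permutation, select) is hypothetical and invented for this example; none of 
them is an existing Hive API or part of the proposal itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: these types are not part of Hive.
public class TestPermutations {
    // The tunable knobs named in the requirement.
    enum Engine { MR, TEZ, SPARK }
    enum FileFormat { ORC, PARQUET, TEXT, RCFILE }
    enum Security { SECURE, NON_SECURE }

    /** One concrete combination of knobs that a single test run would use. */
    static final class Permutation {
        final Engine engine;
        final FileFormat format;
        final Security security;

        Permutation(Engine engine, FileFormat format, Security security) {
            this.engine = engine;
            this.format = format;
            this.security = security;
        }

        @Override
        public String toString() {
            return engine + "/" + format + "/" + security;
        }
    }

    /** Returns the cross product of the values the tester selected. */
    static List<Permutation> select(List<Engine> engines,
                                    List<FileFormat> formats,
                                    List<Security> securities) {
        List<Permutation> out = new ArrayList<>();
        for (Engine e : engines) {
            for (FileFormat f : formats) {
                for (Security s : securities) {
                    out.add(new Permutation(e, f, s));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The tester narrows the run to Tez, two file formats, non-secure.
        List<Permutation> chosen = select(
            Arrays.asList(Engine.TEZ),
            Arrays.asList(FileFormat.ORC, FileFormat.PARQUET),
            Arrays.asList(Security.NON_SECURE));
        for (Permutation p : chosen) {
            System.out.println(p);
        }
    }
}
```

Passing every value of each enum to select would enumerate all 24 combinations, 
while a narrower selection such as the one in main keeps a local run short.  
The same test body could then be executed once per returned permutation.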





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
