[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476550#comment-13476550
 ] 

Carl Steinbach commented on HIVE-2935:
--------------------------------------

@Thejas: One of the problems we encountered with running qfile tests in 
parallel is that many of these tests create temporary tables and indexes, and 
several also modify the underlying default source tables. Both of these issues 
affect the output of the tests, and when you add in concurrency you end up with 
non-deterministic output that is impossible to validate with the diff-based 
verification scheme we have in place today. QFileClient works around this 
problem by running each qfile test in its own DB/Schema. This solves the 
problem for the overwhelming majority of qfile tests that exist in Hive today. 
However, there are several notable drawbacks:

# We can't support tests that create new DB/Schemas, use the 'USE' command to 
switch databases, create catalog objects in separate DB/Schemas using fully 
qualified names, invoke the 'SHOW DATABASES' command, etc. Tests which fall 
into this category include ctas_uses_database_location.q, add_part_exist.q, 
etc. I added most of these tests to the test.beeline.positive.exclude list in 
build.properties, but clearly I missed some, and it also looks like there are 
some tests in that list that need to be removed. I'll update this soon.
# Index tests such as alter_index.q are also affected since the full name of an 
index catalog object is ${table_schema}__${table_name}_${index_name}__. We 
should be able to work around this specific problem by defining a substitution 
property for each test corresponding to the db name. I will file a separate 
subtask for this issue.
# This partitioning scheme allows us to test concurrent DDL/DML commands in 
separate namespaces, but doesn't provide any coverage for running concurrent 
DDL/DML in the same namespace. I don't think it's feasible to do this with the 
current set of qfiles, and propose instead that we create a separate test that 
concurrently runs a carefully selected subset of these qfiles in the same 
namespace.
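
As a rough illustration of the per-test substitution property mentioned in the second item, here is a minimal Python sketch. It is not the QFileClient implementation: the function names, the "test_db_" prefix, the "${test_db}" placeholder, and the sample table and index names are all hypothetical. Only the index name format is taken from the text above. The idea is that the expected output stores a placeholder in place of the db name, and the placeholder is expanded with the test's own db name before the diff is taken.

```python
import re


def db_name_for_test(qfile_name: str) -> str:
    """Derive a per-test database name from a qfile name.

    The "test_db_" prefix is a hypothetical naming scheme, not Hive's.
    """
    stem = qfile_name[:-2] if qfile_name.endswith(".q") else qfile_name
    # Replace anything that is not a letter, digit, or underscore.
    return "test_db_" + re.sub(r"[^A-Za-z0-9_]", "_", stem)


def index_object_name(schema: str, table: str, index: str) -> str:
    """Build the full index object name in the format quoted above:
    ${table_schema}__${table_name}_${index_name}__
    """
    return f"{schema}__{table}_{index}__"


def expand_expected(expected: str, db_name: str,
                    placeholder: str = "${test_db}") -> str:
    """Expand the (hypothetical) placeholder in expected output with the
    db name assigned to this test, so a plain diff can be used."""
    return expected.replace(placeholder, db_name)


if __name__ == "__main__":
    db = db_name_for_test("alter_index.q")
    actual = index_object_name(db, "src", "src_index_8")  # illustrative names
    expected = expand_expected("${test_db}__src_src_index_8__", db)
    print(actual == expected)
```

Because the expected file never contains a concrete db name, the same file would work no matter which namespace a test is assigned to at run time.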

Most of the tests that you listed above fall into one of these categories. 
Please let me know if you find any that don't and I'll look at them more 
closely.

A separate but related matter is that if we commit this patch with 
TestBeeLineDriver enabled, the overall time to run all tests will roughly 
double from ~4hrs to ~8hrs. My preference is to deprecate TestCliDriver in 
favor of TestBeeLineDriver, but I can't make that decision on my own.

                
> Implement HiveServer2
> ---------------------
>
>                 Key: HIVE-2935
>                 URL: https://issues.apache.org/jira/browse/HIVE-2935
>             Project: Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>              Labels: HiveServer2
>         Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>

