GitHub user huor opened a pull request:

    https://github.com/apache/incubator-hawq/pull/321

    HAWQ-372. Fix single row insert and COPY hang in high concurrent workloads

    Root cause analysis shows that the hang under concurrent workloads has three 
causes:
    
    First, most queries lock relations and then allocate query resource, while 
analyze (whether triggered manually or automatically) allocates resource first 
and then locks the relation. With concurrent queries, this opposite ordering can 
lead to a deadlock between the relation lock and the query resource.
    
    Second, some queries allocate query resource multiple times, e.g., SRI/COPY, 
which trigger automatic statistics collection. They do not always return query 
resource as soon as a sub-task of the query is done.
    For example, SRI allocates resource for the insert itself, does the insertion, 
returns the resource for the insertion, allocates resource for the automatically 
triggered analyze, does the analyze, and returns the resource for the analyze. 
COPY, in contrast, allocates resource for COPY itself, does the COPY, allocates 
resource for the automatically triggered analyze, does the analyze, returns the 
resource for the analyze, and only then returns the resource for COPY. As a 
result, many COPY statements keep holding their own query resource while trying 
to allocate more for analyze. Some of them are left pending on the allocation 
for analyze, which looks like a "deadlock" on query resource.
    
    Third, SRI/COPY allocate query resource multiple times, while TPC-H queries 
allocate resource only once and usually take longer to complete. SRI/COPY may 
therefore be halfway through when they request a second allocation for one of 
their sub-tasks. If all resource is busy at that moment, SRI/COPY must wait for 
TPC-H queries to return resource before proceeding. As a consequence, SRI/COPY 
run very slowly or even appear to hang from the user's standpoint.
    
    To address the issue, we make the following fixes:
    
    For 1, make sure all queries (especially insert queries and create/alter/drop 
database object queries) follow the pattern of locking the relation first and 
then allocating query resource.
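
    The ordering hazard behind this fix can be sketched as a small check. This 
is an illustration only, not HAWQ code: the labels `relation_lock` and 
`query_resource` and the function `has_lock_order_conflict` are hypothetical 
names, and the pairwise check is enough only because two resources are involved.

```python
def has_lock_order_conflict(acquisition_orders):
    """Return True if two workloads take the same two resources in opposite order."""
    before = set()
    for order in acquisition_orders.values():
        # Record every (earlier, later) pair in this workload's acquisition order.
        for i, first in enumerate(order):
            for second in order[i + 1:]:
                before.add((first, second))
    # A pair recorded in both directions means a deadlock is possible.
    return any((later, earlier) in before for (earlier, later) in before)


# Hypothetical model of the behavior before the fix: analyze allocates first.
before_fix = {
    "insert": ["relation_lock", "query_resource"],
    "analyze": ["query_resource", "relation_lock"],
}

# Hypothetical model after the fix: every statement locks the relation first.
after_fix = {
    "insert": ["relation_lock", "query_resource"],
    "analyze": ["relation_lock", "query_resource"],
}

print(has_lock_order_conflict(before_fix))
print(has_lock_order_conflict(after_fix))
```

    Once every statement uses the same order, no pair of statements can each 
hold what the other is waiting for.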
    
    For 2, make sure queries follow the pattern: allocate query resource for 
sub-task1, return query resource for sub-task1, ..., allocate query resource for 
sub-taskN, return query resource for sub-taskN.
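
    A minimal sketch of why this pattern helps, assuming each allocation takes 
one unit of query resource (a simplification; the traces and the function 
`peak_units_held` are hypothetical and not part of HAWQ):

```python
def peak_units_held(events):
    """Return the largest number of resource units held at once in a trace."""
    held = 0
    peak = 0
    for action, _task in events:
        if action == "allocate":
            held += 1
            peak = max(peak, held)
        elif action == "return":
            held -= 1
        else:
            raise ValueError(f"unknown action: {action}")
    return peak


# COPY as described for the old behavior: analyze is allocated while the
# resource for COPY itself is still held.
copy_nested = [
    ("allocate", "copy"),
    ("allocate", "analyze"),
    ("return", "analyze"),
    ("return", "copy"),
]

# The pattern required by the fix: return each sub-task's resource before
# allocating for the next sub-task.
copy_sequential = [
    ("allocate", "copy"),
    ("return", "copy"),
    ("allocate", "analyze"),
    ("return", "analyze"),
]

print(peak_units_held(copy_nested))
print(peak_units_held(copy_sequential))
```

    With the nested trace a statement waits for a second unit while holding the 
first, so enough concurrent statements can exhaust the pool and all wait. With 
the sequential trace a waiting statement holds nothing.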
    
    For 3, as a usage practice, separate different workloads into different 
resource queues, e.g., SRI/COPY in a load queue and TPC-H in a query queue.
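
    The effect of separate queues can be illustrated with a toy model. The 
`ResourceQueue` class below is a hypothetical counter with a fixed capacity, not 
HAWQ's resource manager, and the capacities are made-up numbers.

```python
class ResourceQueue:
    """Toy queue that hands out a fixed number of resource units."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.in_use = 0

    def try_allocate(self):
        """Take one unit if available; return whether the request was admitted."""
        if self.in_use < self.capacity:
            self.in_use += 1
            return True
        return False

    def release(self):
        """Give one unit back to the queue."""
        if self.in_use == 0:
            raise RuntimeError("nothing to release")
        self.in_use -= 1


# One shared queue: four long-running TPC-H queries occupy every unit,
# so a COPY arriving afterwards has to wait.
shared = ResourceQueue("shared", capacity=4)
for _ in range(4):
    shared.try_allocate()
copy_admitted_shared = shared.try_allocate()

# Separate queues: TPC-H can saturate only its own queue, and the load
# queue still has a unit for COPY.
query_queue = ResourceQueue("query", capacity=3)
load_queue = ResourceQueue("load", capacity=1)
tpch_admitted = [query_queue.try_allocate() for _ in range(4)]
copy_admitted_separate = load_queue.try_allocate()

print(copy_admitted_shared, tpch_admitted, copy_admitted_separate)
```

    The trade-off in this model is that the fourth TPC-H query waits even though 
the load queue may be idle, which is the price of isolating the two workloads.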

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huor/incubator-hawq sri_copy_analyze

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #321
    
----
commit f08b5e6c64387bc6067f1a45ceec1c50b3b1ce8d
Author: Ruilong Huo <[email protected]>
Date:   2016-02-02T10:44:49Z

    HAWQ-372. Fix single row insert and COPY hang in high concurrent workloads

----


