GitHub user cramja opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/109

    Refactored bulk insertion to the SplitRow store

    The inner loop of the insert algorithm has been changed to reduce function 
calls to only those that are absolutely necessary. Also, we merge copies which 
come from other rowstore sources, which speeds up insertion.
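
As a rough illustration of the copy-merging idea (a sketch under stated assumptions, not the code in this PR): when both source and destination store tuples row-wise, attribute copies that are adjacent on both sides can be collapsed into a single memcpy. The `CopyRun` struct and both helper functions below are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical descriptor for copying one attribute within a tuple.
struct CopyRun {
  std::size_t src_offset;  // byte offset inside the source tuple
  std::size_t dst_offset;  // byte offset inside the destination tuple
  std::size_t bytes;       // number of bytes to copy
};

// Collapse neighbouring runs that are contiguous in BOTH the source and
// the destination, so that each merged run needs only one memcpy.
inline std::vector<CopyRun> MergeContiguousRuns(
    const std::vector<CopyRun> &runs) {
  std::vector<CopyRun> merged;
  for (const CopyRun &run : runs) {
    if (!merged.empty()) {
      CopyRun &last = merged.back();
      if (last.src_offset + last.bytes == run.src_offset &&
          last.dst_offset + last.bytes == run.dst_offset) {
        last.bytes += run.bytes;  // extend the previous run
        continue;
      }
    }
    merged.push_back(run);
  }
  return merged;
}

// Perform the (possibly merged) copies for a single tuple.
inline void ApplyRuns(const std::vector<CopyRun> &runs,
                      const char *src, char *dst) {
  assert(src != nullptr && dst != nullptr);
  for (const CopyRun &run : runs) {
    std::memcpy(dst + run.dst_offset, src + run.src_offset, run.bytes);
  }
}
```

The merge depends only on the two layouts, so in this sketch it would be computed once per insert call and reused for every tuple.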
    
    Also adds support for 'partial inserts'. A partial insert writes only a 
subset of a tuple's columns at a time. Partial inserts will be used in a later 
commit.
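
To make the notion of a partial insert concrete, here is a minimal sketch under simplifying assumptions: every column is a 4-byte integer and rows are stored back to back. `ToyRowStore`, `PartialInsert`, and `At` are hypothetical names for illustration only and are not the Quickstep API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy row store: fixed number of rows, every column is an int32.
class ToyRowStore {
 public:
  ToyRowStore(std::size_t num_columns, std::size_t num_rows)
      : num_columns_(num_columns), data_(num_columns * num_rows, 0) {}

  // Write values for only the columns listed in `column_ids`, starting at
  // row `first_row`. `values` is row-major over those columns, so it holds
  // column_ids.size() entries per inserted row.
  void PartialInsert(std::size_t first_row,
                     const std::vector<std::size_t> &column_ids,
                     const std::vector<std::int32_t> &values) {
    assert(!column_ids.empty());
    assert(values.size() % column_ids.size() == 0);
    const std::size_t num_rows = values.size() / column_ids.size();
    assert((first_row + num_rows) * num_columns_ <= data_.size());
    for (std::size_t r = 0; r < num_rows; ++r) {
      for (std::size_t c = 0; c < column_ids.size(); ++c) {
        assert(column_ids[c] < num_columns_);
        data_[(first_row + r) * num_columns_ + column_ids[c]] =
            values[r * column_ids.size() + c];
      }
    }
  }

  std::int32_t At(std::size_t row, std::size_t column) const {
    return data_[row * num_columns_ + column];
  }

 private:
  std::size_t num_columns_;
  std::vector<std::int32_t> data_;
};
```

In this sketch, a caller might fill columns 0 and 2 of a set of rows in one call and column 1 of the same rows in a second call. After both calls the rows are complete.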
    
    *Testing*
    Unit tests have been updated. The old bulkInsert tests needed to be 
modified because a block is no longer always filled completely; it is filled 
only up to a threshold value. This reduces the runtime of the costly inner 
loop at the cost of a few tuples.
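
One way such a threshold could work is sketched below. This is an assumption made for illustration; the description above does not spell out the mechanism. The sketch sizes each batch from a worst-case tuple size, so the space check happens once per batch rather than once per tuple, and a block is treated as full once it cannot hold one more worst-case tuple. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of tuples that can be copied without any per-tuple space check,
// assuming no tuple is larger than `max_tuple_bytes`.
inline std::size_t TuplesForBatch(std::size_t free_bytes,
                                  std::size_t max_tuple_bytes,
                                  std::size_t tuples_available) {
  assert(max_tuple_bytes > 0);
  return std::min(tuples_available, free_bytes / max_tuple_bytes);
}

// The block counts as full once it cannot hold one more worst-case tuple,
// even though a smaller tuple might still have fit.
inline bool TreatAsFull(std::size_t free_bytes, std::size_t max_tuple_bytes) {
  return free_bytes < max_tuple_bytes;
}
```

The space left below the threshold goes unused, which would account for the few tuples given up per block.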
    
    *Performance*
    I had a [similar PR, PR-100, 
open](https://github.com/apache/incubator-quickstep/pull/100) last week. I ran 
TPC-H SF100 queries 1-17 with this branch and with the branch from PR-100. They 
performed within 1% of each other, so it is safe to say that this branch is as 
fast as the last branch (which was 2x as fast as the base).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cramja/incubator-quickstep splitrow_insert_refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #109
    
----
commit 4ce5acf046e0d5fce320efcae7aea648549e98e9
Author: cramja <marc.spehlm...@gmail.com>
Date:   2016-10-05T21:40:30Z

    Refactored bulk insertion to the SplitRow store
    
    The inner loop of the insert algorithm has been changed to reduce
    function calls to only those that are absolutely necessary. Also, we
    merge copies which come from other rowstore sources, speeding up
    insertion time.
    
    Also adds support for the idea of 'partial inserts'. Partial
    inserts are when you are only inserting a subset of the columns at a
    time. Partial inserts will be used in a later commit.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
