[ 
https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15844151#comment-15844151
 ] 

ASF GitHub Bot commented on METRON-678:
---------------------------------------

GitHub user cestella reopened a pull request:

    https://github.com/apache/incubator-metron/pull/428

    METRON-678: Multithread the flat file loader

    Currently the flat file loader writes to HBase on a single thread. 
Multithreading the HBase puts could make it considerably faster.
    
    I ran this on a single-node Vagrant environment with the following 
configuration for a 100k 2-column CSV enrichment import:
    * a batch size of 128
    * number of threads varying between 1 and 6
    
    This gave roughly a 3x speedup, from about 91 seconds with 1 thread to about 30.5 seconds with 5 threads, with no further gain at 6:
    
    | Number of Threads | Time (in seconds) |
    |-------------------|-------------------|
    | 1                 | 91.019            |
    | 2                 | 76.07             |
    | 3                 | 39.974            |
    | 4                 | 35.039            |
    | 5                 | 30.531            |
    | 6                 | 30.559            |
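
For readers unfamiliar with the approach, the batching and thread-pool idea described above could be sketched roughly as follows. This is an illustration only, not the code in this pull request: `ParallelBatchWriter`, `partition`, `writeAll`, and the `sink` callback are hypothetical names, and the sink stands in for whatever actually performs the HBase puts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: split records into fixed-size batches and hand each
// batch to a worker thread. The sink is a placeholder for the real writer.
public class ParallelBatchWriter {

    /** Splits records into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            int end = Math.min(i + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(i, end)));
        }
        return batches;
    }

    /** Writes every batch through the sink on a fixed-size pool; returns the batch count. */
    static <T> int writeAll(List<T> records, int batchSize, int numThreads,
                            Consumer<List<T>> sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(records, batchSize)) {
                pending.add(pool.submit(() -> sink.accept(batch)));
            }
            // Wait for all batches; get() rethrows any failure from a worker.
            for (Future<?> f : pending) {
                f.get();
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("key" + i + ",value" + i);
        }
        AtomicInteger written = new AtomicInteger();
        // The sink here only counts rows; a real one would issue the puts.
        int batches = writeAll(rows, 128, 4, batch -> written.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + written.get() + " rows");
    }
}
```

The sink must be safe to call from several threads at once, which is why the example counts with an `AtomicInteger` rather than a plain `int`.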
    
    
![chart](https://cloud.githubusercontent.com/assets/540359/22392190/af852618-e4c4-11e6-9c03-a68b66e330ad.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron parallel_extractor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #428
    
----
commit 47d814ef95d67738d20ce5dc530ba7b05d418a96
Author: cstella <[email protected]>
Date:   2017-01-27T23:15:44Z

    Multithreading the SimpleEnrichmentFlatFileLoader

commit 918d4ce4aea5d7dfde992f32bf049c70f35dd182
Author: cstella <[email protected]>
Date:   2017-01-27T23:23:19Z

    doc changes.

commit c6ca3a86881eb77bc9598a61e3c0cf8280ccb03f
Author: cstella <[email protected]>
Date:   2017-01-27T23:39:56Z

    Updating docs.

commit 8c9a79cdfa38ea2fbd161095d5e346147558ec5f
Author: cstella <[email protected]>
Date:   2017-01-28T03:36:31Z

    Investigating integration tests.

commit 315bd181aa634290ab987441d81c28addb7952e2
Author: cstella <[email protected]>
Date:   2017-01-28T04:09:28Z

    Update integration test to be a proper integration test.

commit 004c6f41b6c1cc3ecea70513e1a468501bd32e3c
Author: cstella <[email protected]>
Date:   2017-01-28T04:49:37Z

    Adding spliterator unit test for completeness

commit f8dd48ef920c948e1fc5ff736e386f641e551b2b
Author: cstella <[email protected]>
Date:   2017-01-28T05:01:42Z

    Updating test to use a proper file

commit 9b04f9723d442c8f4fb7a8bcaa1d733fc1305dc4
Author: cstella <[email protected]>
Date:   2017-01-28T05:17:12Z

    Updating docs and renaming a few things.

commit eb5b82cc35bd767a169f548ea8144dd9ae165f84
Author: cstella <[email protected]>
Date:   2017-01-28T05:23:25Z

    Update one more test case.

----


> Multithread the flat file loader
> --------------------------------
>
>                 Key: METRON-678
>                 URL: https://issues.apache.org/jira/browse/METRON-678
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>
> Currently the flat file loader writes to HBase on a single thread.  
> Multithreading the HBase puts could make it considerably faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
