[ 
https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843645#comment-15843645
 ] 

ASF GitHub Bot commented on METRON-678:
---------------------------------------

GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/428

    METRON-678: Multithread the flat file loader

    Currently the flat file loader is single-threaded in its writes to HBase. We could make this a lot faster by multithreading the HBase puts.
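As a rough illustration of the idea (not the code in this pull request), batching rows and handing each batch to a pool of worker threads might look like the sketch below. It is written in Python for brevity. `InMemoryTable`, `batches`, and `load` are hypothetical names invented for this example, and the table is an in-memory stand-in rather than a real HBase client.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from threading import Lock


def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


class InMemoryTable:
    """Hypothetical stand-in for an HBase table; it only records what it is given."""

    def __init__(self):
        self._lock = Lock()
        self.rows = []
        self.put_calls = 0

    def put(self, batch):
        # A real client would issue one batched put here instead.
        with self._lock:
            self.rows.extend(batch)
            self.put_calls += 1


def load(rows, table, batch_size=128, num_threads=4):
    """Write rows to table in batches, using num_threads worker threads."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every batch and re-raises any worker exception.
        list(pool.map(table.put, batches(rows, batch_size)))
    return table


# Example input: a 2 column CSV enrichment, already parsed into (key, value) pairs.
sample_rows = [(f"host{i}.example.com", f"value{i}") for i in range(1000)]
sample_table = load(sample_rows, InMemoryTable(), batch_size=128, num_threads=4)
```

Because the batches are written concurrently, the order in which rows reach the table is not guaranteed. That is acceptable for independent key/value puts, but it is an assumption of this sketch.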
    
    Running this on a single-node Vagrant environment with a batch size of 128 and the number of threads varying from 1 to 6, for a 2 column CSV enrichment, achieved a reasonable speedup (each list number below is the thread count):
    
    1. 91.019 seconds
    2. 76.07 seconds
    3. 39.974 seconds
    4. 35.039 seconds
    5. 30.531 seconds
    6. 30.559 seconds
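To make the speedup explicit, the timings listed above can be turned into speedup and per-thread efficiency figures relative to the single-threaded run. This is a small sketch using only the reported numbers; the variable names are illustrative.

```python
# Wall-clock seconds reported above, keyed by thread count.
timings = {1: 91.019, 2: 76.07, 3: 39.974, 4: 35.039, 5: 30.531, 6: 30.559}

baseline = timings[1]
# Speedup is the single-threaded time divided by the multithreaded time.
speedup = {n: baseline / t for n, t in timings.items()}
# Efficiency is the speedup divided by the number of threads used.
efficiency = {n: speedup[n] / n for n in timings}

for n in sorted(timings):
    print(f"{n} threads: {timings[n]:7.3f} s  "
          f"speedup {speedup[n]:.2f}x  efficiency {efficiency[n]:.0%}")
```

On these numbers, five threads give roughly a 3x speedup over one thread, and a sixth thread brings no further gain.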
    
    
![chart](https://cloud.githubusercontent.com/assets/540359/22391274/507f08a6-e4be-11e6-9f9a-19604791cf37.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron parallel_extractor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #428
    
----
commit 47d814ef95d67738d20ce5dc530ba7b05d418a96
Author: cstella
Date:   2017-01-27T23:15:44Z

    Multithreading the SimpleEnrichmentFlatFileLoader

commit 918d4ce4aea5d7dfde992f32bf049c70f35dd182
Author: cstella
Date:   2017-01-27T23:23:19Z

    doc changes.

----


> Multithread the flat file loader
> --------------------------------
>
>                 Key: METRON-678
>                 URL: https://issues.apache.org/jira/browse/METRON-678
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>
> Currently the flat file loader is single-threaded in its writes to HBase.
> We could make this a lot faster by multithreading the HBase puts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)