[
https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843645#comment-15843645
]
ASF GitHub Bot commented on METRON-678:
---------------------------------------
GitHub user cestella opened a pull request:
https://github.com/apache/incubator-metron/pull/428
METRON-678: Multithread the flat file loader
Currently the flat file loader writes to HBase from a single thread.
We could make it considerably faster by issuing the HBase puts from multiple threads.
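One common way to multithread the puts is to split the input into fixed-size batches and hand each batch to a worker from a fixed-size thread pool. The Java sketch below illustrates only that general pattern. It is not the code in this pull request. `ParallelBatchLoader` and `BatchSink` are hypothetical names, and `BatchSink` stands in for an HBase table so that the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelBatchLoader {

  /** Hypothetical stand-in for an HBase table; a real loader would send HBase Puts here. */
  public interface BatchSink {
    void write(List<String> batch) throws Exception;
  }

  /**
   * Splits rows into batches of batchSize and writes each batch on a pool of
   * numThreads workers. Returns the number of batches submitted.
   */
  public static int load(List<String> rows, int batchSize, int numThreads, BatchSink sink)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (int start = 0; start < rows.size(); start += batchSize) {
        // Copy the slice so each task owns an independent batch.
        List<String> batch =
            new ArrayList<>(rows.subList(start, Math.min(start + batchSize, rows.size())));
        pending.add(pool.submit(() -> { sink.write(batch); return null; }));
      }
      // Block until every batch is written; get() rethrows any write failure.
      for (Future<?> f : pending) {
        f.get();
      }
      return pending.size();
    } finally {
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      rows.add("host" + i + ",value" + i);  // two-column CSV-style rows
    }
    // The sink is called from several threads, so it must be thread-safe.
    List<String> written = Collections.synchronizedList(new ArrayList<>());
    int batches = load(rows, 128, 4, written::addAll);
    System.out.println(batches + " batches, " + written.size() + " rows");
  }
}
```

With 1000 rows and a batch size of 128, `main` submits 8 batches to 4 workers and prints `8 batches, 1000 rows`. Waiting on every `Future` surfaces the first failed write instead of silently dropping it.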
Running the change on a single-node Vagrant box for a 2-column CSV enrichment, with a batch size of 128
and between 1 and 6 threads, gave a reasonable speedup (wall-clock time by number of threads):
1 thread:  91.019 seconds
2 threads: 76.07 seconds
3 threads: 39.974 seconds
4 threads: 35.039 seconds
5 threads: 30.531 seconds
6 threads: 30.559 seconds
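The timings above can be turned into speedup figures by dividing the single-thread time by each N-thread time. The short Java sketch below does that arithmetic on the reported numbers. `SpeedupTable` and `speedups` are illustrative names only.

```java
public class SpeedupTable {

  /** Speedup of each run relative to the first (single-threaded) run: t1 / tN. */
  static double[] speedups(double[] seconds) {
    double[] out = new double[seconds.length];
    for (int i = 0; i < seconds.length; i++) {
      out[i] = seconds[0] / seconds[i];
    }
    return out;
  }

  public static void main(String[] args) {
    // Wall-clock seconds reported in the PR, for 1 through 6 threads.
    double[] seconds = {91.019, 76.07, 39.974, 35.039, 30.531, 30.559};
    double[] s = speedups(seconds);
    for (int i = 0; i < s.length; i++) {
      int threads = i + 1;
      // Efficiency is the fraction of ideal linear speedup achieved.
      System.out.printf("%d thread(s): %.3f s, speedup %.2fx, efficiency %.0f%%%n",
          threads, seconds[i], s[i], 100.0 * s[i] / threads);
    }
  }
}
```

On these numbers the speedup levels off at roughly 3x from 5 threads on, since the 6-thread run is no faster than the 5-thread run.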

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron parallel_extractor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/428.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #428
----
commit 47d814ef95d67738d20ce5dc530ba7b05d418a96
Author: cstella
Date: 2017-01-27T23:15:44Z
Multithreading the SimpleEnrichmentFlatFileLoader
commit 918d4ce4aea5d7dfde992f32bf049c70f35dd182
Author: cstella
Date: 2017-01-27T23:23:19Z
doc changes.
----
> Multithread the flat file loader
> --------------------------------
>
> Key: METRON-678
> URL: https://issues.apache.org/jira/browse/METRON-678
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
>
> Currently the flat file loader is single threaded in its writing to HBase.
> We could make this a lot faster by multithreading the HBase puts.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)