[
https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15844151#comment-15844151
]
ASF GitHub Bot commented on METRON-678:
---------------------------------------
GitHub user cestella reopened a pull request:
https://github.com/apache/incubator-metron/pull/428
METRON-678: Multithread the flat file loader
Currently the flat file loader is single-threaded when writing to HBase.
We could make this a lot faster by multithreading the HBase puts.
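As a rough illustration of the idea (not the code in this pull request), the sketch below splits the input into fixed-size batches and hands each batch to a worker from a fixed thread pool. The class name `ParallelLoaderSketch`, the `partition` and `load` helpers, and the `batchWriter` callback are all hypothetical; `batchWriter` stands in for a batched HBase put and is assumed to be thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: names and structure are assumptions, not the loader's real API.
class ParallelLoaderSketch {

    /** Splits rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /**
     * Writes every batch through batchWriter using a fixed pool of numThreads workers.
     * batchWriter is a stand-in for a batched HBase put and must be thread-safe.
     */
    static <T> void load(List<T> rows, int batchSize, int numThreads,
                         Consumer<List<T>> batchWriter) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                pending.add(pool.submit(() -> batchWriter.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the batch; surfaces any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new RuntimeException("interrupted while writing batches", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("batch write failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a batch size of 128 and 4 threads, a call might look like `ParallelLoaderSketch.load(lines, 128, 4, batch -> writeBatch(batch))`, where `writeBatch` is whatever actually issues the puts.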
I ran this on a single-node Vagrant setup, with the following configuration
for a 100k 2-column CSV enrichment import:
* a batch size of 128
* number of threads varying between 1 and 6
A reasonable speedup was achieved, leveling off at 5 threads:
| Number of Threads | Time (in seconds) |
|-------------------|-------------------|
| 1 | 91.019 |
| 2 | 76.07 |
| 3 | 39.974 |
| 4 | 35.039 |
| 5 | 30.531 |
| 6 | 30.559 |
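To read the table as speedup over the single-threaded run, divide the 1-thread time by each row's time; 5 threads comes out at roughly 3x, and 6 threads adds nothing further. A small sketch of that arithmetic follows. The class and method names are illustrative only, and the timings are copied from the table above.

```java
// Hypothetical helper for reading the benchmark table; not part of the pull request.
class SpeedupSketch {

    // Wall-clock times in seconds from the table, indexed by (threads - 1).
    static final double[] SECONDS = {91.019, 76.07, 39.974, 35.039, 30.531, 30.559};

    /** Speedup relative to the single-threaded run. */
    static double speedup(int threads) {
        return SECONDS[0] / SECONDS[threads - 1];
    }

    /** Parallel efficiency: speedup divided by the number of threads. */
    static double efficiency(int threads) {
        return speedup(threads) / threads;
    }

    /** One line per thread count, e.g. for printing alongside the table. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (int t = 1; t <= SECONDS.length; t++) {
            sb.append(String.format("%d threads: %.2fx speedup, %.0f%% efficiency%n",
                    t, speedup(t), 100 * efficiency(t)));
        }
        return sb.toString();
    }
}
```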

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron parallel_extractor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/428.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #428
----
commit 47d814ef95d67738d20ce5dc530ba7b05d418a96
Author: cstella <[email protected]>
Date: 2017-01-27T23:15:44Z
Multithreading the SimpleEnrichmentFlatFileLoader
commit 918d4ce4aea5d7dfde992f32bf049c70f35dd182
Author: cstella <[email protected]>
Date: 2017-01-27T23:23:19Z
doc changes.
commit c6ca3a86881eb77bc9598a61e3c0cf8280ccb03f
Author: cstella <[email protected]>
Date: 2017-01-27T23:39:56Z
Updating docs.
commit 8c9a79cdfa38ea2fbd161095d5e346147558ec5f
Author: cstella <[email protected]>
Date: 2017-01-28T03:36:31Z
Investigating integration tests.
commit 315bd181aa634290ab987441d81c28addb7952e2
Author: cstella <[email protected]>
Date: 2017-01-28T04:09:28Z
Update integration test to be a proper integration test.
commit 004c6f41b6c1cc3ecea70513e1a468501bd32e3c
Author: cstella <[email protected]>
Date: 2017-01-28T04:49:37Z
Adding spliterator unit test for completeness
commit f8dd48ef920c948e1fc5ff736e386f641e551b2b
Author: cstella <[email protected]>
Date: 2017-01-28T05:01:42Z
Updating test to use a proper file
commit 9b04f9723d442c8f4fb7a8bcaa1d733fc1305dc4
Author: cstella <[email protected]>
Date: 2017-01-28T05:17:12Z
Updating docs and renaming a few things.
commit eb5b82cc35bd767a169f548ea8144dd9ae165f84
Author: cstella <[email protected]>
Date: 2017-01-28T05:23:25Z
Update one more test case.
----
> Multithread the flat file loader
> --------------------------------
>
> Key: METRON-678
> URL: https://issues.apache.org/jira/browse/METRON-678
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
>
> Currently the flat file loader is single-threaded when writing to HBase.
> We could make this a lot faster by multithreading the HBase puts.