[
https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843657#comment-15843657
]
ASF GitHub Bot commented on METRON-678:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/428
Testing Plan
* Download the Alexa top 1m dataset:
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Create a 100k-entry sample and a single-entry sample:
```
head -n 100000 top-1m.csv > top-100k.csv
head -n 1 top-1m.csv > top-1.csv
```
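Before loading, it may be worth confirming that the samples have the line counts the later steps expect. This is only a sketch: `check_lines` is a hypothetical helper, not part of Metron, and it assumes the files sit in the current directory.

```shell
# Hypothetical helper (not part of Metron): succeed only if FILE has
# exactly EXPECTED lines.
check_lines() {
  file=$1
  expected=$2
  # wc -l may pad its output with spaces on some platforms; strip them.
  actual=$(wc -l < "$file" | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file has $actual lines"
  else
    echo "MISMATCH: $file has $actual lines, expected $expected" >&2
    return 1
  fi
}

# Check the two samples if they are present in the current directory.
if [ -f top-100k.csv ]; then check_lines top-100k.csv 100000; fi
if [ -f top-1.csv ]; then check_lines top-1.csv 1; fi
```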
* Create an extractor config for the CSV data by creating `extractor.json`
with these contents:
```
{
"config" : {
"columns" : {
"domain" : 1,
"rank" : 0
}
,"indicator_column" : "domain"
,"type" : "alexa"
,"separator" : ","
},
"extractor" : "CSV"
}
```
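To see how the zero-based `columns` indices in the config line up with a CSV row, here is a small sketch. The sample row is made up for illustration, and `show_columns` is a hypothetical helper; the loader itself does not use awk.

```shell
# Hypothetical helper: label the two CSV fields the way extractor.json does.
# awk fields are 1-based, so extractor column 0 ("rank") is $1 and
# column 1 ("domain") is $2.
show_columns() {
  awk -F, '{ print "rank=" $1 " domain=" $2 }'
}

# Illustrative row only; real rows come from top-1m.csv.
printf '%s\n' '1,example.com' | show_columns
```

With `indicator_column` set to `domain`, the second field is the one that should end up as the enrichment key.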
* Verify 100k import with 5 threads:
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase using 5 threads
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-100k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 100k
echo "count 'enrichment'" | hbase shell
```
* Verify 100k import with 5 threads and a batch size of 1000:
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase using 5 threads and a batch size of 1000
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-100k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 1000
# count data written and verify it's 100k
echo "count 'enrichment'" | hbase shell
```
* Verify 100k import with 1 thread:
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase using 1 thread
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-100k.csv -t enrichment -c t -e ./extractor.json -p 1 -b 128
# count data written and verify it's 100k
echo "count 'enrichment'" | hbase shell
```
* Verify 1 entry import with 5 threads:
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase using 5 threads
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-1.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 1
echo "count 'enrichment'" | hbase shell
```
* Verify 1 entry import with 1 thread:
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase using 1 thread
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-1.csv -t enrichment -c t -e ./extractor.json -p 1 -b 128
# count data written and verify it's 1
echo "count 'enrichment'" | hbase shell
```
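The "count and verify" step in each case above could be scripted rather than checked by eye. The following is a rough sketch, not part of the PR. `extract_count` and `verify_count` are hypothetical helpers, and they assume `hbase shell` echoes the return value of `count` as a line of the form `=> N`, which may differ between HBase versions.

```shell
# Hypothetical helper: read `hbase shell` output on stdin and print the
# number from the last "=> N" line (assumed output format).
extract_count() {
  sed -n 's/^=> *\([0-9][0-9]*\) *$/\1/p' | tail -n 1
}

# Hypothetical helper: compare the extracted count with the expected one.
verify_count() {
  expected=$1
  actual=$(extract_count)
  if [ "$actual" = "$expected" ]; then
    echo "PASS: $actual rows"
  else
    echo "FAIL: got '$actual', expected $expected" >&2
    return 1
  fi
}

# On a node with HBase available, usage would look like:
#   echo "count 'enrichment'" | hbase shell | verify_count 100000
```

Parsing the `=> N` line rather than the `row(s)` summary keeps the check independent of the timing text, but it should be confirmed against the actual shell output on the cluster under test.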
> Multithread the flat file loader
> --------------------------------
>
> Key: METRON-678
> URL: https://issues.apache.org/jira/browse/METRON-678
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
>
> Currently the flat file loader is single threaded in its writing to HBase.
> We could make this a lot faster by multithreading the HBase puts.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)