[
https://issues.apache.org/jira/browse/METRON-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848719#comment-15848719
]
ASF GitHub Bot commented on METRON-682:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/432
# Testing Plan
## Preliminaries
* Download the Alexa top 1M dataset:
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage import files
```
head -n 10000 top-1m.csv > top-10k.csv
hadoop fs -put top-10k.csv /tmp
head -n 10000 top-1m.csv | gzip - > top-10k.csv.gz
head -n 10000 top-1m.csv | zip > top-10k.csv.zip
```
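Before running the imports, it may be worth confirming that each staged file really holds 10,000 rows. The following is a minimal sketch: `count_rows` is a hypothetical helper (not part of Metron), and it assumes `gzip` and `unzip` are available on the PATH.

```shell
# Hypothetical helper (not part of Metron): print the number of lines in a
# plain, gzipped, or zipped CSV file, choosing the decompressor by extension.
count_rows() {
  case "$1" in
    *.gz)  gzip -dc "$1" | wc -l ;;
    *.zip) unzip -p "$1" | wc -l ;;
    *)     wc -l < "$1" ;;
  esac
}

# Report the row count of each staged file; each is expected to show 10000.
for f in top-10k.csv top-10k.csv.gz top-10k.csv.zip; do
  if [ -f "$f" ]; then
    echo "$f: $(count_rows "$f")"
  else
    echo "$f: missing"
  fi
done
```

If any file reports something other than 10000, the later `count 'enrichment'` checks would be off for reasons unrelated to the loader.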
* Create an `extractor.json` for the CSV data with these contents:
```
{
  "config" : {
    "columns" : {
      "domain" : 1,
      "rank" : 0
    },
    "indicator_column" : "domain",
    "type" : "alexa",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
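The `columns` mapping above uses zero-based positions: `rank` is column 0 and `domain` is column 1, matching the `rank,domain` layout of `top-1m.csv`. As a rough illustration of that mapping, the sketch below splits sample rows the same way with `awk` (whose fields are one-based, so `$1` is column 0). The `show_columns` name and the sample rows are made up for this example.

```shell
# Hypothetical helper: split rank,domain rows on "," and label each field.
# awk's $1 and $2 correspond to config columns 0 (rank) and 1 (domain).
show_columns() {
  awk -F',' '{ printf "rank=%s domain=%s\n", $1, $2 }'
}

# Made-up sample rows in the same layout as top-1m.csv.
printf '1,example.com\n2,example.org\n' | show_columns
# prints:
#   rank=1 domain=example.com
#   rank=2 domain=example.org
```

Running `head -n 3 top-10k.csv | show_columns` on the staged file is a quick way to check that the second field is the value intended as the indicator.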
## Import from URL
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase from URL. This should take approximately 5 or 6 minutes
/usr/metron/0.3.0/bin/flatfile_loader.sh -i http://s3.amazonaws.com/alexa-static/top-1m.csv.zip \
  -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 1M
echo "count 'enrichment'" | hbase shell
```
## Import from local file (non-zipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv \
  -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
## Import from local file (gzipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.gz \
  -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
## Import from local file (zipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.zip \
  -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
## Import from HDFS via MR
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i /tmp/top-10k.csv \
  -t enrichment -c t -e ./extractor.json -m MR
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
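Each scenario above ends with a manual check of the `count` output. One way to make that check mechanical is sketched below. `check_count` is a hypothetical helper, and it assumes the HBase shell prints a summary line of the form `<N> row(s) in ...`; if your HBase version formats the summary differently, the pattern would need adjusting.

```shell
# Hypothetical helper: read `hbase shell` output on stdin and succeed only if
# the reported row count equals the expected value passed as $1.
# Assumption: the shell prints a summary line such as "10000 row(s) in ...".
check_count() {
  expected="$1"
  # Keep the first field of the last line that mentions "row(s)".
  actual=$(awk '/row\(s\)/ { n = $1 } END { print n }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $actual rows"
  else
    echo "FAIL: expected $expected rows, got '${actual}'"
    return 1
  fi
}

# Usage against a live cluster (commented out so the sketch has no side effects):
# echo "count 'enrichment'" | hbase shell | check_count 10000
```

For the URL import, the expected value would be 1000000 rather than 10000.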
> Unify and Improve the Flat File Loader
> --------------------------------------
>
> Key: METRON-682
> URL: https://issues.apache.org/jira/browse/METRON-682
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
>
> Currently the flat file loader is deficient in several ways:
> * It only supports importing local data, despite there being a separate,
> poorly named application, threat_intel_loader.sh, which supports importing
> enrichments via MapReduce
> * It does not support local imports from HDFS
> * It does not support local imports from URLs
> * It does not support importing zipped archives locally
> * You cannot import more than one file at once
> This JIRA will:
> * Unify the MapReduce and local imports into one program and allow the user
> to specify the import mode with a CLI flag
> * Support local imports from HDFS and URLs
> * Support local imports from zipped files
> * Support importing more than one file at once