[ 
https://issues.apache.org/jira/browse/METRON-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848719#comment-15848719
 ] 

ASF GitHub Bot commented on METRON-682:
---------------------------------------

Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/432
  
    # Testing Plan
    
    ## Preliminaries
    
    * Download the alexa 1m dataset:
    ```
    wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
    unzip top-1m.csv.zip
    ```
    * Stage the import files:
    ```
    head -n 10000 top-1m.csv > top-10k.csv
    hadoop fs -put top-10k.csv /tmp
    head -n 10000 top-1m.csv | gzip - > top-10k.csv.gz
    head -n 10000 top-1m.csv | zip > top-10k.csv.zip
    ```
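    
    Before importing, it may be worth confirming that each staged file really holds 10k rows. The sketch below is not part of the PR; `count_rows` is a hypothetical helper name, and it assumes `gzip` and `unzip` are installed and that `unzip -p` can stream the single entry produced by piping into `zip`.
    ```shell
    # Hypothetical helper: print the number of lines in a plain, gzipped, or zipped file.
    count_rows() {
      case "$1" in
        *.gz)  gzip -dc "$1" ;;   # decompress gzip to stdout
        *.zip) unzip -p "$1" ;;   # stream the archive contents to stdout
        *)     cat "$1" ;;        # plain text
      esac | wc -l | tr -d ' '    # strip the padding some wc builds add
    }
    
    # Report the row count for each staged file that exists; each should print 10000.
    for f in top-10k.csv top-10k.csv.gz top-10k.csv.zip; do
      if [ -f "$f" ]; then
        echo "$f: $(count_rows "$f")"
      fi
    done
    ```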
    * Create an extractor.json for the CSV data by editing `extractor.json` and pasting in these contents:
    ```
    {
      "config" : {
        "columns" : {
           "domain" : 1,
           "rank" : 0
                    }
        ,"indicator_column" : "domain"
        ,"type" : "alexa"
        ,"separator" : ","
                 },
      "extractor" : "CSV"
    }
    ```
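    
    If you would rather not open an editor, the same file can be written from the shell. This is only a sketch of an alternative, not something the PR requires; the validity check assumes `python3` is on the PATH and is skipped otherwise.
    ```shell
    # Write extractor.json non-interactively; printf emits one line per argument.
    printf '%s\n' \
      '{' \
      '  "config" : {' \
      '    "columns" : { "domain" : 1, "rank" : 0 },' \
      '    "indicator_column" : "domain",' \
      '    "type" : "alexa",' \
      '    "separator" : ","' \
      '  },' \
      '  "extractor" : "CSV"' \
      '}' > extractor.json
    
    # Optional check (assumes python3 is available): confirm the file parses as JSON.
    if command -v python3 >/dev/null 2>&1; then
      python3 -m json.tool extractor.json > /dev/null && echo "extractor.json is valid JSON"
    fi
    ```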
    
    ## Import from URL
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase from URL.  This should take approximately 5 or 6 minutes
    /usr/metron/0.3.0/bin/flatfile_loader.sh \
      -i http://s3.amazonaws.com/alexa-static/top-1m.csv.zip \
      -t enrichment -c t -e ./extractor.json -p 5 -b 128
    # count data written and verify it's 1M
    echo "count 'enrichment'" | hbase shell
    ```
    
    ## Import from local file (non-zipped)
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase 
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv \
      -t enrichment -c t -e ./extractor.json -p 5 -b 128
    # count data written and verify it's 10k
    echo "count 'enrichment'" | hbase shell
    ```
    
    ## Import from local file (gzipped)
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase 
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.gz \
      -t enrichment -c t -e ./extractor.json -p 5 -b 128
    # count data written and verify it's 10k
    echo "count 'enrichment'" | hbase shell
    ```
    
    ## Import from local file (zipped)
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase 
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.zip \
      -t enrichment -c t -e ./extractor.json -p 5 -b 128
    # count data written and verify it's 10k
    echo "count 'enrichment'" | hbase shell
    ```
    
    ## Import from HDFS via MR
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase 
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i /tmp/top-10k.csv \
      -t enrichment -c t -e ./extractor.json -m MR
    # count data written and verify it's 10k
    echo "count 'enrichment'" | hbase shell
    ```



> Unify and Improve the Flat File Loader
> --------------------------------------
>
>                 Key: METRON-682
>                 URL: https://issues.apache.org/jira/browse/METRON-682
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>
> Currently the flat file loader is deficient in a couple of ways:
> * It only supports importing local data despite there being a separate, 
> poorly named, application which supports importing enrichment via MapReduce 
> called threat_intel_loader.sh
> * It does not support local imports from HDFS
> * It does not support local imports from URLs
> * It does not support importing zipped archives locally
> * You cannot import more than one file at once
> This JIRA will:
> * Unify the MapReduce and local imports into one program and allow the user 
> to specify the import mode with a CLI flag
> * Support local imports from HDFS and URLs
> * Support local imports from zipped files
> * Support importing more than one file at once



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
