[jira] [Commented] (METRON-678) Multithread the flat file loader

ASF GitHub Bot (JIRA) Fri, 27 Jan 2017 15:36:59 -0800

    [ 
https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843657#comment-15843657
 ]


ASF GitHub Bot commented on METRON-678:
---------------------------------------

Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/428
  
    Testing Plan
    
    * Download the Alexa top 1M dataset:
    ```
    wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
    unzip top-1m.csv.zip
    ```
    * Create a 100k-entry subset and a single-entry subset:
    ```
    head -n 100000 top-1m.csv > top-100k.csv
    head -n 1 top-1m.csv > top-1.csv
    ```
    * Create an extractor.json for the CSV data by editing `extractor.json` and pasting in these contents:
    ```
    {
      "config" : {
        "columns" : {
          "domain" : 1,
          "rank" : 0
        },
        "indicator_column" : "domain",
        "type" : "alexa",
        "separator" : ","
      },
      "extractor" : "CSV"
    }
    ```
    * Verify 100k import with 5 threads:
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase using 5 threads
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-100k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128
    # count data written and verify it's 100k
    echo "count 'enrichment'" | hbase shell
    ```
    * Verify 100k import with 5 threads and a batch of 1000:
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase using 5 threads and a batch size of 1000
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-100k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 1000
    # count data written and verify it's 100k
    echo "count 'enrichment'" | hbase shell
    ```
    * Verify 100k import with 1 thread:
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase using 1 thread
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-100k.csv -t enrichment -c t -e ./extractor.json -p 1 -b 128
    # count data written and verify it's 100k
    echo "count 'enrichment'" | hbase shell
    ```
    * Verify 1 entry import with 5 threads:
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase using 5 threads
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-1.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128
    # count data written and verify it's 1
    echo "count 'enrichment'" | hbase shell
    ```
    * Verify 1 entry import with 1 thread:
    ```
    # truncate hbase
    echo "truncate 'enrichment'" | hbase shell
    # import data into hbase using 1 thread
    /usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-1.csv -t enrichment -c t -e ./extractor.json -p 1 -b 128
    # count data written and verify it's 1
    echo "count 'enrichment'" | hbase shell
    ```
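
The five verification steps above differ only in input file, thread count, batch size, and expected row count, so they could be driven from one script. What follows is a rough sketch, not part of the PR: the function names (`loader_cmd`, `step`, `run_case`, `run_plan`) and the `DRY_RUN`, `LOADER`, `TABLE`, and `EXTRACTOR` variables are made up for illustration, while the loader path and flags are copied from the commands above. It defaults to a dry run that only prints the commands; the row count is still checked by eye against the printed expectation.

```shell
#!/usr/bin/env bash
# Sketch only: replays the five cases from the testing plan.
# DRY_RUN defaults to 1, so nothing touches HBase unless you opt in.
set -e

LOADER="${LOADER:-/usr/metron/0.3.0/bin/flatfile_loader.sh}"
TABLE="${TABLE:-enrichment}"
EXTRACTOR="${EXTRACTOR:-./extractor.json}"
DRY_RUN="${DRY_RUN:-1}"

# Build the loader command line for one case (same flags as the plan).
# Arguments: INPUT THREADS BATCH
loader_cmd() {
  printf '%s -i %s -t %s -c t -e %s -p %s -b %s' \
    "$LOADER" "$1" "$TABLE" "$EXTRACTOR" "$2" "$3"
}

# Run a single command line, or just print it in dry-run mode.
step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    sh -c "$*"
  fi
}

# Truncate the table, load one file, then count the rows.
# Arguments: INPUT THREADS BATCH EXPECTED_ROWS
run_case() {
  echo "== $1: threads=$2 batch=$3, expect $4 row(s)"
  step "echo \"truncate '$TABLE'\" | hbase shell"
  step "$(loader_cmd "$1" "$2" "$3")"
  step "echo \"count '$TABLE'\" | hbase shell"
}

# The five cases listed in the testing plan.
run_plan() {
  run_case ./top-100k.csv 5 128  100000
  run_case ./top-100k.csv 5 1000 100000
  run_case ./top-100k.csv 1 128  100000
  run_case ./top-1.csv    5 128  1
  run_case ./top-1.csv    1 128  1
}

run_plan
```

Running it as-is prints the fifteen commands; setting `DRY_RUN=0` would execute them against the live `enrichment` table, truncating it before each case just as the manual steps do.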


> Multithread the flat file loader
> --------------------------------
>
>                 Key: METRON-678
>                 URL: https://issues.apache.org/jira/browse/METRON-678
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>
> Currently the flat file loader is single threaded in its writing to HBase.  
> We could make this a lot faster by multithreading the HBase puts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
