[ 
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763015#comment-13763015
 ] 

Daniel Ciborowski edited comment on NUTCH-1517 at 9/11/13 4:20 PM:
-------------------------------------------------------------------

git clone https://github.com/apache/nutch
wget https://issues.apache.org/jira/secure/attachment/12601469/0023883254_1377197869_indexer-cloudsearch.patch
cd nutch/
git checkout -t origin/branch-1.7
patch -p0 -i ~/0023883254_1377197869_indexer-cloudsearch.patch 
vi conf/nutch-site.xml
ant
cd runtime/local/
mkdir -p urls
echo "http://www.princeton.edu/" > ./urls/seeds.txt
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
bin/nutch index crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

The vi step is where I set my crawler name, change solr to cloudsearch, and add
my endpoint URL. I tried to do this with sed by replacing lines but couldn't
figure it out.
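
One possible way to script the vi step is to write conf/nutch-site.xml from a
here-document instead of patching it with sed. This is only a sketch under
stated assumptions: http.agent.name and plugin.includes are standard Nutch
properties, but the agent name "MyCrawler", the endpoint placeholder, the
property name cloudsearch.endpoint, and the exact plugin.includes value are
hypothetical and should be checked against the patch and the nutch-default.xml
in your checkout.

```shell
#!/bin/sh
# Sketch: generate conf/nutch-site.xml non-interactively (replaces the vi step).
# ASSUMPTIONS, not taken from the original comment:
#   - AGENT_NAME and ENDPOINT are placeholders to replace with your own values
#   - "cloudsearch.endpoint" is a hypothetical property name; use whatever
#     name the indexer-cloudsearch patch actually reads
#   - the plugin.includes value is modeled on a typical default with
#     indexer-solr swapped for indexer-cloudsearch
set -eu

CONF="${CONF:-conf/nutch-site.xml}"
AGENT_NAME="${AGENT_NAME:-MyCrawler}"
ENDPOINT="${ENDPOINT:-REPLACE-WITH-YOUR-CLOUDSEARCH-ENDPOINT}"

# Harmless inside a Nutch checkout, where conf/ already exists.
mkdir -p "$(dirname "$CONF")"

# Overwrites any existing file, so run it only on a fresh checkout.
cat > "$CONF" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>${AGENT_NAME}</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-cloudsearch|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>cloudsearch.endpoint</name>
    <value>${ENDPOINT}</value>
  </property>
</configuration>
EOF
```

Run from the top of the nutch/ checkout before ant, the script overwrites
conf/nutch-site.xml in one step. Writing the whole file avoids the problem
that a fresh nutch-site.xml has no solr line for sed to replace.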

Edits based on feedback.


                
was (Author: djc391):
git clone https://github.com/apache/nutch
wget https://issues.apache.org/jira/secure/attachment/12601469/0023883254_1377197869_indexer-cloudsearch.patch
cd nutch/
git checkout -t origin/branch-1.7
patch -p0 -i ~/0023883254_1377197869_indexer-cloudsearch.patch 
vi conf/nutch-site.xml
ant
cd runtime/local/
mkdir -p urls
echo "http://www.princeton.edu/" > ./urls/seeds.txt
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

The vi step is where I set my crawler name, change solr to cloudsearch, and add
my endpoint URL. I tried to do this with sed by replacing lines but couldn't
figure it out.
                  
> CloudSearch indexer
> -------------------
>
>                 Key: NUTCH-1517
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1517
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Julien Nioche
>             Fix For: 1.9
>
>         Attachments: 0023883254_1377197869_indexer-cloudsearch.patch
>
>
> Once we have made the indexers pluggable, we should add a plugin for Amazon 
> CloudSearch. See http://aws.amazon.com/cloudsearch/. Apparently it uses a 
> JSON-based representation, the Search Data Format (SDF), which we could reuse
> for a file-based indexer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
