Hi, is it possible to crawl three different websites, such as
1. https://www.urgenthomework.com/
2. https://www.myassignmenthelp.net/
3. https://www.assignmenthelp.net/
with a single Nutch configuration, and then send the indexed pages of each site to
its corresponding Solr core [uah, mah, yah]? I tried to achieve this with
exchanges and writer IDs. Please see my configurations below:
-------------exchange.xml---------------------------------
<exchange id="uahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_1" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='urgenthomework.com'" />
  </params>
</exchange>
<exchange id="mahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_2" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
  </params>
</exchange>
<exchange id="yahIndexernew" class="default">
  <writers>
    <writer id="indexer_solr_3" />
  </writers>
  <params>
    <param name="expr" value="doc.getFieldValue('host')=='assignmenthelp.net'" />
  </params>
</exchange>
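For comparison, here is a minimal sketch of how per-host routing might be expressed, assuming Nutch 1.15 or later with the exchange-jexl plugin active. It differs from the configuration above in two assumed respects. First, each exchange uses the class org.apache.nutch.exchange.jexl.JexlExchange; in the stock exchanges.xml, class="default" appears to mark the fallback exchange that receives documents no other exchange matched, not one that evaluates "expr". Second, each expression compares against the full hostname including "www.", on the assumption that the index-basic plugin fills the "host" field from the seed URLs listed above. The <exchanges> root element is also an assumption; check all of this against the exchanges.xml shipped with your Nutch version.
-------------exchanges.xml (sketch)---------------------------------
<exchanges>
  <!-- Assumption: JexlExchange evaluates "expr" once per document -->
  <exchange id="uahIndexernew" class="org.apache.nutch.exchange.jexl.JexlExchange">
    <writers>
      <writer id="indexer_solr_1" />
    </writers>
    <params>
      <!-- Assumption: host keeps the "www." prefix of the seed URL -->
      <param name="expr" value="doc.getFieldValue('host')=='www.urgenthomework.com'" />
    </params>
  </exchange>
  <exchange id="mahIndexernew" class="org.apache.nutch.exchange.jexl.JexlExchange">
    <writers>
      <writer id="indexer_solr_2" />
    </writers>
    <params>
      <param name="expr" value="doc.getFieldValue('host')=='www.myassignmenthelp.net'" />
    </params>
  </exchange>
  <exchange id="yahIndexernew" class="org.apache.nutch.exchange.jexl.JexlExchange">
    <writers>
      <writer id="indexer_solr_3" />
    </writers>
    <params>
      <param name="expr" value="doc.getFieldValue('host')=='www.assignmenthelp.net'" />
    </params>
  </exchange>
</exchanges>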
---------------------------------index.writers.xml----------------------------------------
<writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http" />
    <param name="url" value="http://localhost:8983/solr/uah" />
    <param name="collection" value="" />
    <param name="weight.field" value="" />
    <param name="commitSize" value="1000" />
    <param name="auth" value="false" />
    <param name="username" value="username" />
    <param name="password" value="password" />
  </parameters>
  <mapping>
    <copy>
      <!-- <field source="title" dest="content" />
      <field source="metatag.description" dest="content" />
      <field source="metatag.keywords" dest="content" /> -->
    </copy>
    <rename></rename>
    <remove>
      <field source="segment" />
      <field source="host" />
      <field source="url" />
      <!-- <field source="metatag.description" />
      <field source="metatag.keywords" />
      <field source="date" />
      <field source="url" />
      -->
    </remove>
  </mapping>
</writer>
<writer id="indexer_solr_2" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http" />
    <param name="url" value="http://localhost:8983/solr/mah" />
    <param name="collection" value="" />
    <param name="weight.field" value="" />
    <param name="commitSize" value="1000" />
    <param name="auth" value="false" />
    <param name="username" value="username" />
    <param name="password" value="password" />
  </parameters>
  <mapping>
    <copy>
    </copy>
    <rename></rename>
    <remove>
      <field source="segment" />
      <field source="host" />
      <field source="url" />
    </remove>
  </mapping>
</writer>
<writer id="indexer_solr_3" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http" />
    <param name="url" value="http://localhost:8983/solr/yah" />
    <param name="collection" value="" />
    <param name="weight.field" value="" />
    <param name="commitSize" value="1000" />
    <param name="auth" value="false" />
    <param name="username" value="username" />
    <param name="password" value="password" />
  </parameters>
  <mapping>
    <copy>
    </copy>
    <rename></rename>
    <remove>
      <field source="segment" />
      <field source="host" />
      <field source="url" />
    </remove>
  </mapping>
</writer>
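The Solr writers and the exchange expressions above only take effect if the corresponding plugins are loaded. A hedged sketch of the plugin.includes property in nutch-site.xml, assuming the plugin ids indexer-solr and exchange-jexl; the remaining entries are only a typical example and should be replaced with whatever the existing configuration already lists:
-------------nutch-site.xml (sketch)---------------------------------
<property>
  <name>plugin.includes</name>
  <!-- Assumption: existing plugin list with exchange-jexl appended -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|exchange-jexl|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>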
---------------------------------------------------------------------------------------------------------------
However, it is not pushing data into the corresponding cores; instead, it sends
the data from the different domains into a single core. Please let me know what
I am missing; I am sure there has to be a way to achieve this. I have not tried
subcollections.xml yet. Do you think I can achieve it using subcollections?