[ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1051:
---------------------------------
Attachment: NUTCH-1051-1.4-1.patch
Patch for 1.4. Use the -asEff switch to output the scores in a format suitable
for ExternalFileField. A possible improvement would be for this switch to imply
the -scores switch.
The patch sets mapred.textoutputformat.separator so that an equals sign, rather
than a tab, separates the URL from the score.
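
For illustration only, here is a minimal Python sketch of the format change,
applied to an already dumped <url>\t<float> file. It is not the patch itself,
which does this inside the Hadoop job; the function name node_dump_to_eff and
the sample URLs are hypothetical.

```python
def node_dump_to_eff(lines):
    """Convert '<url>\\t<float>' lines into '<url>=<float>' lines."""
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so the score is always the final field.
        url, score = line.rsplit("\t", 1)
        float(score)  # raises ValueError if the score is not a number
        converted.append(f"{url}={score}")
    return converted


if __name__ == "__main__":
    # Hypothetical sample input, not taken from a real dump.
    sample = [
        "http://example.com/\t0.25\n",
        "http://example.com/?a=b\t1.5\n",
    ]
    for row in node_dump_to_eff(sample):
        print(row)
```

The second sample URL contains an equals sign of its own, which is the case
the caveat in the issue description is about.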
> Export WebGraph node scores for solr.ExternalFileField
> ------------------------------------------------------
>
> Key: NUTCH-1051
> URL: https://issues.apache.org/jira/browse/NUTCH-1051
> Project: Nutch
> Issue Type: Improvement
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.4
>
> Attachments: NUTCH-1051-1.4-1.patch
>
>
> The current webgraph.NodeDumper dumps a flat <url>\t<float>\n file, which is
> almost exactly what is needed for using ExternalFileField in Solr. This issue
> tracks adding an option to dump it in the proper format. Using EFF we can
> update scores without reindexing millions of documents. There's one caveat:
> Solr won't accept an equals sign in the key, but there's a small patch for
> this in SOLR-2545.