[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-664:
---------------------------------------
Fix Version/s: 2.2
> Possibility to update already stored documents.
> -----------------------------------------------
>
> Key: NUTCH-664
> URL: https://issues.apache.org/jira/browse/NUTCH-664
> Project: Nutch
> Issue Type: Wish
> Reporter: Sergey Khilkov
> Priority: Minor
> Fix For: 2.2
>
>
> We have a huge index of stored documents. Fetching a page and merging indexes
> every time we update some information about a page is a costly procedure, and
> this information can change 1-3 times per day. At the moment we have to store
> the changed information in a database, but this causes many problems with
> sorting, search restrictions and so on. Lucene itself allows deleting a single
> document and adding a new one to an existing index. The problem is Hadoop: as
> I understand it, the Hadoop filesystem cannot write at random positions.
> Still, it would be a great feature if Nutch were able to update an index it
> has already created.
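Random writes are not strictly required for per-document updates. Lucene never modifies a segment in place: its `IndexWriter.updateDocument` deletes the documents matching a term and then adds the new version, which goes into a new segment. Merging happens later. The following is a minimal in-memory sketch of that delete-then-add idea, assuming a Lucene-like segment design. The class `AppendOnlyIndex` and all of its methods are hypothetical and are not part of Nutch, Lucene, or Hadoop. In this model the per-segment deletion set is a mutable `BitSet`. A store that only allows writing a file once would presumably need to write each new set of deletions as a separate small file, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model: segments are never rewritten, so an update is a tombstone plus an append.
public class AppendOnlyIndex {
    // A batch of documents that is frozen once flushed; only its deletion set changes afterwards.
    static final class Segment {
        final List<String> urls = new ArrayList<>();
        final List<String> contents = new ArrayList<>();
        final BitSet deleted = new BitSet(); // tombstones kept beside the documents
    }

    private final List<Segment> segments = new ArrayList<>();
    private Segment pending = new Segment();

    // Flushed segments plus the not-yet-flushed one.
    private List<Segment> allSegments() {
        List<Segment> all = new ArrayList<>(segments);
        all.add(pending);
        return all;
    }

    // Marks every stored copy of url as deleted; no document data is overwritten.
    public void delete(String url) {
        for (Segment s : allSegments()) {
            for (int i = 0; i < s.urls.size(); i++) {
                if (s.urls.get(i).equals(url)) s.deleted.set(i);
            }
        }
    }

    public void add(String url, String content) {
        pending.urls.add(url);
        pending.contents.add(content);
    }

    // Update = delete the old version, then append the new one.
    public void update(String url, String content) {
        delete(url);
        add(url, content);
    }

    // Freezes pending documents into a new segment.
    public void flush() {
        if (!pending.urls.isEmpty()) {
            segments.add(pending);
            pending = new Segment();
        }
    }

    // Returns the newest live content for url, or null if there is none.
    public String get(String url) {
        String found = null;
        for (Segment s : allSegments()) {
            for (int i = 0; i < s.urls.size(); i++) {
                if (!s.deleted.get(i) && s.urls.get(i).equals(url)) found = s.contents.get(i);
            }
        }
        return found;
    }

    public int segmentCount() {
        return segments.size();
    }

    // The expensive step, done occasionally: copy live documents into one fresh segment.
    public void merge() {
        flush();
        Segment merged = new Segment();
        for (Segment s : segments) {
            for (int i = 0; i < s.urls.size(); i++) {
                if (!s.deleted.get(i)) {
                    merged.urls.add(s.urls.get(i));
                    merged.contents.add(s.contents.get(i));
                }
            }
        }
        segments.clear();
        segments.add(merged);
    }

    public static void main(String[] args) {
        AppendOnlyIndex idx = new AppendOnlyIndex();
        idx.add("http://example.com/a", "v1");
        idx.flush();
        idx.update("http://example.com/a", "v2");
        idx.flush();
        System.out.println(idx.get("http://example.com/a")); // v2
        System.out.println(idx.segmentCount());             // 2
        idx.merge();
        System.out.println(idx.segmentCount());             // 1
    }
}
```

In this design, frequent small updates only append data and set tombstones. The costly rewrite happens in `merge()`, which can run on whatever schedule suits the crawl.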
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira