Synonym-Editor that creates OWL for the ontology plugin
---
Key: NUTCH-435
URL: https://issues.apache.org/jira/browse/NUTCH-435
Project: Nutch
Issue Type: New Feature
Reporter:
Incorrect handling of relative paths when the embedded URL path is empty
Key: NUTCH-436
URL: https://issues.apache.org/jira/browse/NUTCH-436
Project: Nutch
Issue Type:
[ https://issues.apache.org/jira/browse/NUTCH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Urs Krebs updated NUTCH-435:
Attachment: SynonymEditor-0.9.zip
Synonym-Editor that creates OWL for the ontology plugin
[ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Groh updated NUTCH-436:
--
Description:
If you have a base URL of the form:
http://a/b/c/d;p?q#f
Embedded URL: ?y
Correct
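
The description above is cut off before it states the expected result. For reference, RFC 3986 (section 5.4.1) lists this exact case: with base http://a/b/c/d;p?q, the reference ?y resolves to http://a/b/c/d;p?y, so the base path is kept and only the query is replaced. A minimal Java sketch of that single rule follows; the class and method names are hypothetical, and this is not Nutch's code or the fix attached to the issue.

```java
public class QueryOnlyResolve {
    /**
     * Resolves a query-only reference such as "?y" against a base URL.
     * Per RFC 3986 section 5.2.2, an empty reference path means the base
     * path is kept unchanged; only the query (and fragment) are replaced.
     * Hypothetical helper for illustration; handles this one case only.
     */
    static String resolveQueryOnly(String base, String ref) {
        if (!ref.startsWith("?")) {
            throw new IllegalArgumentException("sketch handles query-only references");
        }
        // Drop the base fragment first, then the base query.
        int hash = base.indexOf('#');
        String noFragment = hash >= 0 ? base.substring(0, hash) : base;
        int query = noFragment.indexOf('?');
        String schemeAuthorityPath = query >= 0 ? noFragment.substring(0, query) : noFragment;
        // Keep the full base path (including "d;p") and append the new query.
        return schemeAuthorityPath + ref;
    }

    public static void main(String[] args) {
        // prints http://a/b/c/d;p?y
        System.out.println(resolveQueryOnly("http://a/b/c/d;p?q#f", "?y"));
    }
}
```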
Hi Kauu,
The functionality you require doesn't exist in the current parse-rss plugin. I
need the same functionality but it doesn't exist and I believe it's not a
simple task.
The functionality required is basically to create a page in a segment for each
item and add its URL to the crawldb.
Since
Trying to mergesegs I get the following, any idea?
A record version mismatch occured. Expecting v4, found v5
at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:147)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1175)
at
[ https://issues.apache.org/jira/browse/NUTCH-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann reassigned NUTCH-431:
---
Assignee: Chris A. Mattmann
Move plugin specific properties out of nutch-site.xml
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467887 ]
Chris A. Mattmann commented on NUTCH-258:
-
Guys,
From recent conversations on the mailing list where Doug
Gal Nitzan wrote:
Got it. I used latest trunk for a few hours and it seems that it changed the
version of CrawlDatum to ver. 5 :(
Yes, the version is updated on write.
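
To illustrate why a segment written by newer code cannot be read by older code, here is a minimal Java sketch of a version-tagged record. It is an assumption-laden illustration, not the actual CrawlDatum or Hadoop implementation: the class name, the one-byte version header, the int payload, and the exception text are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionedRecordSketch {
    // Highest on-disk version this (hypothetical) reader understands.
    static final byte CUR_VERSION = 4;

    // Writes a record as: one version byte, then an int payload.
    static byte[] encode(byte version, int payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(version);
            out.writeInt(payload);
        }
        return bytes.toByteArray();
    }

    // Reads a record, rejecting any version newer than the reader supports.
    static int decode(byte[] record) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            byte found = in.readByte();
            if (found > CUR_VERSION) {
                throw new IOException("record version mismatch: expecting v"
                        + CUR_VERSION + ", found v" + found);
            }
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(encode((byte) 4, 42)));
        try {
            decode(encode((byte) 5, 42)); // written by a newer writer
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the writer always stamps its own version, so data touched by a newer build becomes unreadable to an older reader until it is regenerated.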
Gal Nitzan wrote:
Got it. I used latest trunk for a few hours and it seems that it changed the
version of CrawlDatum to ver. 5 :(
The earlier reply was sent too early. One (or more) of your segments has data
written with the newer version. If you haven't updated the crawldb, then you just
need to redo that (those)
Thanks Sami,
By redo do you mean re-parse or re-fetch + re-parse?
-----Original Message-----
From: Sami Siren [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 26, 2007 10:49 PM
To: nutch-dev@lucene.apache.org
Subject: Re: record version mismatch occured
Gal Nitzan wrote:
Got it. I used latest
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467912 ]
Sami Siren commented on NUTCH-434:
--
It's only half way if we get the Configuration into our subclass, there's no
Gal Nitzan wrote:
Thanks Sami,
By redo do you mean re-parse or re-fetch + re-parse?
generate - fetch - parse
--
Sami Siren
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467916 ]
Sami Siren commented on NUTCH-258:
--
I haven't noticed this being a problem for me, so no objections from here.
I just installed the latest from trunk.
I ran mergesegs and got the following error in all task log files (I use the
default log4j.properties):
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
at java.io.FileOutputStream.openAppend(Native Method)
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467927 ]
Sami Siren commented on NUTCH-434:
--
I can see the light, overriding readFields is sufficient.
Replace usage of
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467931 ]
Scott Ganyo commented on NUTCH-258:
---
Chris,
I originally opened the issue... but unfortunately I can neither
I would like to use NUTCH-251 (administration GUI). How stable is it? How easy
is it to set up and make it work with Nutch? Since it seems to work with the
trunk version, how stable is the trunk version of Nutch? Is there a tentative
schedule for the next Nutch release? Will it include the admin GUI?
I
That's right.
I think we should do something when Nutch fetches a page successfully:
check whether it is an RSS feed, then create as many pages as there are items.
I don't know whether it would work.
On the other hand, we could do something in the segment, just as you say.
I don't know that
Who can tell me where and how to build a Nutch document in nutch-0.8.1?
For example, one HTML page is a document, but I want to split a document
into several.
On 1/27/07, kauu [EMAIL PROTECTED] wrote:
That's right.
I think we should do something when Nutch fetches a page
On 1/26/07, Gal Nitzan [EMAIL PROTECTED] wrote:
Hi Kauu,
The functionality you require doesn't exist in the current parse-rss
plugin. I need the same functionality but it doesn't exist and I believe
it's not a simple task.
The functionality required is basically to create a page in a segment
That's right, but in other words, I just need to index the exact
information in a page. In reality, real-world pages contain lots of
spam, so I just want to index the description.
On 1/27/07, sishen [EMAIL PROTECTED] wrote:
On 1/26/07, Gal Nitzan [EMAIL PROTECTED] wrote:
Hi Kauu,