Thai Analyzer Plugin
Key: NUTCH-441
URL: https://issues.apache.org/jira/browse/NUTCH-441
Project: Nutch
Issue Type: New Feature
Components: indexer
Reporter: Vee Satayamas
This Thai analyzer
[ https://issues.apache.org/jira/browse/NUTCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vee Satayamas updated NUTCH-441:
Attachment: nutch-plugin-analysis-th-20070207.patch.gz
Thai Analyzer (lib-lucene-analyzers
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without having
to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL for
each parse should be that of its link, since you don't want to fetch
that separately. Right?
So
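The idea discussed above, one parse per feed item keyed by the item's link, can be sketched roughly as follows. This is only an illustration under stated assumptions: ParseSketch, FeedItem and parseFeed are hypothetical stand-ins invented for this example and are not the actual Nutch Parser or Parse API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParseSketch {

    /** Hypothetical stand-in for a parse result; NOT the real Nutch Parse class. */
    static final class ParseSketch {
        final String title;
        final String text;

        ParseSketch(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for one item of an already-fetched RSS feed. */
    static final class FeedItem {
        final String link;
        final String title;
        final String description;

        FeedItem(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    /**
     * Returns one parse per feed item, keyed by the item's link URL,
     * so each item can be indexed under its own URL without a separate fetch.
     */
    static Map<String, ParseSketch> parseFeed(List<FeedItem> items) {
        // LinkedHashMap keeps the items in feed order.
        Map<String, ParseSketch> parses = new LinkedHashMap<>();
        for (FeedItem item : items) {
            parses.put(item.link, new ParseSketch(item.title, item.description));
        }
        return parses;
    }

    public static void main(String[] args) {
        List<FeedItem> items = Arrays.asList(
            new FeedItem("http://example.org/a", "First item", "Text of the first item"),
            new FeedItem("http://example.org/b", "Second item", "Text of the second item"));

        for (Map.Entry<String, ParseSketch> e : parseFeed(items).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().title);
        }
    }
}
```

With a single fetched feed as input, the map holds one entry per item, which is the behaviour the thread is asking for; how the real interface would change is left to the discussion.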
I am trying to build a vertical search engine using rule based crawling
strategy.
I have finished this part as a web application and I want to combine it with Nutch to
control and select the URLs to be fetched. Are there any ideas on how to do that?
Integrate Solr/Nutch
Key: NUTCH-442
URL: https://issues.apache.org/jira/browse/NUTCH-442
Project: Nutch
Issue Type: New Feature
Environment: Ubuntu linux
Reporter: rubdabadub
Hi:
After trying out
Guys,
Sorry to be so thick-headed, but could someone explain to me in really
simple language what this change is requesting that is different from the
current Nutch API? I still don't get it, sorry...
Cheers,
Chris
On 2/7/07 9:58 AM, Doug Cutting [EMAIL PROTECTED] wrote:
Renaud Richardet
allow parsers to return multiple Parse object, this will speed up the rss parser
Key: NUTCH-443
URL: https://issues.apache.org/jira/browse/NUTCH-443
Project: Nutch
Doug Cutting wrote:
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without
having to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL
for each parse should be that of its link, since you don't want to
fetch that
Chris Mattmann wrote:
Sorry to be so thick-headed, but could someone explain to me in really
simple language what this change is requesting that is different from the
current Nutch API? I still don't get it, sorry...
A Content would no longer generate a single Parse. Instead, a Content
Hi,
I am getting an NPE while fetching. I use Nutch trunk (from a week ago) with Hadoop
0.11.1.
java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2392)
    at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2087)
This was corrected in Hadoop as per issue HADOOP-917, but I'm thinking some
code in Nutch might have to be changed also. I reported this issue (via mailing
list) a while ago and I'm glad it was fixed, but I have been purposely staying
with revision 495214 of trunk which seems to provide the
Also true. On the other hand, Nutch provides 98% of an RSS search
engine. It'd be a shame to have to re-invent everything else and it
would be great if Nutch could evolve to support RSS well.
Could image search also benefit from this? One could generate a
Parse for each image on a
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v1.1.patch
I have forgotten to unset http.agent.name in the v1.0
Renaud Richardet wrote:
Doug Cutting wrote:
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without
having to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL
for each parse should be that of its link, since you don't