[jira] Created: (NUTCH-441) Thai Analyzer Plugin

2007-02-07 Thread Vee Satayamas (JIRA)
Thai Analyzer Plugin Key: NUTCH-441 URL: https://issues.apache.org/jira/browse/NUTCH-441 Project: Nutch Issue Type: New Feature Components: indexer Reporter: Vee Satayamas This Thai analyzer

[jira] Updated: (NUTCH-441) Thai Analyzer Plugin

2007-02-07 Thread Vee Satayamas (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vee Satayamas updated NUTCH-441: Attachment: nutch-plugin-analysis-th-20070207.patch.gz Thai Analyzer (lib-lucene-analyzers

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Doug Cutting
Renaud Richardet wrote: I see. I was thinking that I could index the feed items without having to fetch them individually. Okay, so if Parser#parse returned a Map<String,Parse>, then the URL for each parse should be that of its link, since you don't want to fetch that separately. Right? So

How Nutch can be used to build a vertical search engine?

2007-02-07 Thread ahmed ghouzia
I am trying to build a vertical search engine using a rule-based crawling strategy. I finished this part as a web application and I want to combine it with Nutch to control and select the URLs to be fetched. Are there any ideas on how to do that?

How Nutch can be used to build a vertical search engine?

2007-02-07 Thread ahmed ghouzia
I am trying to build a vertical search engine using a rule-based crawling strategy. I finished this part as a web application and I want to combine it with Nutch to control and select the URLs to be fetched. Are there any ideas on how to do that?

[jira] Created: (NUTCH-442) Integrate Solr/Nutch

2007-02-07 Thread rubdabadub (JIRA)
Integrate Solr/Nutch Key: NUTCH-442 URL: https://issues.apache.org/jira/browse/NUTCH-442 Project: Nutch Issue Type: New Feature Environment: Ubuntu linux Reporter: rubdabadub Hi: After trying out

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Chris Mattmann
Guys, Sorry to be so thick-headed, but could someone explain to me in really simple language what this change is requesting that is different from the current Nutch API? I still don't get it, sorry... Cheers, Chris On 2/7/07 9:58 AM, Doug Cutting [EMAIL PROTECTED] wrote: Renaud Richardet

[jira] Created: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-07 Thread Renaud Richardet (JIRA)
allow parsers to return multiple Parse object, this will speed up the rss parser Key: NUTCH-443 URL: https://issues.apache.org/jira/browse/NUTCH-443 Project: Nutch

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Renaud Richardet
Doug Cutting wrote: Renaud Richardet wrote: I see. I was thinking that I could index the feed items without having to fetch them individually. Okay, so if Parser#parse returned a Map<String,Parse>, then the URL for each parse should be that of its link, since you don't want to fetch that

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Doug Cutting
Chris Mattmann wrote: Sorry to be so thick-headed, but could someone explain to me in really simple language what this change is requesting that is different from the current Nutch API? I still don't get it, sorry... A Content would no longer generate a single Parse. Instead, a Content
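The idea discussed in this thread, a parser returning several parses keyed by URL instead of a single one, can be illustrated with a minimal sketch. The classes below (Content, Parse, FeedItem, MultiParseSketch) are hypothetical stand-ins written for illustration only; they are not the real Nutch classes, and the eventual Nutch API may look different.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; NOT the real Nutch Content/Parse API.
class MultiParseSketch {

    /** One entry of a feed: the link it points to and its text. */
    static final class FeedItem {
        final String link;
        final String text;

        FeedItem(String link, String text) {
            this.link = link;
            this.text = text;
        }
    }

    /** Fetched content: the URL of the feed plus the items found in it. */
    static final class Content {
        final String url;
        final List<FeedItem> items;

        Content(String url, List<FeedItem> items) {
            this.url = url;
            this.items = items;
        }
    }

    /** Parsed result for a single URL. */
    static final class Parse {
        final String text;

        Parse(String text) {
            this.text = text;
        }
    }

    /**
     * Returns one Parse per feed item, keyed by the item's own link,
     * so the items can be indexed without fetching each link separately.
     */
    static Map<String, Parse> parse(Content content) {
        Map<String, Parse> parses = new LinkedHashMap<>();
        for (FeedItem item : content.items) {
            parses.put(item.link, new Parse(item.text));
        }
        return parses;
    }

    public static void main(String[] args) {
        Content feed = new Content("http://example.com/feed.rss", Arrays.asList(
                new FeedItem("http://example.com/a", "First item"),
                new FeedItem("http://example.com/b", "Second item")));

        // Each map entry could be indexed as its own document.
        for (Map.Entry<String, Parse> entry : parse(feed).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().text);
        }
    }
}
```

In this sketch the map key is the item's link rather than the feed URL, which matches the suggestion quoted in the thread that the URL for each parse should be that of its link.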

NPE while fetching

2007-02-07 Thread Gal Nitzan
Hi, I am getting an NPE while fetching. I use Nutch trunk (from a week ago) with Hadoop 0.11.1. java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2392) at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2087)

Re: NPE while fetching

2007-02-07 Thread Sean Dean
This was corrected in Hadoop as per issue HADOOP-917, but I'm thinking some code in Nutch might have to be changed also. I reported this issue (via mailing list) a while ago and I'm glad it was fixed, but I have been purposely staying with revision 495214 of trunk which seems to provide the

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Sami Siren
Also true. On the other hand, Nutch provides 98% of an RSS search engine. It'd be a shame to have to re-invent everything else and it would be great if Nutch could evolve to support RSS well. Could image search also benefit from this? One could generate a Parse for each image on a

[jira] Updated: (NUTCH-439) Top Level Domains Indexing / Scoring

2007-02-07 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated NUTCH-439: Attachment: tld_plugin_v1.1.patch I have forgotten to unset http.agent.name in the v1.0

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Doğacan Güney
Renaud Richardet wrote: Doug Cutting wrote: Renaud Richardet wrote: I see. I was thinking that I could index the feed items without having to fetch them individually. Okay, so if Parser#parse returned a Map<String,Parse>, then the URL for each parse should be that of its link, since you don't