On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau vlad...@gmail.com wrote:
Hello
I'm trying to make RSSParser do something similar to FeedParser (which
doesn't work quite right) - that is, instead of indexing the whole contents
Why doesn't FeedParser work? Let's fix whatever is broken in it :D
On Mon, Jan 5, 2009 at 12:32 PM, Doğacan Güney doga...@gmail.com wrote:
On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau vlad...@gmail.com wrote:
Hello
I'm trying to make RSSParser do something similar to FeedParser (which
doesn't work quite right) - that is, instead of indexing the whole
Doğacan Güney wrote:
On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau vlad...@gmail.com wrote:
Hello
I'm trying to make RSSParser do something similar to FeedParser (which
doesn't work quite right) - that is, instead of indexing the whole contents
Why doesn't FeedParser work? Let's fix
Hello
I'm trying to make RSSParser do something similar to FeedParser (which
doesn't work quite right) - that is, instead of indexing the whole contents
of the feed, I want it to show individual items, with their respective title
and a proper link to the article. I realize that I could index 1
Sent from the Nutch - Dev mailing list archive at Nabble.com.
2. It sounds like a pretty fundamental API shift in Nutch, to support a
single type of content, RSS. Even if there are more content types that
follow this model, as Doug and Renaud both pointed out, there aren't a
multitude of them (perhaps archive files, but can you think of any
others)?
Hi Doug,
Okay, I see your points. It seems like this would be really useful for
some current folks, and for Nutch going forward. I see that there has been
some initial work today and preparing patches. I'd be happy to shepherd this
into the sources. I will begin reviewing what's required, and
[mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 10:19 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Notice: your correspondent is still writing to your old @orange-ft.com
address, which will be deactivated in early
.
Doug
--
View this message in context:
http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html#a8876127
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without having
to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL for
each parse should be that of its link, since you don't want to fetch
that separately. Right?
So
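A minimal sketch of the proposal quoted above, using hypothetical stand-in types (FeedItem, Content and Parse here are illustrative only, not Nutch's actual classes): one fetched feed goes in, and a Map<String, Parse> comes out, keyed by each item's link so that every item can be indexed under its own URL without a separate fetch. The feed items are assumed to be already extracted from the XML.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedParseSketch {
    // Hypothetical stand-ins for illustration; these are NOT Nutch classes.
    record FeedItem(String link, String title, String description) {}
    record Content(String url, List<FeedItem> items) {}
    record Parse(String title, String text) {}

    // One Content in, one Parse per feed item out, keyed by the item's link.
    static Map<String, Parse> parse(Content content) {
        Map<String, Parse> parses = new LinkedHashMap<>();
        for (FeedItem item : content.items()) {
            parses.put(item.link(), new Parse(item.title(), item.description()));
        }
        return parses;
    }

    public static void main(String[] args) {
        Content feed = new Content("http://example.com/feed.xml", List.of(
            new FeedItem("http://example.com/a", "First", "First item text"),
            new FeedItem("http://example.com/b", "Second", "Second item text")));
        // Each entry could be indexed as its own document under the item's URL.
        parse(feed).forEach((url, p) -> System.out.println(url + " -> " + p.title()));
    }
}
```

The map key is the item's link rather than the feed's URL, which is the point Doug raises: the item is stored as if that link had been fetched.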
Guys,
Sorry to be so thick-headed, but could someone explain to me in really
simple language what this change is requesting that is different from the
current Nutch API? I still don't get it, sorry...
Cheers,
Chris
On 2/7/07 9:58 AM, Doug Cutting [EMAIL PROTECTED] wrote:
Renaud Richardet
Doug Cutting wrote:
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without
having to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL
for each parse should be that of its link, since you don't want to
fetch that
Chris Mattmann wrote:
Sorry to be so thick-headed, but could someone explain to me in really
simple language what this change is requesting that is different from the
current Nutch API? I still don't get it, sorry...
A Content would no longer generate a single Parse. Instead, a Content
Also true. On the other hand, Nutch provides 98% of an RSS search
engine. It'd be a shame to have to re-invent everything else and it
would be great if Nutch could evolve to support RSS well.
Could image search also benefit from this? One could generate a
Parse for each image on a
Renaud Richardet wrote:
Doug Cutting wrote:
Renaud Richardet wrote:
I see. I was thinking that I could index the feed items without
having to fetch them individually.
Okay, so if Parser#parse returned a Map<String,Parse>, then the URL
for each parse should be that of its link, since you don't
Hi,
Doug Cutting wrote:
Doğacan Güney wrote:
I think it would make much more sense to change parse plugins to take
content and return Parse[] instead of Parse.
You're right. That does make more sense.
OK, then should I go forward with this and implement something? This
should be pretty
[EMAIL PROTECTED]
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi,
Doug Cutting wrote:
Doğacan Güney wrote:
I think it would make much more sense to change parse plugins to take
content and return Parse[] instead of Parse
Doğacan Güney wrote:
OK, then should I go forward with this and implement something? This
should be pretty easy, though I am not sure what to give as keys to a Parse[].
I mean, when getParse returned a single Parse, ParseSegment output them
as <url, Parse>. But, if getParse returns an array,
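One possible answer to the key question above, sketched under the assumption that each parse carries the URL it should be stored under. The Parse type and the emit method are hypothetical and only illustrate the idea; they are not how ParseSegment is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ParseArraySketch {
    // Hypothetical: a parse that remembers the URL it belongs to (not a Nutch class).
    record Parse(String url, String text) {}

    // Turns a Parse[] into <url, Parse> pairs, one pair per array element.
    static List<Map.Entry<String, Parse>> emit(Parse[] parses) {
        List<Map.Entry<String, Parse>> out = new ArrayList<>();
        for (Parse p : parses) {
            out.add(Map.entry(p.url(), p));
        }
        return out;
    }

    public static void main(String[] args) {
        Parse[] parses = {
            new Parse("http://example.com/a", "first item"),
            new Parse("http://example.com/b", "second item")
        };
        for (Map.Entry<String, Parse> e : emit(parses)) {
            System.out.println(e.getKey() + " -> " + e.getValue().text());
        }
    }
}
```

With the URL stored on each parse, the array itself needs no keys; the alternative discussed in the thread is to return a map and let the parser choose the keys directly.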
Hi Doug,
Since the target of the link must still be indexed separately from the
item itself, how much use is all this? If the RSS document is
considered a single page that changes frequently, and item's links are
considered ordinary outlinks, isn't much the same effect achieved?
IMHO, yes.
Hi Chris, Doug,
Chris Mattmann wrote:
Hi Doug,
Since the target of the link must still be indexed separately from the
item itself, how much use is all this? If the RSS document is
considered a single page that changes frequently, and item's links are
considered ordinary outlinks, isn't
Renaud Richardet wrote:
The use case is that you index RSS feeds, but your users can search each
feed entry as a single document. Does that make sense?
But each feed item also contains a link whose content will be indexed
and that's generally a superset of the item. So should there be two
Doug Cutting wrote:
Renaud Richardet wrote:
The use case is that you index RSS feeds, but your users can search
each feed entry as a single document. Does that make sense?
But each feed item also contains a link whose content will be indexed
and that's generally a superset of the item.
Renaud Richardet wrote:
Doug Cutting wrote:
Renaud Richardet wrote:
The use case is that you index RSS feeds, but your users can search
each feed entry as a single document. Does that make sense?
But each feed item also contains a link whose content will be indexed
and that's generally a
Doug Cutting wrote:
Gal Nitzan wrote:
IMHO the data that is needed, i.e. the data that would be fetched in
the next fetch process, is already available in the item element.
Each item element represents one web resource, and there is no
reason to go to the server and re-fetch that resource.
Doğacan Güney wrote:
I think it would make much more sense to change parse plugins to take
content and return Parse[] instead of Parse.
You're right. That does make more sense.
Doug
: Wednesday, January 31, 2007 8:44 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi there,
With the explanation that you give below, it seems like parse-rss as it
exists would address what you are trying to do. parse-rss parses
[mailto:[EMAIL PROTECTED]
Sent: Thursday, February 01, 2007 7:01 PM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi Gal, et al.,
I'd like to be explicit when we talk about what the issue with the RSS
parsing plugin is here; I think
Gal Nitzan wrote:
IMHO the data that is needed, i.e. the data that would be fetched in the next fetch process,
is already available in the item element. Each item element represents one
web resource, and there is no reason to go to the server and re-fetch that resource.
Perhaps ProtocolOutput
@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi there,
With the explanation that you give below, it seems like parse-rss as it
exists would address what you are trying to do. parse-rss parses an RSS
channel as a set of items, and indexes
Mattmann [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 01, 2007 7:01 PM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi Gal, et al.,
I'd like to be explicit when we talk about what the issue with the RSS
parsing plugin is here; I
AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi there,
With the explanation that you give below, it seems like parse-rss as it
exists would address what you are trying to do. parse-rss parses an RSS
channel as a set of items
Hi,
Thanks anyway, but I don't think I explained it clearly enough.
What I want is for Nutch to fetch only the RSS seeds, to a depth of 1. So Nutch
should fetch just some XML pages. I don't want to fetch the pages behind the
items' outlinks, because there is too much spam in those pages.
So I just need to parse the RSS
]
Sent: Wednesday, January 31, 2007 8:44 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi there,
With the explanation that you give below, it seems like parse-rss as it
exists would address what you are trying to do. parse-rss parses
Hi folks,
What I want to do is separate an RSS file into several pages,
just as has been discussed before. I want to fetch an RSS page and index
it as separate documents in the index, so the searcher can find each
item's info as an individual hit.
My idea is to create a
Hi there,
I could most likely be of assistance, if you gave me some more information.
For instance: I'm wondering if the use case you describe below is already
supported by the current RSS parse plugin?
The current RSS parser, parse-rss, does in fact index individual items that
are pointed to
Thanks for your reply.
Maybe I didn't explain clearly.
I want to index each item as an individual page. Then, when I search for
something, for example "nutch-open source", Nutch returns a hit which contains
title : nutch-open source
description : nutch nutch nutch nutch nutch
url :
: RSS-fecter and index individul-how can i realize this function
Hi there,
I could most likely be of assistance, if you gave me some more information.
For instance: I'm wondering if the use case you describe below is already
supported by the current RSS parse plugin?
The current RSS
-
From: Chris Mattmann [EMAIL PROTECTED]
Date: Tue, 30 Jan 2007 19:34:49
To:nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function
Hi there,
On 1/30/07 7:00 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Chris,
I saw your name associated
Hi there,
With the explanation that you give below, it seems like parse-rss as it
exists would address what you are trying to do. parse-rss parses an RSS
channel as a set of items, and indexes overall metadata about the RSS file,
including parse text, and index data, but it also adds each item