Doug Cutting wrote:
Gal Nitzan wrote:
IMHO the data that is needed, i.e. the data that would be fetched in
the next fetch cycle, is already available in the <item> element.
Each <item> element represents one web resource, so there is no
reason to go back to the server and re-fetch that resource.
Perhaps ProtocolOutput should change. The method:
Content getContent();
could be deprecated and replaced with:
Content[] getContents();
This would require changes to the indexing pipeline. I can't think of
any severe complications, but I haven't looked closely.
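
A rough sketch of what that change might look like. The types here are
simplified stand-ins, not the real Nutch classes: ProtocolOutput is
modelled as an interface only for brevity, FeedProtocolOutput and addItem
are hypothetical names, and the default method assumes Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Nutch's Content class.
final class Content {
    private final String url;
    private final byte[] data;

    Content(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }

    String getUrl() { return url; }
    byte[] getData() { return data; }
}

// Sketch of the proposed change; an assumption, not the actual Nutch type.
interface ProtocolOutput {
    /** Proposed accessor: every resource obtained by a single fetch. */
    Content[] getContents();

    /** Old accessor, kept so callers expecting one Content still compile. */
    @Deprecated
    default Content getContent() {
        Content[] all = getContents();
        return all.length == 0 ? null : all[0];
    }
}

// Hypothetical output of a feed fetch: one Content per <item> element.
final class FeedProtocolOutput implements ProtocolOutput {
    private final List<Content> items = new ArrayList<>();

    void addItem(String url, String body) {
        items.add(new Content(url, body.getBytes(StandardCharsets.UTF_8)));
    }

    @Override
    public Content[] getContents() {
        return items.toArray(new Content[0]);
    }
}
```

Keeping getContent() as a deprecated wrapper around getContents() would
let existing single-resource callers keep working while the indexing
pipeline is migrated.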
Since getProtocolOutput is called by Fetcher, the fetcher (actually, the
underlying protocol plugin) would need to know that it is fetching an
RSS feed and partially parse it in order to return an array of Contents.
I think it would make much more sense to change parse plugins to take
content and return Parse[] instead of Parse.
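
To make the idea concrete, here is a minimal sketch of a parser that
takes one fetched feed and returns one Parse per <item>. All types are
hypothetical simplifications (FetchedContent, MultiParser and
RssItemParser are made-up names, and this Parse is just a small value
class, not the real Nutch interface); the XML handling uses the JDK's
DOM parser only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the fetched content handed to a parse plugin.
final class FetchedContent {
    final String url;
    final byte[] data;

    FetchedContent(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }
}

// Hypothetical, simplified parse result for one web resource.
final class Parse {
    final String url;
    final String title;
    final String text;

    Parse(String url, String title, String text) {
        this.url = url;
        this.title = title;
        this.text = text;
    }
}

// Sketch of the suggested contract: one content in, many parses out.
interface MultiParser {
    Parse[] getParse(FetchedContent content) throws Exception;
}

// Assumed example plugin: each <item> in the feed becomes its own Parse.
final class RssItemParser implements MultiParser {
    @Override
    public Parse[] getParse(FetchedContent content) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(content.data));
        NodeList items = doc.getElementsByTagName("item");
        Parse[] parses = new Parse[items.getLength()];
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            parses[i] = new Parse(
                    childText(item, "link"),
                    childText(item, "title"),
                    childText(item, "description"));
        }
        return parses;
    }

    // Text of the first descendant element with this tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

With this shape the fetcher and protocol plugin stay unaware of feeds:
they hand over a single content, and only the parse plugin knows that it
expands into several resources.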
--
Doğacan Güney
Could something like that work?
Doug