Doug Cutting wrote:
Gal Nitzan wrote:
IMHO, the data that is needed, i.e. the data that would be fetched in the next fetch process, is already available in the <item> element. Each <item> element represents one web resource, so there is no reason to go to the server and re-fetch that resource.

Perhaps ProtocolOutput should change.  The method:

  Content getContent();

could be deprecated and replaced with:

  Content[] getContents();

This would require changes to the indexing pipeline. I can't think of any severe complications, but I haven't looked closely.
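
As a rough illustration of that change, here is a minimal sketch. It assumes simplified stand-in classes rather than the real Nutch ones: the Content fields, the varargs constructor, and the choice to have the deprecated accessor return the first entry are all assumptions made for the example.

```java
// Simplified stand-in for Nutch's Content; the real class carries more fields.
final class Content {
    final String url;
    final byte[] data;

    Content(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }
}

// Hypothetical reshaping of ProtocolOutput along the lines proposed above.
class ProtocolOutput {
    private final Content[] contents;

    ProtocolOutput(Content... contents) {
        // Defensive copy so callers cannot mutate our array.
        this.contents = contents.clone();
    }

    /** Old accessor, kept for existing callers: first entry, or null if there is none. */
    @Deprecated
    Content getContent() {
        return contents.length == 0 ? null : contents[0];
    }

    /** New accessor: one Content per web resource, e.g. one per <item> of a feed. */
    Content[] getContents() {
        return contents.clone();
    }
}
```

Existing callers of getContent() would keep compiling (with a deprecation warning), while the indexing pipeline could move to iterating over getContents().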

Since getProtocolOutput is called by Fetcher, the fetcher (actually, the underlying protocol plugin) would need to be aware that we are fetching an RSS feed and partially parse it to return an array of Contents.

I think it would make much more sense to change parse plugins to take content and return Parse[] instead of Parse.
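
To make the idea concrete, here is a minimal sketch of a parser that returns one parse per feed item. Everything in it is an assumption for illustration: Parse is a simplified stand-in, MultiParser and ToyRssParser are hypothetical names, and the regex extraction is only a placeholder for a proper feed parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for Nutch's Parse; the real interface exposes more.
final class Parse {
    final String url;
    final String text;

    Parse(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

// Hypothetical parser interface: one fetched document may yield several parses.
interface MultiParser {
    Parse[] getParses(String url, String rawContent);
}

// Toy RSS parser. Regex extraction is for illustration only;
// a real plugin would use an XML or feed parsing library.
final class ToyRssParser implements MultiParser {
    private static final Pattern ITEM = Pattern.compile(
        "<item>\\s*<link>(.*?)</link>\\s*<description>(.*?)</description>\\s*</item>",
        Pattern.DOTALL);

    @Override
    public Parse[] getParses(String url, String rawContent) {
        List<Parse> parses = new ArrayList<>();
        Matcher m = ITEM.matcher(rawContent);
        while (m.find()) {
            // Each <item> becomes its own Parse, keyed by the item's link.
            parses.add(new Parse(m.group(1), m.group(2)));
        }
        return parses.toArray(new Parse[0]);
    }
}
```

With this shape the protocol plugin stays unaware of feeds: it fetches the feed once, and only the parse plugin knows how to split it into items.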

--
Doğacan Güney

Could something like that work?

Doug



