In some cases, though, focused crawling may require extra data to
be stored that is not useful for whole-web crawling, for example,
a url's parent and seed url and its depth (essential for crawl
scopes).
Sounds like meta data for a page. :)
Some time ago I submitted a patch to the issue tracker; we use this
meta data here in a project to decide whether a page should be
crawled, and to pass meta data from a 'mother page' to a child.
I still believe flexible page meta data would be a big help in many
cases, and I believe that using map reduce to 'merge in' meta data,
as Doug suggested, isn't that powerful, since the page and the meta
datum are required to have an identical key.
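
To make the idea a bit more concrete, here is a rough sketch in Python
of what I mean by passing meta data from a 'mother page' to a child and
using it for a scope decision. This is not the patch and not Nutch code;
PageMeta, seed_meta, child_meta and in_scope are names made up purely
for illustration, and the depth-only scope rule is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PageMeta:
    """Hypothetical per-page meta data record (illustration only)."""
    url: str
    seed_url: str                      # seed this page was reached from
    parent_url: Optional[str] = None   # the 'mother page', None for a seed
    depth: int = 0                     # number of link hops from the seed
    extra: Dict[str, str] = field(default_factory=dict)  # free-form meta data


def seed_meta(url: str, extra: Optional[Dict[str, str]] = None) -> PageMeta:
    """Create the meta data for a seed page (depth 0, its own seed)."""
    return PageMeta(url=url, seed_url=url, extra=dict(extra or {}))


def child_meta(parent: PageMeta, child_url: str) -> PageMeta:
    """Derive a child's meta data from its parent.

    The child inherits the seed url and a copy of the free-form meta
    data, records the parent url, and is one hop deeper.
    """
    return PageMeta(
        url=child_url,
        seed_url=parent.seed_url,
        parent_url=parent.url,
        depth=parent.depth + 1,
        extra=dict(parent.extra),
    )


def in_scope(meta: PageMeta, max_depth: int) -> bool:
    """Assumed scope rule: crawl only pages within max_depth of the seed."""
    return meta.depth <= max_depth


if __name__ == "__main__":
    seed = seed_meta("http://example.com/", {"topic": "news"})
    child = child_meta(seed, "http://example.com/a")
    grandchild = child_meta(child, "http://example.com/a/b")
    for page in (seed, child, grandchild):
        print(page.url, page.depth, in_scope(page, max_depth=1))
```

The point is only that the child's meta data is derived from the parent
at the moment the link is followed, so the two pages do not need to
share a key.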
Just my 2 cents..
Greetings,
Stefan