Hi,

On 8/19/06, Sami Siren <[EMAIL PROTECTED]> wrote:
> So far Nutch has been built to deal mainly with text-type documents.
> There is, however, also a need to deal with non-textual objects, e.g. images,
> movies and sound, which will provide content only in the form of metadata (OK,
> perhaps also some text about the object's context, if applicable), so
> the metadata names we have today are only a subset of what might be needed.
>
> I really would not want to restrict the metadata the interface can carry
> to a fixed set.

But if it's an open Map, how do you index and search using that, i.e.
what is the mapping between the Map keys used by a parser component
and the field names in the resulting Lucene index? How do we enforce
that an MPEG parser uses the same Map keys as a JPEG parser when
encountering metadata with the same semantics?
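A minimal sketch of the problem, assuming a plain `Map<String, String>` per document and hypothetical key names ("Artist", "Author", "Creator") that different parsers might choose for the same concept. The class, the `KEY_TO_FIELD` table and `toIndexFields` are all hypothetical, not Nutch or Lucene API; the point is that with an open Map some component has to maintain the key-to-field translation by hand, and any key it does not know about is silently lost.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each parser fills an open Map with keys of its own choosing.
public class FieldMappingSketch {

    // Hypothetical table translating parser-specific keys to index field names.
    // Every key a new parser invents must be registered here by hand.
    static final Map<String, String> KEY_TO_FIELD = new HashMap<>();
    static {
        KEY_TO_FIELD.put("Artist", "creator"); // a key a JPEG parser might emit
        KEY_TO_FIELD.put("Author", "creator"); // a key an MPEG parser might emit
        KEY_TO_FIELD.put("Title", "title");
    }

    // Translates one parser's metadata Map into index field names;
    // keys missing from the table are dropped.
    static Map<String, String> toIndexFields(Map<String, String> parserMetadata) {
        Map<String, String> fields = new HashMap<>();
        for (Map.Entry<String, String> e : parserMetadata.entrySet()) {
            String field = KEY_TO_FIELD.get(e.getKey());
            if (field != null) {
                fields.put(field, e.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> jpeg = new HashMap<>();
        jpeg.put("Artist", "Jane Doe");
        Map<String, String> mpeg = new HashMap<>();
        mpeg.put("Creator", "John Roe"); // same semantics, unregistered key
        System.out.println(toIndexFields(jpeg)); // {creator=Jane Doe}
        System.out.println(toIndexFields(mpeg)); // {}
    }
}
```

Nothing in the type system stops a parser from using "Creator" where the table expects "Author", which is exactly the enforcement question raised above.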

I'm not opposed to using a Map for truly variable metadata, like HTML
<meta/> tags with unknown names, but if we want common handling of,
for example, Dublin Core metadata, it would be better to enforce that
at the interface level.
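One way to read this suggestion, sketched under assumptions: common Dublin Core elements (only title and creator shown) become typed accessors that every parser must implement, while unknown names go into an open Map. `DocumentMetadata`, `SimpleMetadata`, `toIndexFields` and the `"meta."` field prefix are all hypothetical names for illustration, not an existing Nutch interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class HybridMetadataSketch {

    // Hypothetical interface: common Dublin Core elements are typed methods,
    // so every parser must use the same names for the same semantics.
    interface DocumentMetadata {
        String getTitle();
        String getCreator();
        // Open-ended extras, e.g. HTML <meta/> tags with unknown names.
        Map<String, String> getExtra();
    }

    // Hypothetical implementation a parser might return.
    static final class SimpleMetadata implements DocumentMetadata {
        private final String title;
        private final String creator;
        private final Map<String, String> extra;

        SimpleMetadata(String title, String creator, Map<String, String> extra) {
            this.title = title;
            this.creator = creator;
            this.extra = Collections.unmodifiableMap(new HashMap<>(extra));
        }

        public String getTitle() { return title; }
        public String getCreator() { return creator; }
        public Map<String, String> getExtra() { return extra; }
    }

    // The indexer maps the typed accessors to fixed field names once,
    // regardless of which parser produced the metadata.
    static Map<String, String> toIndexFields(DocumentMetadata md) {
        Map<String, String> fields = new HashMap<>();
        if (md.getTitle() != null) fields.put("title", md.getTitle());
        if (md.getCreator() != null) fields.put("creator", md.getCreator());
        // Variable metadata is prefixed so it cannot clash with the common fields.
        for (Map.Entry<String, String> e : md.getExtra().entrySet()) {
            fields.put("meta." + e.getKey(), e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> extra = new HashMap<>();
        extra.put("camera", "X100");
        DocumentMetadata md = new SimpleMetadata("Sunset", "Jane Doe", extra);
        Map<String, String> fields = toIndexFields(md);
        System.out.println(fields.get("creator"));     // Jane Doe
        System.out.println(fields.get("meta.camera")); // X100
    }
}
```

The design trade-off: adding a new common element means changing the interface, but in return the compiler guarantees that a JPEG parser and an MPEG parser report a creator the same way.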

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
