Hi Tika,
(cc Aperture, just fyi)

I stumbled upon
http://wiki.apache.org/tika/MetadataDiscussion
and
http://wiki.apache.org/tika/RecursiveMetadata


The problems don't stop there: if you think it through, you end up with
zip files containing zip files containing .pst and email files, which
contain attached Word documents, which contain embedded Excel sheets.

In the SourceForge project "Aperture" (it's similar to Tika), the solution
was to use the W3C standard RDF, which allows information to be nested
to arbitrary depth. This was also used in the NEPOMUK-KDE Linux
implementation, but there in C++ and with a slightly different angle to it.
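
To make the nesting idea concrete, here is a minimal sketch of how the
container chain could be flattened into RDF-style triples. It uses plain
Python tuples instead of an RDF library. The property name isPartOf, the
nested dict layout, and the file names are assumptions for illustration,
not something taken from Aperture or Tika.

```python
# Namespace of the NIE ontology; the property name "isPartOf" is an
# assumption about how containment is expressed there.
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"


def containment_triples(uri, node):
    """Yield (subject, predicate, object) triples for a nested container.

    `node` is a hypothetical dict mapping child names to their own child
    dicts; an empty dict marks a leaf.
    """
    for name, children in node.items():
        child_uri = f"{uri}/{name}"
        # Each contained object points back at its direct container.
        yield (child_uri, NIE + "isPartOf", uri)
        # Recurse, so any depth of nesting produces the same flat shape.
        yield from containment_triples(child_uri, children)


# Hypothetical example: zip -> zip -> pst -> email -> Word -> Excel.
nested = {
    "inner.zip": {
        "mail.pst": {"message.eml": {"report.doc": {"sheet.xls": {}}}}
    }
}
triples = list(containment_triples("file:///outer.zip", nested))
for triple in triples:
    print(triple)
```

The point of the design is that the metadata stays a flat list of
statements no matter how deep the containers go; the hierarchy is
recovered by following the isPartOf links.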

It may be useful to check out their documentation and the status of
their discussion:

the data model:
http://www.semanticdesktop.org/ontologies/

this is the specific model of stacking things into each other:
http://www.semanticdesktop.org/ontologies/2007/01/19/nie/

the stacking/recursive problem was solved using "subcrawlers":
http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

general structure of things coming together:
http://sourceforge.net/apps/trac/aperture/wiki/GeneralStructure
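
As a rough illustration of the subcrawler idea (this is not Aperture's
SubCrawler API, only a sketch under my own assumptions), a crawler can
call itself on every object it finds inside a container and record which
parent each object came from. The example handles only zip archives via
the Python standard library; the "!/" URI scheme and the record fields
are made up for the example.

```python
import io
import zipfile


def crawl(data, uri, parent=None):
    """Return one record per object found, each linking to its parent."""
    records = [{"uri": uri, "part_of": parent, "size": len(data)}]
    # If the payload is itself a zip archive, crawl its entries too.
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                records += crawl(archive.read(name), f"{uri}!/{name}", parent=uri)
    return records


def make_zip(entries):
    """Build an in-memory zip from a {name: bytes} dict (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buf.getvalue()


# Hypothetical input: a zip file containing another zip file.
inner = make_zip({"note.txt": b"hello"})
outer = make_zip({"inner.zip": inner, "readme.txt": b"top level"})
records = crawl(outer, "zip:outer.zip")
for record in records:
    print(record["uri"], "<-", record["part_of"])
```

A real implementation would dispatch on the detected type (pst, email,
Office document, ...) rather than only checking for zip, but the
recursion and the parent link would stay the same.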


From my experience (I am a co-author and was the initiator of most of the
above), adopting this thinking brings only a limited short-term benefit
but a bigger long-term one: being compatible with RDF/W3C will, in the
long run, make Tika compatible with what happens in HTML5 and other
standardization efforts.
Looking at this material could serve as a guideline for decisions in Tika.


So, could anyone please take a minute to think about this and add these
links, along with some ideas on how to deal with the problem, to
http://wiki.apache.org/tika/MetadataDiscussion
and
http://wiki.apache.org/tika/RecursiveMetadata
?


best
Leo Sauermann, Dr.
CEO and Founder

p.s.
There used to be a much closer tie between Tika and Aperture in 2007,
but as Aperture development is more or less finished (it's in production
now at some places, and fixes are only made when needed), communication
between the two seems to have dropped off a bit. Does anyone know why?


mail: [email protected]
mobile: +43 6991 gnowsis
http://www.gnowsis.com

helping people remember,

so join our newsletter
http://www.gnowsis.com/about/content/newsletter
____________________________________________________
