Meta data based link manager

Andreas Hartmann Fri, 04 Apr 2008 12:31:42 -0700

Hi Lenya devs,

currently, the only available LinkManager implementation is the ContentLinkManager. When you ask it to return all links that point to a particular document, it parses all (!) other documents in the same area and extracts the links based on the link XPaths of the resource type. As you can imagine, that can take a while, especially in large publications. I experienced this with the docu publication. If you want to deactivate a page, you can fetch a coffee in the meantime, and even drink it (at least if it's an espresso).

In a discussion on the Jackrabbit mailing list, Bertrand Delacretaz suggested extracting all links that are contained in a document before saving it, and storing them in the meta data. Now, since all Lenya meta data are indexed, this link list can be used for a Lucene search. The query looks like this (special characters have to be escaped):


\{http\://apache.org/lenya/metadata/link/1.0\}outgoingLinks:lenya\-document\:1aca68c0\-0243\-11dd\-881a\-f3cc793eb58e\*

The name of the meta data field is

  {http://apache.org/lenya/metadata/link/1.0}outgoingLinks

The term value is

  lenya-document:1aca68c0-0243-11dd-881a-f3cc793eb58e*
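
As a rough illustration of the escaping step, the following Python sketch builds the query string from the field name and the term value. The set of escaped characters is an assumption modelled on Lucene's query syntax, and `escape` and `build_query` are hypothetical helper names, not part of Lenya or of the search API mentioned in this message.

```python
# Characters that get a backslash prefix. This set is an assumption modelled
# on Lucene's query syntax; it is not taken from the Lenya sources.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?')


def escape(text: str) -> str:
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


def build_query(field: str, term: str) -> str:
    """Join the escaped field name and the escaped term value with a colon."""
    return escape(field) + ":" + escape(term)


FIELD = "{http://apache.org/lenya/metadata/link/1.0}outgoingLinks"
TERM = "lenya-document:1aca68c0-0243-11dd-881a-f3cc793eb58e*"
QUERY = build_query(FIELD, TERM)
```

Applied to the field name and term value given above, this reproduces the query string quoted in this message, including the backslash in front of the trailing asterisk. How the search API interprets that escaped asterisk is not stated here.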

Note the wildcard at the end of the term value. It includes URLs with an attached language or publication parameter. The link manager uses some post-search checks to verify that only the actually linked documents are listed (conforming to the declared LinkResolver implementation).
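
A post-search check of this kind could be sketched in Python as follows. Everything here is hypothetical: the comma-separated `key=value` parameter syntax after the UUID is an assumption, the names `parse_link` and `confirmed_sources` are invented for illustration, and treating a link without a `lang` parameter as a match is a simplification of whatever the declared LinkResolver actually does.

```python
from typing import Dict, List, Optional, Tuple

SCHEME = "lenya-document:"


def parse_link(uri: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split a link URI into (uuid, parameters), or return None for other URIs.

    Assumed syntax: lenya-document:<uuid>[,key=value]...
    """
    if not uri.startswith(SCHEME):
        return None
    uuid, *pairs = uri[len(SCHEME):].split(",")
    params = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    return uuid, params


def confirmed_sources(hits: Dict[str, List[str]],
                      target_uuid: str,
                      target_lang: str) -> List[str]:
    """Keep only the search hits that really link to the target document.

    hits maps a source document id to the outgoing links stored for it.
    A link without a lang parameter is accepted (a simplification).
    """
    confirmed = []
    for source, links in hits.items():
        for uri in links:
            parsed = parse_link(uri)
            if parsed is None:
                continue
            uuid, params = parsed
            if uuid == target_uuid and params.get("lang", target_lang) == target_lang:
                confirmed.append(source)
                break  # one confirmed link per source document is enough
    return confirmed
```

With a target language of `en`, a hit whose only link carries `lang=de` would be dropped, while a hit with no language parameter or with `lang=en` would be kept.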

With the MetaDataLinkManager, the deactivate screen appears virtually immediately. I guess this scales nicely with a large number of documents (as well as Lucene scales). If you have an index spanning multiple publications, you can even detect links from other publications.

I have the MetaDataLinkManager in my local sandbox. It depends on the search API which I have posted on the user list. If you want to take a closer look at the classes, I can upload a ZIP somewhere, or maybe I can extract a patch.

Replacing the ContentLinkManager with the MetaDataLinkManager would require "touching" all documents so that the links are extracted (they are indexed automatically when the session is committed).
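
The extraction that happens when a document is saved (or "touched") might look roughly like the Python sketch below. It is only an illustration: the element and attribute pairs in `LINK_ATTRIBUTES` stand in for the link XPaths of a resource type and are assumptions, and `extract_outgoing_links` is a hypothetical name, not a Lenya class or method.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML = "http://www.w3.org/1999/xhtml"

# Stand-in for the link XPaths declared by a resource type (assumed values).
LINK_ATTRIBUTES = [
    (f".//{{{XHTML}}}a", "href"),
    (f".//{{{XHTML}}}img", "src"),
]


def extract_outgoing_links(xml_source: str) -> List[str]:
    """Collect the distinct lenya-document: links found in a document."""
    root = ET.fromstring(xml_source)
    links: List[str] = []
    for path, attribute in LINK_ATTRIBUTES:
        for element in root.findall(path):
            value = element.get(attribute)
            # Only internal document links are of interest; skip duplicates.
            if value and value.startswith("lenya-document:") and value not in links:
                links.append(value)
    return links
```

The returned list is what would be written to the outgoingLinks meta data field before the session is committed; how the list is serialized into the field is left open here.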


Is anybody interested in this feature?

-- Andreas



--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01

