G3. Index the latest version of the docs, including structured fields (keywords, target audience, components mentioned, etc), to implement "prepared queries" (as links, simply) to improve our docs' accessibility

This seems like a good occasion to provide a better LuceneTransformer implementation for Cocoon?


T2. Build an index with Lucene, triggered via SVN post-commit hooks, uses a live Cocoon instance to generate an easy to index XML document for Lucene. Include metadata fields as mentioned in G2 above, generated from (enhanced as compared to now) document content

I'm working on that (not enough); it seems to me to be only a consequence of the point above.


<map:match pattern="**">
  <map:generate src="{folder}{1}"/>
  <map:transform src="myschema2lucene.xsl"/>
  <map:transform type="lucene"/>
  <map:transform src="myschema2html.xsl"/>
  <map:serialize/>
</map:match>

myschema2lucene.xsl handles the original doc: it lets everything pass through but adds something for indexing.

<root>
  <!-- the doc to index -->
  <lucene:document>
    <lucene:field name="uri">
      <!-- ... -->
    </lucene:field>
    <lucene:field name="fulltext" store="false">
      <!-- result of myschema2txt.xsl -->
    </lucene:field>
    <lucene:field name="keyword" tokenize="false">
      <!-- keep a field with untokenized keywords so they can be listed -->
    </lucene:field>
    <!-- ... -->
  </lucene:document>
  <!-- the structured doc -->
</root>

The Lucene transformer handles the <lucene:*/> elements and lets everything else go on to publication.
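
To make that split concrete, here is a minimal sketch in Python (not Cocoon code, and not an existing transformer). It assumes a placeholder namespace URI for the lucene: prefix, and it assumes that store and tokenize default to true when the attribute is absent; both are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this sketch only.
LUCENE_NS = "urn:example:lucene"

# Hypothetical input, shaped like the output of myschema2lucene.xsl.
SAMPLE = """<root xmlns:lucene="urn:example:lucene">
  <lucene:document>
    <lucene:field name="uri">/docs/matchers.xml</lucene:field>
    <lucene:field name="fulltext" store="false">Sitemap matchers select a pipeline.</lucene:field>
    <lucene:field name="keyword" tokenize="false">sitemap matchers</lucene:field>
  </lucene:document>
  <section>Sitemap matchers select a pipeline.</section>
</root>"""


def split_lucene(xml_text):
    """Return (documents, remaining_xml).

    documents is a list with one entry per lucene:document, each entry
    being a list of field dicts. remaining_xml is the input with every
    lucene:document removed, ready for the next pipeline step.
    """
    root = ET.fromstring(xml_text)
    doc_tag = f"{{{LUCENE_NS}}}document"
    field_tag = f"{{{LUCENE_NS}}}field"
    documents = []
    # Visit every element as a potential parent, so that a
    # lucene:document child can be detached from it.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != doc_tag:
                continue
            fields = []
            for field in child.findall(field_tag):
                fields.append({
                    "name": field.get("name"),
                    "value": "".join(field.itertext()).strip(),
                    # Assumed defaults: stored and tokenized unless "false".
                    "store": field.get("store", "true") != "false",
                    "tokenize": field.get("tokenize", "true") != "false",
                })
            documents.append(fields)
            parent.remove(child)
    return documents, ET.tostring(root, encoding="unicode")
```

Calling split_lucene(SAMPLE) gives the field list that would go to the index, plus the document stripped of the lucene:* elements for myschema2html.xsl.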

If [EMAIL PROTECTED] hasn't changed, the result should be cached, so there is not too much transformation and indexing; if it has changed, the index is updated.

For deletions, an SVN hook is needed.
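
As a rough sketch of what such a hook could do: a post-commit hook receives the repository path and the revision number, and `svnlook changed` lists the paths touched by that revision with a status code in the first columns ("D" for deleted). The Python below only extracts the deleted paths; what to do with them (removing entries from the index) is left out, and the function names are made up for this example.

```python
import subprocess


def deleted_paths(svnlook_output):
    """Extract deleted paths from `svnlook changed` output.

    Returns a list of (path, is_directory) tuples. A trailing slash
    marks a directory, in which case every indexed document below it
    would have to be removed from the index as well.
    """
    result = []
    for line in svnlook_output.splitlines():
        # Status flags occupy the first three columns, the path starts
        # at the fifth.
        status, path = line[:3].strip(), line[4:]
        if status == "D":
            result.append((path, path.endswith("/")))
    return result


def changed_in_revision(repo, rev):
    """Run `svnlook changed` for one revision and return its output."""
    completed = subprocess.run(
        ["svnlook", "changed", "-r", str(rev), repo],
        check=True, capture_output=True, text=True,
    )
    return completed.stdout
```

In a post-commit script this would be used as deleted_paths(changed_in_revision(repo, rev)), with repo and rev taken from the hook's arguments.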

T4. Use queries like "find all documents which talk about sitemap matchers" to build navigation pages semi-automatically.

From some experience of Cocoon with Lucene: don't forget the list of terms (from untokenized fields). It gives you the list of existing keywords (for example), so you can generate your queries from what you actually have in your docs (instead of constraining the vocabulary at production time).
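
The idea can be sketched without Lucene at all. The Python below is an illustration under stated assumptions: documents are plain dicts with a "uri" and an optional list of untokenized "keyword" values, and "/search" with a "q" parameter is a hypothetical search URL. Only the field:"phrase" form comes from the Lucene query syntax.

```python
from collections import defaultdict
from urllib.parse import urlencode


def keyword_terms(documents):
    """Map each distinct keyword to the set of URIs that carry it."""
    terms = defaultdict(set)
    for doc in documents:
        for keyword in doc.get("keyword", []):
            terms[keyword].add(doc["uri"])
    return dict(terms)


def prepared_query_links(terms, search_url="/search"):
    """Build one (keyword, link, document count) tuple per existing term.

    search_url and the "q" parameter are hypothetical; the query value
    uses the Lucene field:"phrase" syntax on the untokenized field.
    """
    links = []
    for term in sorted(terms):
        query = urlencode({"q": f'keyword:"{term}"'})
        links.append((term, f"{search_url}?{query}", len(terms[term])))
    return links
```

Because the links are generated from the terms actually present, a navigation page built this way never offers a query with no results.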


T5. Put mod_cache in front to minimize server load (HTTP POST can be used to invalidate pages if quick updates are needed to check edits).

You've given me the trick for something I was asking Sylvain about: a kind of pure Cocoon mod_cache with <map:act type="copy-source"/>


http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=109636070505876&w=2

The problem was updating. The Cocoon app produces a regular www folder served directly to the public by an httpd; the update could be a consequence of an SVN hook, and if nothing works, there's still the handling of HTTP POST to force an update from www.
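
The update decision for one published file might look like the following sketch. It is not Cocoon code: the function names are invented for illustration, and comparing modification times is an assumption about how staleness would be detected. The force flag stands for the HTTP POST case.

```python
import os


def needs_update(source_mtime, published_mtime, force=False):
    """Decide whether a published page must be regenerated.

    published_mtime is None when the page has never been published.
    force stands for an explicit request, such as an HTTP POST.
    """
    if force or published_mtime is None:
        return True
    return source_mtime > published_mtime


def file_needs_update(source_path, published_path, force=False):
    """Same decision, reading the modification times from disk."""
    published_mtime = (
        os.path.getmtime(published_path)
        if os.path.exists(published_path) else None
    )
    return needs_update(os.path.getmtime(source_path), published_mtime, force)
```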

Frédéric.

--
Frédéric Glorieux (ingénieur documentaire, AJLSM)
<http://www.ajlsm.com>
