(reposted from the user mailing list)
Andrew Franz wrote:
I am thinking about a simple CMS (Content Management System) which
would have the following features:
1. Ability to list MS-Office files along with their
<SummaryInformation> attributes (this would use Jakarta POI) and to
list "image" files (basically by cloning the functionality in
ImageDirectoryGenerator), with room to extend to other commonly used
document formats such as PDF.
2. The output of #1 would be used as input to create a Lucene Index.
3. The Lucene index would be used to search an Intranet by Author,
Title, Subject, etc.
This would mean that content creators in the organisation would
categorise documents simply by updating <SummaryInformation>
('Properties' in MS-Office applications) and then uploading the file
(the current implementation requires them to update a database
separate from the document itself). The Cocoon application would
automatically categorise the document, either by using Lucene or
directly from the SummaryInformation. Indexing would only apply to the
header/meta info; full-text indexing of content is not required.
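To make the metadata-only idea concrete, here is a minimal sketch of
a field-based index over summary attributes. It is a plain in-memory
stand-in, not the Lucene API; the class name MetadataIndex and the
field names are assumptions made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a metadata-only index (not Lucene).
class MetadataIndex {
    // field name -> lower-cased term -> paths of documents containing it
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    // Index only the summary attributes of a document, never its body.
    void add(String path, Map<String, String> summary) {
        for (Map.Entry<String, String> e : summary.entrySet()) {
            String field = e.getKey().toLowerCase(Locale.ROOT);
            // Naive tokenizer: splits on anything that is not an ASCII word character.
            for (String term : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field, f -> new HashMap<>())
                        .computeIfAbsent(term, t -> new TreeSet<>())
                        .add(path);
            }
        }
    }

    // Return the paths whose given field contains the given single term.
    Set<String> search(String field, String term) {
        return postings
                .getOrDefault(field.toLowerCase(Locale.ROOT),
                        Collections.<String, Set<String>>emptyMap())
                .getOrDefault(term.toLowerCase(Locale.ROOT),
                        Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        MetadataIndex index = new MetadataIndex();
        Map<String, String> summary = new HashMap<>();
        summary.put("Author", "Andrew Franz");
        summary.put("Title", "Intranet Plan");
        index.add("/docs/plan.doc", summary);
        System.out.println(index.search("author", "franz"));
    }
}
```

A real Lucene index would replace the nested maps, but the shape is
the same: one small record per file, keyed by Author, Title, Subject
and so on, with the document body left out.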
The question (to experienced Cocoon developers) is what is the
preferred method of implementation?
Option 1. Extend DirectoryGenerator similar to the way
ImageDirectoryGenerator is implemented but adding new file types
Option 2. Use DirectoryGenerator 'as is' but augment it with a
HeaderGenerator per file/mimetype and then aggregate the results such
that the output is similar to that of Option 1
Option 3. Tell the users to 'SaveAs' MS-Office documents into an XML
format and use XSLT to extract the summary information. For example,
the Visio binary format (VSD) can be saved as VDX and the same
information can be extracted via XSLT
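
As a rough illustration of Option 3, the sketch below runs a small
XSLT stylesheet over an XML document using the JDK's standard
javax.xml.transform API. The element names (Document, Properties,
Title, Author, Subject) are invented for the example and do not
reflect any real Office or Visio schema; a real stylesheet would have
to match the actual saved format and its namespaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: extracts summary fields from XML via XSLT.
class SummaryXslt {
    // Assumed input layout: /Document/Properties/{Title,Author,Subject}.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='/Document/Properties/Title'/>"
        + "<xsl:text>|</xsl:text>"
        + "<xsl:value-of select='/Document/Properties/Author'/>"
        + "<xsl:text>|</xsl:text>"
        + "<xsl:value-of select='/Document/Properties/Subject'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns "title|author|subject"; a missing element yields an empty slot.
    static String extract(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT extraction failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Document><Properties><Title>Network plan</Title>"
                + "<Author>A. Franz</Author><Subject>Intranet</Subject>"
                + "</Properties><Body>not indexed</Body></Document>";
        System.out.println(extract(xml));
    }
}
```

Inside Cocoon the same stylesheet would more naturally sit in a
pipeline behind a file generator; the standalone form here is only to
show that the header can be pulled out without touching the body.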
All of the above are feasible and independent of the user interface,
so the question is more about performance.
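
One way to picture why the options look the same from the outside is
to put the per-type header extraction behind a single lookup keyed by
file extension, so that the listing has the same shape whichever
extractor is plugged in. This is only a sketch: HeaderListing and its
methods are hypothetical names, and the extractors are plain
functions standing in for POI, image or XSLT based code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical aggregator: one header extractor per file extension.
class HeaderListing {
    // lower-cased extension -> function from file name to header attributes
    private final Map<String, Function<String, Map<String, String>>> extractors =
            new LinkedHashMap<>();

    void register(String extension, Function<String, Map<String, String>> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Build the directory listing: every file appears, with whatever
    // header attributes its extractor (if any) can supply.
    Map<String, Map<String, String>> list(List<String> fileNames) {
        Map<String, Map<String, String>> listing = new LinkedHashMap<>();
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
            Function<String, Map<String, String>> extractor = extractors.get(ext);
            Map<String, String> header = (extractor == null)
                    ? Collections.<String, String>emptyMap()
                    : extractor.apply(name);
            listing.put(name, header);
        }
        return listing;
    }

    public static void main(String[] args) {
        HeaderListing listing = new HeaderListing();
        // Stand-in extractor; a real one would read the file's summary info.
        listing.register("doc", name -> Collections.singletonMap("Title", "from " + name));
        System.out.println(listing.list(java.util.Arrays.asList("plan.doc", "logo.png")));
    }
}
```

The performance question then comes down to what each registered
extractor costs per file, since the aggregation step itself is cheap.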
Has anyone gone down this route? Are there any pitfalls I need to be
aware of? For the experienced Cocoon developers, what is your gut
feeling about which option is preferable?
Replies much appreciated.