Robert Goene wrote:
Hi,
Thanks for actually reading it and giving a thorough reply!
I think this could be done more generally, such that a document is reindexed
every time it changes (e.g. also after editing), with one index for the
authoring area and one index for the live area.
If the document changes, it will be reindexed. I don't really see the
need for a separate index for every area.
Within the authoring area the content can be quite different. Also, there
can be documents which don't exist in the live area. I think it definitely
makes sense to have different indices, or rather to be able to search
"different versions" with respect to workflow status.
I don't think it will be much more work to implement; rather, keep the
interface general enough and maybe just implement the live area if time is
too limited for you.
But I'd suggest that you rather drop some other features and focus on this.
I don't think a document should require a schema, but I guess we're getting
into a religious war here. But you definitely cannot assume that
everything is validated by RelaxNG, because Lenya would badly close itself
off if it neglected schemas like XSD and others ...
On the one hand you like the centralized definition of the index, as
you propose adding the indexing to the schema, and on the other hand
you want to keep the schema requirement as flexible as possible. I see
the dilemma, and that's why I think my idea is a nice way to keep some
flexibility on the schema side, but with a centralized definition in
the form of the sample file.
Sorry, I didn't understand that you were talking about a sample file, but one
thing to think about is probably recurring elements and how to handle them.
Changing the fields would require a change of the 'obsolete' XML
documents, but I think this is a rare case that should actually be
avoided. Fields can be added or can become obsolete without a
problem, but changing a field is something that is done rarely, if
ever. Could you give me a scenario where this would be an urgent problem?
Let's say you have one title field, and then one adopts a schema that has
a maintitle and a subtitle, so title is gone and the old title becomes
the maintitle.
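To make the title scenario concrete, here is a minimal sketch of the document migration it would force, assuming plain unnamespaced elements named `title`, `maintitle` and `subtitle` as in the example; the function name `migrate_title` is hypothetical and this is not part of Lenya.

```python
import xml.etree.ElementTree as ET


def migrate_title(xml_text):
    """Rename every <title> to <maintitle> and add an empty <subtitle> after it."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so newly inserted elements are not revisited.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "title":
                child.tag = "maintitle"
                subtitle = ET.Element("subtitle")
                # Keep the whitespace that originally followed <title>.
                subtitle.tail = child.tail
                position = list(parent).index(child)
                parent.insert(position + 1, subtitle)
    return ET.tostring(root, encoding="unicode")
```

Every stored document (and every index entry keyed on `title`) would need a pass like this, which is why such a rename is more than a cosmetic change.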
Well, this is just a first shot. I will probably change it, but
something like this:
<pr>
<title>
<lenya:index>title</lenya:index>
Lenya 14 release preponed
</title>
<content>
<lenya:index>contents</lenya:index>
The release of Lenya 1.4, the Apache Content Management System, ladila
</content>
</pr>
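A minimal sketch of how a document parser could pick up fields marked this way, assuming the `lenya:` prefix is bound to some namespace (the URI below is hypothetical, since the sample does not declare one). The field name comes from the marker text and the field value is the text of the marker's parent element; values are collected into lists so that recurring elements simply add entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the sample uses the lenya: prefix
# without declaring it, so any real URI is an assumption.
LENYA_NS = "urn:example:lenya-index"
INDEX_TAG = "{%s}index" % LENYA_NS


def text_without_markers(elem):
    """Concatenate all text below elem, skipping <lenya:index> markers."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != INDEX_TAG:
            parts.append(text_without_markers(child))
        # Text after a child element (its tail) is part of elem's content.
        parts.append(child.tail or "")
    return "".join(parts)


def extract_index_fields(xml_text):
    """Map each field name to a list of values (recurring elements add entries)."""
    root = ET.fromstring(xml_text)
    fields = {}
    for parent in root.iter():
        for child in parent:
            if child.tag == INDEX_TAG:
                name = (child.text or "").strip()
                # Collapse the line breaks and indentation around the value.
                value = " ".join(text_without_markers(parent).split())
                fields.setdefault(name, []).append(value)
    return fields
```

Marking attributes would need a different convention, since an attribute cannot contain a child element.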
How do you want to mark attributes?
How do you want to treat these external links?
I want to fetch the links in the document parser and let Nutch fetch
them when the scheduled index process runs. I am not sure yet if I
can feed them to Nutch directly or whether I should add them to a text
file that Nutch uses. I will give it another look.
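The text-file option could look roughly like this minimal sketch. It assumes links appear as `href` attributes, treats any absolute http(s) URL on a host other than the site's own (a hypothetical `site_host` parameter) as external, and writes one URL per line, the usual format of a Nutch seed list. Whether Lenya's parser would hand links over this way is an open question in the thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def external_links(xml_text, site_host):
    """Collect absolute http(s) links that point away from site_host, without duplicates."""
    root = ET.fromstring(xml_text)
    links = []
    for elem in root.iter():
        href = elem.get("href")
        if not href:
            continue
        parsed = urlparse(href)
        # Relative links have no scheme and stay inside the publication.
        if parsed.scheme in ("http", "https") and parsed.hostname != site_host:
            if href not in links:
                links.append(href)
    return links


def write_seed_file(links, path):
    """Write one URL per line, e.g. for a crawler seed list."""
    with open(path, "w", encoding="utf-8") as seed:
        for url in links:
            seed.write(url + "\n")
```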
I was actually rather thinking about how you want to handle them
within the index, because they won't have the same fields as the ones
within Lenya. Do you want to create a separate index?
As far as i can see, it contains all the output one can ask for from a
Lucene query.
Also paging?
The nice thing is: it's possible to spread the results across different
pages. The links to all pages are delivered with the output. It looks
pretty comprehensive to me.
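The paging arithmetic behind such output is small. This is a minimal sketch with a hypothetical `paginate` helper and an assumed default of 10 hits per page; it is not the actual output format discussed above. It returns the slice of hits for a 1-based page plus the list of page numbers to link to.

```python
def paginate(total_hits, page, page_size=10):
    """Return (start, end, page_numbers) for a 1-based page number."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division; always show at least one (possibly empty) page.
    num_pages = max(1, -(-total_hits // page_size))
    # Clamp out-of-range requests to the first or last page.
    page = min(max(page, 1), num_pages)
    start = (page - 1) * page_size
    end = min(start + page_size, total_hits)
    return start, end, list(range(1, num_pages + 1))
```

For 25 hits, page 3 covers hits 20 to 24 (`end` is exclusive) and links to pages 1 to 3.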
Again, thanks for the reply!
No problem, thanks very much for working on it. Please don't be afraid
of my comments (in case you are); I just want to make sure that
various things are being considered.
Thanks
Michi
Regards, Robert
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]