Robert Goene wrote:

Hi,

Thanks for actually reading it and giving a thorough reply!



I think this could be done more generally, such that a document is indexed
every time it changes, e.g. also after editing, and there could be one
index for the authoring area and one index for the live area.


If the document changes, it will be reindexed. I don't really see the need for a separate index for every area.


within the authoring area the content can be quite different. Also there can be documents which don't exist within the live area. I think it definitely makes sense to have different indices, or rather to be able to search on "different versions"
with respect to workflow status.

I don't think it will be much more work to implement. I would rather keep the interface general enough and maybe just implement the live area if time is too limited for you.
But I'd suggest that you rather drop some other features and focus on this.



I don't think a document should require a schema, but I guess we are getting into a religious war here. But you definitely cannot assume that everything is validated by RelaxNG, because Lenya would close itself off badly if it neglected schemas like XSD and others ...


On the one hand you like the centralized definition of the index, as you propose adding the indexing to the schema, and on the other hand you would like to keep the schema requirement as flexible as possible. I see the dilemma, and that's why I think my idea is a nice way to keep some flexibility on the schema side while still having a centralized definition in the form of the sample file.


sorry, I didn't understand that you were talking about a sample file, but one
thing to think about is probably recurring elements and how to handle them.


Changing the fields would require a change to the 'obsolete' XML documents, but I think this is a rare case that should actually be avoided. Fields can be added or can become obsolete without a problem, but changing a field is something that is done rarely, if ever. Could you give me a scenario where this would be an urgent problem?


let's say you have one title field, and then the schema is changed so that
there is a maintitle and a subtitle and title is gone, whereas the old title
becomes the maintitle
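
To make the rename scenario concrete, here is a minimal sketch (Python, purely illustrative) of remapping obsolete field names while reindexing, so that old documents do not have to be rewritten. The rename table and the `migrate_fields` helper are assumptions for illustration; nothing like this is part of Lenya or Lucene.

```python
# Hypothetical rename table: obsolete field name -> current field name.
FIELD_RENAMES = {"title": "maintitle"}


def migrate_fields(fields, renames=FIELD_RENAMES):
    """Return a copy of `fields` with obsolete field names replaced.

    If a document already carries a value under the new name,
    that value wins over the one stored under the obsolete name.
    """
    migrated = {}
    for name, value in fields.items():
        target = renames.get(name, name)
        # Skip an obsolete field when the new name is already filled.
        if name != target and target in migrated:
            continue
        migrated[target] = value
    return migrated


old_doc = {"title": "Lenya 1.4 release", "body": "Some text"}
new_doc = migrate_fields(old_doc)
```

With a table like this the old XML documents can stay untouched, at the cost of keeping the rename table in sync with the schema history.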




Well, this is just a first shot. I will probably change it, but something like this:

<pr>
<title>
<lenya:index>title</lenya:index>
Lenya 1.4 release preponed
</title>
<content>
<lenya:index>contents</lenya:index>
The release of Lenya 1.4, the Apache Content Management System, ladila
</content>
</pr>
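
As a rough illustration of how such a sample file could be turned into a central index definition, the following sketch (Python, standard library only) walks the sample and collects a mapping from element path to index field name. The namespace URI bound to the `lenya:` prefix and the `field_mapping` helper are assumptions made for this example; the thread does not define either.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread never binds the lenya: prefix.
LENYA_NS = "http://example.org/lenya/index"

SAMPLE = f"""<pr xmlns:lenya="{LENYA_NS}">
  <title>
    <lenya:index>title</lenya:index>
    Lenya 1.4 release preponed
  </title>
  <content>
    <lenya:index>contents</lenya:index>
    The release of Lenya 1.4, the Apache Content Management System
  </content>
</pr>"""

INDEX_TAG = f"{{{LENYA_NS}}}index"


def field_mapping(sample_xml):
    """Return {element path: index field name} for every marked element."""
    root = ET.fromstring(sample_xml)
    mapping = {}

    def walk(element, path):
        # A direct lenya:index child marks this element as an index field.
        marker = element.find(INDEX_TAG)
        if marker is not None and marker.text:
            mapping[path] = marker.text.strip()
        for child in element:
            if child.tag != INDEX_TAG:
                walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return mapping


MAPPING = field_mapping(SAMPLE)
```

Because the mapping is keyed by element path, recurring elements would all land in the same field; whether their values should be concatenated or stored as multiple values is left open here.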


how do you want to mark attributes?




how do you want to treat these external links?


I want to collect the links in the document parser and let Nutch fetch them when the scheduled index process runs. I am not sure yet whether I can feed them to Nutch directly or whether I should add them to a text file that Nutch uses. I will give it another look.


I was actually rather thinking about how you want to handle them within the index, because they won't have the same fields as the ones within Lenya. Do you want
to create a separate index?


As far as I can see, it contains all the output one can ask for from a Lucene query.


also paging?

The nice thing is: it is possible to spread the results across different pages. The links to all pages are delivered with the output. It looks pretty comprehensive to me.
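
For illustration only, paging of this kind can be sketched as follows (Python). The `paginate` helper and its return shape are hypothetical and do not describe the actual output format discussed here; they just show one page of hits being returned together with the numbers of all pages, from which links could be built.

```python
import math


def paginate(hits, page, page_size=10):
    """Return the hits for a 1-based page plus the numbers of all pages."""
    total_pages = max(1, math.ceil(len(hits) / page_size))
    # Clamp out-of-range page requests to the nearest valid page.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return {
        "page": page,
        "hits": hits[start:start + page_size],
        "pages": list(range(1, total_pages + 1)),
    }


result = paginate(list(range(25)), page=3)
```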

Again, thanks for the reply!


no problem, thanks very much for working on it. Please don't be afraid of my comments (in case you are), but I just want to make sure that various things are being
considered.

Thanks

Michi


Regards, Robert

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]

