Hi Daniel,

I'll try to answer some of your questions.

Daniel Hagen wrote:
The application might have to handle ~2000 - 5000 new documents/day (size
ranging from 2 KB to 1 MB, I assume an average of ~50 KB). Each document
will have about 5 - 10 simple text properties, and the "binary" content of
the documents (plain text/HTML/MS Word/PDF) will have to be indexed for a
fulltext search.
Read access to the contents will not be very frequent. I am assuming 5
requests for the mentioned simple properties of a node per minute, 5
concurrent users, and access to binary contents will probably occur once
every minute.

In short: The application will have to be able to do a fulltext search on
(worst case) more than 10,000,000 contents and will have to handle creation
of new contents without stalling the server.

Regarding concurrency: this is not a problem. Jackrabbit is able to handle queries and workspace modifications concurrently.

Regarding the volume of content: this depends mostly on how well Lucene scales, since Jackrabbit uses Lucene for its search index, and Lucene seems to scale quite well. The Lucene website probably has some information on this topic.
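
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. It only uses the numbers from your mail (2000 - 5000 documents/day, ~50 KB average, 10,000,000 documents worst case). The function name `estimate` and the choice of 1 KB = 1024 bytes are my own assumptions, and it estimates raw content volume only, not the size of the Lucene index.

```python
# Back-of-the-envelope sizing from the figures quoted in the question.
# Assumption: 1 KB = 1024 bytes. Only raw content is estimated here;
# the size of the search index is not covered.

KB = 1024
AVG_DOC_SIZE = 50 * KB        # poster's assumed average document size
TARGET_DOCS = 10_000_000      # poster's worst-case document count


def estimate(docs_per_day):
    """Return daily volume, days to reach the target, and total raw size."""
    daily_bytes = docs_per_day * AVG_DOC_SIZE
    total_bytes = TARGET_DOCS * AVG_DOC_SIZE
    return {
        "daily_mb": daily_bytes / KB**2,
        "days_to_target": TARGET_DOCS / docs_per_day,
        "total_gb": total_bytes / KB**3,
    }


if __name__ == "__main__":
    for rate in (2000, 5000):
        r = estimate(rate)
        print(
            f"{rate} docs/day: {r['daily_mb']:.0f} MB/day, "
            f"{r['days_to_target']:.0f} days to reach {TARGET_DOCS} docs, "
            f"{r['total_gb']:.0f} GB raw content in total"
        )
```

At 5000 documents/day it would take about 2000 days (roughly five and a half years) to reach 10,000,000 documents, so the worst case is reached gradually and the index grows with it.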

Are there any special hardware considerations I should think about (e.g.
separating index and storage on separate discs using separate controllers
...)?

This will definitely help increase performance.


regards
 marcel
