hi daniel, some remarks/answers follow inline:
On 9/25/05, Daniel Hagen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I apologize if this is the wrong place to ask my questions but I do not know
> where else I should ask.
>
> I am currently considering the use of Jackrabbit in a future project.
> The (very) rough layout I am thinking about is JBoss as Application Server
> and Jackrabbit for content storage (equipped with a custom access manager
> and login module for authentication & authorization).
>
> But I am not sure whether Jackrabbit will be able to handle the amount of
> data we will have to deal with.
> The application might have to handle ~ 2000 - 5000 new documents/day (size
> ranging from 2 KB to 1 MB, I assume an average of ~50 KB).
> Each document will have about 5 - 10 simple text properties and the "binary"
> content of the documents (plain text/HTML/MS Word/PDF) will have to be
> indexed for a fulltext search.
> Read access to the contents will not be very frequent, I am assuming 5
> requests for the mentioned simple properties of a node per minute, 5
> concurrent users, access to binary contents will probably appear once every
> minute.
>
> In short: The application will have to be able to do a fulltext search on
> (worst case) more than 10,000,000 contents and will have to handle creation
> of new contents without stalling the server.
>
> What is your opinion, is Jackrabbit the right tool for the task?
> Which Persistence Manager would be the best choice?
> Are there any special hardware considerations I should think about (e.g.
> separating index and storage on separate discs using separate controllers
> ...)?
> Should we have OS preferences for the server (current options are Windows
> 2003 Server vs. Linux with a strong preference towards Windows 2003 Server)?

if you're using a filesystem-based pm (e.g. ObjectPersistenceManager on LocalFileSystem) i'd definitely go for linux. the windows filesystem performs really poorly with a large number of small files.
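to get a feeling for how a given box copes with many small files, a rough sketch like the following might help. it only times raw file creation through plain java.nio and does not involve jackrabbit or any persistence manager; the class name, the file count and the file size are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class SmallFileBench {

    /**
     * Writes `count` files of `sizeBytes` each into a fresh temp directory
     * and returns the elapsed write time in nanoseconds (cleanup excluded).
     */
    static long writeSmallFiles(int count, int sizeBytes) throws IOException {
        Path dir = Files.createTempDirectory("smallfiles");
        byte[] payload = new byte[sizeBytes];
        try {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                Files.write(dir.resolve("node-" + i + ".bin"), payload);
            }
            return System.nanoTime() - start;
        } finally {
            deleteRecursively(dir);
        }
    }

    /** Removes the directory and everything below it, children first. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        int count = 5_000;      // assumption: one day of new documents at the upper bound
        int sizeBytes = 2_048;  // assumption: the smallest document size from the question
        long nanos = writeSmallFiles(count, sizeBytes);
        System.out.printf("%d files of %d bytes: %.1f ms (%.0f files/s)%n",
                count, sizeBytes, nanos / 1e6, count / (nanos / 1e9));
    }
}
```

running the same class on the windows and the linux candidate machines, with the temp directory on the disk you intend to use, gives a first comparison. it says nothing about how a particular persistence manager lays out its files.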
with the CQFileSystem (custom filesystem in-a-file) you can improve the performance on a windows box considerably, but it's not open source and it's only free for non-commercial use.

ObjectPersistenceManager w/ LocalFileSystem on a linux box provides imo decent performance; its major flaw is that it is non-transactional.

there's also a jdbc-based pm in the contrib directory (contrib/db-persistence). it is transactional and, depending on the type of database (e.g. mysql), provides very decent performance.

i suggest you set up your own performance/scalability tests.

cheers
stefan

>
> I know that not all of my questions are directly related to Jackrabbit
> Development and some will probably not be answered due to a lack of existing
> data, but any clues/hints will be greatly appreciated.
>
> Thank you for your help!
>
> Daniel
>
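for sizing, the figures quoted in the question can be turned into a rough estimate. the sketch below is plain arithmetic under stated assumptions: 1 KB is taken as 1000 bytes, every document has the assumed 50 KB average, and the space needed for properties and the fulltext index is ignored. the class and method names are made up for illustration.

```java
public class VolumeEstimate {

    /** Raw content written per day, in bytes. */
    static long dailyBytes(long docsPerDay, long avgDocBytes) {
        return docsPerDay * avgDocBytes;
    }

    /** Raw content size of `docs` documents, in bytes. */
    static long totalBytes(long docs, long avgDocBytes) {
        return docs * avgDocBytes;
    }

    /** Days needed to accumulate `targetDocs` documents, rounded up. */
    static long daysToReach(long targetDocs, long docsPerDay) {
        return (targetDocs + docsPerDay - 1) / docsPerDay;
    }

    public static void main(String[] args) {
        long avgDocBytes = 50_000L;      // ~50 KB average, from the question
        long targetDocs = 10_000_000L;   // worst-case document count, from the question

        for (long docsPerDay : new long[] {2_000L, 5_000L}) {
            System.out.printf("%d docs/day: %d MB/day, %d days to reach %d docs%n",
                    docsPerDay,
                    dailyBytes(docsPerDay, avgDocBytes) / 1_000_000L,
                    daysToReach(targetDocs, docsPerDay),
                    targetDocs);
        }
        System.out.printf("raw content at %d docs: %d GB%n",
                targetDocs, totalBytes(targetDocs, avgDocBytes) / 1_000_000_000L);
    }
}
```

under these assumptions the daily volume is 100 to 250 MB, the raw content at 10,000,000 documents is about 500 GB, and reaching that count takes 2000 to 5000 days. the real footprint depends on the persistence manager and on the index, so treat this only as input for your own tests.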
