On 9 Feb 2010, at 15:55, Jukka Zitting wrote:

> Hi,
>
> Now that Jackrabbit 2.0 is out and the major JCR 2.0 feature work is
> done, it's time to start looking ahead at Jackrabbit 3. We've talked
> about this a bit already at Day and I'll be posting a summary of our
> ideas for further discussion, but before that I'd like to frame the
> discussion by getting a better picture of the range of requirements
> we'll be having for Jackrabbit 3.
>
> So, please let us know what you expect your repositories to look like
> within the next five or so years. I'm especially interested in answers
> to the following questions:
>
> Scalability:
> * How much content (number of documents/nodes, raw amount data in
> GB/TB/PB) do you have in the repository?

At the moment up to tens of TB; in the future perhaps the PB range.

> * How many (concurrent) users (readers/editors/administrators) does
> your repository have?

Depends on the definition of concurrency. The number of users is currently expected to be <4M in one installation, with typically 100K active in any one hour. All are potential writers to the underlying JCR, but most requests (80-90%) are reads.

> * Do you need Internet-scale (millions of users or exabytes of
> content) features?

No.

>
> Deployment:
> * Do you run the repository on a single server, on a cluster or in the cloud?

Cluster, but we would prefer something cloud-like. We need a better PM and ClusterNode than in the current JR16/JR2 (I need to check JR2 in more detail).

> * How many and how powerful servers do you use for the repository?

Depends on each individual deployment.

>
> Content model:
> * Do you need support for flat content hierarchies (>>10k sibling nodes)?

We are trying to avoid that, but are under a lot of pressure to support it.

> * Do you need support for same-name siblings?

No.

> * If you use versioning, how actively (commit on all saves / commit
> only at major milestones) and for what purpose (revision history,
> backup, etc.) do you use it?

Yes, but only on demand.

> * How granular (hierarchies of small properties vs. big binary blobs)
> is your content?

User-generated content is all properties; uploads are all blobs, typically >64K.

> * How much of your content access is based on search / tree traversal
> / following references?

Search 50%, tree 45%, references <5% (avoiding strong refs, i.e. using the UUID in a string, or the path).

> * How much you rely on the repository to enforce your content model
> (node type constraints, etc.)?

Not at all.

> * How often you modify your content model (and/or related node types)?

Occasionally; 90% unstructured.

>
> Features:
> * Do you need full ACID semantics?

No. Very rarely, and if we do, we put specific protocols in place.

> Is an "eventually consistent"
> system good enough for you?

Yes.

> * Do you need more powerful search features than what we now have?

No.

> * How important is observation to your application? Do you need
> trigger-like capability that can modify or reject a save() operation?

Not important for in-JCR operations, but we need asynchronous notification of changes.

>
> Feel free to answer either based on your current usage patterns or to
> predict your needs for the next few years. The further ahead in the
> future you can reasonably predict, the better.
>
> Note that I intentionally restricted this set of questions to core
> repository features, I'll do a poll on favorite new features later on.
>
> BR,
>
> Jukka Zitting