The biggest pain point, I suspect, is that all of the pages need to be examined at startup time, which is an expensive operation. Just as we already run the Lucene indexer in a background thread, an easy way to speed up startup would be to initialize the ReferenceManager in a background thread, too.
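To make the background-initialization idea concrete, here is a rough sketch. It is not the real ReferenceManager API: the class name BackgroundReferenceIndex, its methods, and the "page name to outgoing links" input map are all made up for illustration. The scan runs off the startup thread, and a lookup simply waits if the scan has not finished yet.

```java
import java.util.*;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a reference index; NOT the real JSPWiki ReferenceManager.
class BackgroundReferenceIndex {
    // Completed with "page name -> pages that link to it" once the scan is done.
    private final CompletableFuture<Map<String, Set<String>>> index;

    // pages maps a page name to the names of the pages it links to.
    // The caller must not modify the map while the scan is running.
    BackgroundReferenceIndex(Map<String, List<String>> pages) {
        // Run the expensive scan on the common pool instead of the startup thread.
        this.index = CompletableFuture.supplyAsync(() -> scan(pages));
    }

    // Builds the reverse map: target page -> set of referring pages.
    private static Map<String, Set<String>> scan(Map<String, List<String>> pages) {
        Map<String, Set<String>> referrers = new HashMap<>();
        for (Map.Entry<String, List<String>> e : pages.entrySet()) {
            for (String target : e.getValue()) {
                referrers.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return referrers;
    }

    // True once the background scan has completed.
    boolean isReady() {
        return index.isDone();
    }

    // Blocks until the scan has finished, then answers from the finished index.
    Set<String> findReferrers(String page) {
        return index.join().getOrDefault(page, Collections.emptySet());
    }
}
```

The trade-off is that the first request that needs reference data may still block; whether that is acceptable, or whether such requests should get a "still indexing" answer instead, is a separate design decision.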
ReferenceManager actually caches its contents in the workdir (the refmgr.ser file), so only the first startup is slow; on later startups we just deserialize the data. Variables are also cached.
ACLs, however, are not cached. And since we do scan through all of the pages, the missing ACLs trigger a refresh of the entire page. Caching the ACLs as well would be relatively low-hanging fruit.
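As a rough illustration of what caching the ACLs next to the references might look like, here is a sketch using plain Java serialization. Everything in it is an assumption: CachedPageInfo, PageInfoCache, the field names, and the use of a last-modified timestamp as the freshness check are invented for the example and do not describe the actual refmgr.ser format.

```java
import java.io.*;
import java.util.*;

// Hypothetical per-page cache record; the fields are illustrative, not JSPWiki's.
class CachedPageInfo implements Serializable {
    private static final long serialVersionUID = 1L;
    final long lastModified;        // page timestamp at the time it was scanned
    final ArrayList<String> links;  // outgoing references found on the page
    final String acl;               // raw ACL text; "" means "scanned, no ACL found"

    CachedPageInfo(long lastModified, List<String> links, String acl) {
        this.lastModified = lastModified;
        this.links = new ArrayList<>(links);
        this.acl = acl;
    }
}

// Hypothetical cache that is written to and read from a single file in the workdir.
class PageInfoCache {
    private HashMap<String, CachedPageInfo> entries = new HashMap<>();

    void put(String page, CachedPageInfo info) {
        entries.put(page, info);
    }

    // Returns the entry only if the page is unchanged; null means "reparse this page".
    CachedPageInfo getIfFresh(String page, long currentLastModified) {
        CachedPageInfo info = entries.get(page);
        return (info != null && info.lastModified == currentLastModified) ? info : null;
    }

    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(entries);
        }
    }

    // A missing file yields an empty cache, which corresponds to the slow first startup.
    @SuppressWarnings("unchecked")
    static PageInfoCache load(File file) throws IOException, ClassNotFoundException {
        PageInfoCache cache = new PageInfoCache();
        if (file.exists()) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                cache.entries = (HashMap<String, CachedPageInfo>) in.readObject();
            }
        }
        return cache;
    }
}
```

Storing an empty string for "no ACL" is deliberate: it lets the startup scan tell a page that has no ACL apart from a page that was never scanned, so a page without an ACL does not force a reparse.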
However, that's probably not the whole answer. It seems to me that references (what links to Page X, and what Page X links to) are also something that should properly be stored as page metadata. So that's probably part of the solution, too. Certainly, part of the plan would be to enable deployers to share page repositories, and references would be part of that.
Yes, it would make sense to store references as page metadata. However, they are not the only problem.
/Janne
