A problem that I keep coming back to is how to allow custom Scorers to work efficiently with custom Index formats. For efficiency, you want to provide direct access to the underlying data rather than requiring multiple function calls per match, but you don't want to have to subclass each Scorer for each Index. Ideally, you want every custom Scorer to work with every new Index out of the box.
One solution is to come up with a common data format that each Scorer uses, and have the Index be capable of producing that format and making it available to the Scorer. I thought this article did a good job of explaining this approach: http://fgiesen.wordpress.com/2011/11/21/buffer-centric-io/ It's essentially what I was envisioning, but it also includes some "tricks" that allow for easier error handling. It's not directly applicable to Lucy, but it is in C, and I thought it might be a good starting point for defining terms and thinking about approaches. --nate
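
To make the idea a bit more concrete, here is a rough sketch in C of what a common format might look like. Every name in it (Posting, PostingBuffer, ArrayIndex, sum_freq_scorer) is made up for illustration and none of it is Lucy's API. The refill callback and the sticky status are only loosely modeled on the article's approach: the Index fills a buffer of plain structs, the Scorer walks it with pointer arithmetic, and once the source runs dry (or fails) refill is swapped for a stub that keeps handing back a zeroed sentinel, so the Scorer only has to check status at refill time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common format: one fixed-size record per match. */
typedef struct { uint32_t doc_id; uint32_t freq; } Posting;

enum { PB_OK = 0, PB_EOF = 1 };

/* The only thing a Scorer ever sees. */
typedef struct PostingBuffer PostingBuffer;
struct PostingBuffer {
    const Posting *cursor;               /* next unread record           */
    const Posting *end;                  /* one past the last record     */
    int status;                          /* PB_OK, or sticky once set    */
    int (*refill)(PostingBuffer *self);  /* supplied by the Index        */
    void *source;                        /* Index-private state          */
};

static const Posting pb_sentinel[1] = { { 0, 0 } };

/* Installed after EOF or an error: always yields one zeroed record
 * and returns the sticky status, so the buffer is never left invalid. */
static int pb_refill_done(PostingBuffer *self) {
    self->cursor = pb_sentinel;
    self->end = pb_sentinel + 1;
    return self->status;
}

static int pb_fail(PostingBuffer *self, int status) {
    self->status = status;
    self->refill = pb_refill_done;
    return pb_refill_done(self);
}

/* Hypothetical Index: an in-memory array handed out in chunks. */
typedef struct {
    const Posting *data;
    size_t count;
    size_t pos;
    size_t chunk;   /* records per refill; must be nonzero */
} ArrayIndex;

static int array_refill(PostingBuffer *self) {
    ArrayIndex *ix = (ArrayIndex *)self->source;
    if (ix->pos >= ix->count) return pb_fail(self, PB_EOF);
    size_t n = ix->count - ix->pos;
    if (n > ix->chunk) n = ix->chunk;
    self->cursor = ix->data + ix->pos;
    self->end = self->cursor + n;
    ix->pos += n;
    return PB_OK;
}

static void array_index_open(ArrayIndex *ix, PostingBuffer *pb) {
    assert(ix->chunk > 0);
    pb->cursor = pb->end = NULL;
    pb->status = PB_OK;
    pb->refill = array_refill;
    pb->source = ix;
}

/* Hypothetical Scorer: knows nothing about ArrayIndex. It makes one
 * indirect call per chunk, then reads records directly. */
static uint64_t sum_freq_scorer(PostingBuffer *pb) {
    uint64_t total = 0;
    while (pb->refill(pb) == PB_OK) {
        for (const Posting *p = pb->cursor; p < pb->end; ++p) {
            total += p->freq;
        }
        pb->cursor = pb->end;
    }
    return total;
}
```

A new Index would only need to supply its own refill function and fill in a PostingBuffer; sum_freq_scorer (or any other Scorer written against PostingBuffer) would then work with it unchanged. Whether a flat struct like Posting is rich enough for real Scorers is an open question this sketch doesn't try to answer.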
