Ning Li wrote:
> The draft proposal seems to suggest the following (roughly): A dictionary entry is <Term, FilePointer>.
Perhaps this ought to be <Term, TermInfo>, where TermInfo contains a FilePointer and perhaps other information (e.g., frequency data).
> A posting entry for a term in a document is <Doc, PostingContent>. Classes which implement PostingFormat decide the format of PostingContent.
Yes.
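To make the two entry shapes concrete, here is a rough sketch in Java. Everything in it is an assumption made for illustration: the fields of TermInfo, the writePosting signature, and the two example formats are hypothetical and are not taken from the draft proposal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.TreeMap;

public class PostingSketch {
    /** Hypothetical dictionary value: a file pointer plus extra data such as frequency. */
    static final class TermInfo {
        final long filePointer; // offset where this term's postings begin
        final int docFreq;      // number of documents containing the term

        TermInfo(long filePointer, int docFreq) {
            this.filePointer = filePointer;
            this.docFreq = docFreq;
        }
    }

    /** Hypothetical plug-in point: the implementation decides how PostingContent is encoded. */
    interface PostingFormat {
        void writePosting(DataOutputStream out, int doc, int[] positions) throws IOException;
    }

    /** Example format: document number and within-document frequency only. */
    static final class FreqOnlyFormat implements PostingFormat {
        public void writePosting(DataOutputStream out, int doc, int[] positions) throws IOException {
            out.writeInt(doc);
            out.writeInt(positions.length);
        }
    }

    /** Example format: document number, frequency, then every position. */
    static final class PositionsFormat implements PostingFormat {
        public void writePosting(DataOutputStream out, int doc, int[] positions) throws IOException {
            out.writeInt(doc);
            out.writeInt(positions.length);
            for (int position : positions) {
                out.writeInt(position);
            }
        }
    }

    /** Dictionary entries are <Term, TermInfo>; a sorted map stands in for the term dictionary. */
    static TreeMap<String, TermInfo> sampleDictionary() {
        TreeMap<String, TermInfo> dictionary = new TreeMap<>();
        dictionary.put("lucene", new TermInfo(0L, 2));
        dictionary.put("posting", new TermInfo(16L, 1));
        return dictionary;
    }

    /** Returns how many bytes one <Doc, PostingContent> entry occupies in the given format. */
    static int encodedSize(PostingFormat format, int doc, int[] positions) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            format.writePosting(out, doc, positions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The point of the sketch is only that the dictionary side stays the same while the posting side varies with the PostingFormat implementation.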
> Is it a good idea to allow PostingFormat to decide whether and how to store posting content in multiple files?
Ideally, yes. The easiest way to do this would be to have separate files in each segment for each PostingFormat. It would be better if different posting formats could share files, but that's harder to coordinate.
Alternatively, we could force all postings into a single file per segment. That would simplify the APIs, but it would rule out certain file formats, like the one Lucene uses currently.
So the ideal solution would permit different formats either to share a file or to use their own file(s). Is it worth the complexity this would add to the API? Or should we jettison the notion of multiple posting files per segment?
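For comparison, the two simpler layouts might look roughly like this. This is a sketch only: the Policy enum, the postingFiles helper, the format names, and the ".pst" extension are all hypothetical, and the mixed case (some formats sharing a file, others using their own) is deliberately left out, since that is the part that needs coordination.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PostingFileLayout {
    /** Hypothetical layout choices for the posting files of one segment. */
    enum Policy { FILE_PER_FORMAT, SINGLE_SHARED_FILE }

    /**
     * Lists the posting files a segment would contain under the given policy.
     * Format names double as file extensions here, which is an assumption
     * made for the example.
     */
    static List<String> postingFiles(String segment, List<String> formatNames, Policy policy) {
        Set<String> files = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (String format : formatNames) {
            if (policy == Policy.FILE_PER_FORMAT) {
                files.add(segment + "." + format); // each format owns its file
            } else {
                files.add(segment + ".pst");       // every format writes to one shared file
            }
        }
        return new ArrayList<>(files);
    }
}
```

With FILE_PER_FORMAT, a segment grows by one file for every format in use. With SINGLE_SHARED_FILE, the file count stays fixed, but every format has to agree on how the shared file is laid out.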
Doug