change file format documentation from "bit-for-bit" to highlevel
----------------------------------------------------------------
Key: LUCENE-2946
URL: https://issues.apache.org/jira/browse/LUCENE-2946
Project: Lucene - Java
Issue Type: Task
Components: Website
Reporter: Robert Muir
Fix For: 4.0
While reviewing website docs in LUCENE-2924, I noticed that the existing
fileformats documentation is going to be pretty hopeless for 4.0.
Previously it described the format "bit-for-bit", but with flexible indexing
this is somewhat silly (and who really wants a bit-for-bit explanation of some
of the new formats!)
I think it would be much better to give a high-level overview, perhaps linking
to javadocs or even source code for the low-level details.
We probably should delay this until 4.0 is really close in sight (since things
are changing so fast) but we can go ahead and think about it some now.
For example:
* high level explanation of what a codec is, and the various subsystems one is
usually composed of (terms index, terms data, skiplist impl, postings impl,
etc). We can reiterate that you can make your own, and hopefully this kind of
documentation will actually encourage that.
* high level explanation of what StandardCodec is "composed of". For example
assume it's Variable Terms Index, Block Terms Reader, PForDelta docs and freqs
and Simple64 positions. I think really this is the only codec we should try to
"diagram" in any way.
* high level explanation (probably with links) of some of the components. For
example we could explain what the purpose of a Terms Index is, and that this
implementation uses a finite state transducer to find the terms block for a
given term. In this case maybe we have an image now that Dawid made the toDot
useful.
* high level explanation (probably with links) of some of the compression
algorithms. For example, we could explain the basics of the available
algorithms we have (vbyte/simple/for/pfor/...) and what their advantages and
disadvantages are.
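
As an illustration of the level of detail such an explanation might aim for,
here is a minimal sketch of the vbyte idea: each byte carries 7 bits of the
value, low-order bits first, and the high bit signals that more bytes follow.
The class name VByteSketch and its methods are hypothetical and exist only for
this example; this is not the code Lucene itself uses.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class, not part of Lucene.
final class VByteSketch {

    // Encode an int as 1 to 5 bytes: 7 payload bits per byte, low-order
    // bits first, high bit set on every byte except the last.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decode a value produced by encode().
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {5, 127, 128, 16384, 1_000_000};
        for (int v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length
                    + " byte(s), decoded " + decode(enc));
        }
    }
}
```

The trade-off this makes visible is the one the docs could spell out: small
values cost a single byte and decoding needs no block structure, but every
byte requires a branch, which is what the block-based schemes (for/pfor) try
to avoid.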
Some of the things I mentioned here are probably optional, for instance I think
it's "enough" to give a high-level overview of StandardCodec, but I can't help
but think that explaining some of the architecture will be useful for new
developers.