Hello! Answers below...:

On Wed, Mar 21, 2012 at 11:03 AM, Han Jiang <[email protected]> wrote:
> Hi All,
>
> I'm Billy, a senior undergraduate student in Peking University. I'm working
> in the area of Information Retrieval and Web Mining. When going through the
> idea list, I felt quite interested in the LUCENE-3892 and LUCENE-3069. I am
> very proficient on java, and have been using lucene for about one year. I am
> looking forward to make a contribution to this project.

Awesome.

> Here, I have a few questions about lucene:
>
> First of all, which version of lucene shall we use as a start point? The
> trunk or 3.5?

Both of these issues will be trunk only I think: they both are far
easier to do with the Codec API in 4.0.

> Is there any demo codes to show the idea of Codecs?

Maybe the simplest demo would be to look at the SimpleText codec? It
roughly "tries" to have simple source code as well as a simple (text
only, human readable) on-disk format.

> How many posting formats are supposed to be implemented, for project
> LUCENE-3892 ?

This can be worked out when scoping the project... but I think getting
one postings format working well would be awesome :) If somehow
that's too easy, then add more!

> Is there any further documentation for LUCENE-3069 ?

Not that I know of... but I suspect the approach can be very similar
to the MemoryPostingsFormat we already have, just that it'd only be
the terms data stored in the FST, while the postings
(docs/freqs/positions/offsets) are written to a file.

Ideally, it would just act like a different terms dictionary
implementation, ie so that we can then plug in any PostingsBaseFormat
(even the one from LUCENE-3892!).

> Thank you!

You're welcome, and welcome to Lucene/Solr!

Mike McCandless

http://blog.mikemccandless.com
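
As a rough illustration of the LUCENE-3069 idea described above (terms
held in memory, postings written to a separate file), here is a minimal,
self-contained Java sketch. It does not use any Lucene classes and is not
how Lucene implements this: the TreeMap merely stands in for the FST, the
in-memory byte stream stands in for the postings file, only doc IDs are
stored (no freqs/positions/offsets), and the names TermsDictSketch,
addTerm, and docsFor are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model only: terms in memory, postings in a separate "file". Not Lucene code. */
public class TermsDictSketch {
    // Assumption: a sorted map stands in for the FST (term -> start offset of its postings).
    private final TreeMap<String, Integer> termToOffset = new TreeMap<>();
    // Assumption: an in-memory byte stream stands in for the on-disk postings file.
    private final ByteArrayOutputStream postingsFile = new ByteArrayOutputStream();

    /** Appends one term's doc IDs to the postings "file" and records where they start. */
    public void addTerm(String term, int... docIds) {
        termToOffset.put(term, postingsFile.size());
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * docIds.length);
        buf.putInt(docIds.length);           // header: number of doc IDs
        for (int docId : docIds) {
            buf.putInt(docId);
        }
        postingsFile.write(buf.array(), 0, buf.capacity());
    }

    /** Looks the term up in memory, then reads its postings from the "file". */
    public List<Integer> docsFor(String term) {
        Integer offset = termToOffset.get(term);
        List<Integer> docs = new ArrayList<>();
        if (offset == null) {
            return docs;                     // unknown term: no postings
        }
        ByteBuffer in = ByteBuffer.wrap(postingsFile.toByteArray());
        in.position(offset);
        int count = in.getInt();
        for (int i = 0; i < count; i++) {
            docs.add(in.getInt());
        }
        return docs;
    }

    public static void main(String[] args) {
        TermsDictSketch dict = new TermsDictSketch();
        dict.addTerm("codec", 1, 4, 9);
        dict.addTerm("fst", 2, 4);
        System.out.println(dict.docsFor("fst"));    // prints [2, 4]
        System.out.println(dict.docsFor("lucene")); // prints []
    }
}
```

The point of the split is that the term lookup structure and the postings
encoding only meet at the stored offset, so either side could in principle
be swapped out independently, which is the "plug in any PostingsBaseFormat"
goal mentioned above.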
