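One approach would be to do this with multi-valued fields. The idea is to index all your E (Experience) groups as values of the *same* Lucene field, with a position increment gap (see Analyzer.getPositionIncrementGap) greater than 1. In practice the gap has to be larger than the slop you will later search with.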
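To make the position arithmetic concrete, here is a small plain-Java sketch. It does not use Lucene at all. The `layout` and `phraseMatches` helpers are hypothetical: they only simulate, roughly, how a gap of 100 spaces out the values of one multi-valued field, and how a two-term sloppy phrase measures distance. The sketch then checks "java 5"~90 AND "c 2"~90 against the two resumes, Ra and Rb.

```java
import java.util.ArrayList;
import java.util.List;

public class GapDemo {

    // One term and the position it would occupy in the multi-valued "experience" field.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Hypothetical helper: lays out the values of one multi-valued field.
    // Terms inside a value are adjacent. The first term of each later value
    // sits `gap` positions after the last term of the previous value.
    // Lowercasing and splitting on ',' stand in for the custom analyzer.
    static List<Token> layout(String[] values, int gap) {
        List<Token> tokens = new ArrayList<>();
        int lastPos = 0;
        for (int v = 0; v < values.length; v++) {
            int pos = (v == 0) ? 1 : lastPos + gap;
            for (String term : values[v].toLowerCase().split(",")) {
                tokens.add(new Token(term.trim(), pos));
                lastPos = pos;
                pos++;
            }
        }
        return tokens;
    }

    // Hypothetical helper: a two-term sloppy phrase check such as "java 5"~90.
    // The cost of a candidate pair is how far term b is from the slot right
    // after term a, i.e. |(posB - posA) - 1|, so reversed order costs extra.
    static boolean phraseMatches(List<Token> tokens, String a, String b, int slop) {
        for (Token ta : tokens) {
            if (!ta.term.equals(a)) {
                continue;
            }
            for (Token tb : tokens) {
                if (tb != ta && tb.term.equals(b)
                        && Math.abs((tb.position - ta.position) - 1) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    // "java 5"~slop AND "c 2"~slop
    static boolean javaFiveAndCTwo(List<Token> resume, int slop) {
        return phraseMatches(resume, "java", "5", slop)
                && phraseMatches(resume, "c", "2", slop);
    }

    public static void main(String[] args) {
        int gap = 100; // what getPositionIncrementGap is assumed to return
        List<Token> ra = layout(new String[] {"Java,5", "C,2", "PHP,3"}, gap);
        List<Token> rb = layout(new String[] {"Java,2", "C,5", "VB,1"}, gap);

        System.out.println("Ra matches: " + javaFiveAndCTwo(ra, 90)); // Ra matches: true
        System.out.println("Rb matches: " + javaFiveAndCTwo(rb, 90)); // Rb matches: false
    }
}
```

Raising the slop to 101 in this sketch makes Rb match as well, because "java" (position 1) and "5" (position 103) are then close enough. That is why the slop has to stay below the gap. Real Lucene positions may be off by one or two from these numbers.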
For this example, assume getPositionIncrementGap returns 100. Then, for your documents, you have something like:

    doc.add(new Field("experience", "java,5" blah blah));
    doc.add(new Field("experience", "C,2" blah blah));
    doc.add(new Field("experience", "PHP,3" blah blah));

Then you do proximity searches with a slop of less than 100. The trick is that the above tokens are positioned (roughly):

    1   - java
    2   - 5
    102 - c
    103 - 2
    203 - php
    204 - 3

Of course, you have to subclass a suitable analyzer so that it breaks your tokens up appropriately (e.g. splitting "java,5" at the comma) and returns your gap from getPositionIncrementGap. Now a query (SpanNear? Proximity? your choice) of the form

    "java 5"~90 AND "c 2"~90

should return only Ra. In Ra each pair sits inside one group, while in Rb "java" and "5" (and likewise "c" and "2") are at least 100 positions apart, well beyond the slop of 90.

HTH,
Erick

On Wed, Jan 13, 2010 at 3:59 PM, TJ Kolev <tjko...@gmail.com> wrote:

> Greetings,
>
> Let's assume I have to index and search "resume" documents. Two fields are
> defined: Language and Years. The fields are associated together in a group
> called Experience. A resume document may have 0 or more Experience groups:
>
> Ra{ E1(Java,5); E2(C,2); E3(PHP,3);}
> Rb{ E1(Java,2); E2(C,5); E3(VB,1);}
>
> How do I index such documents, and how do I search, so that I can formulate
> a query like "Resumes which have (Java,5) and (C,2)" and get back Ra? I know
> I can index multiple fields of the same name, and do "(Language:Java AND
> Years:5) AND (Language:C AND Years:2)", but in addition to Ra that would
> also return Rb, which I don't want. The problem here is that the "grouping"
> is lost. I can create fields with compound names (E1Language, E1Years,
> E2Language, E2Years, etc.), but that helps me none, as I don't know which
> group to search. I'd also like to query for "(Language:Java AND Years:5) OR
> (Language:C AND Years:2)".
>
> This is a simplified example. Real documents may have 30 - 40 groups, each
> one with several fields. Putting all the fields in a group in one index
> field won't work, as the numeric/date ones should be available for range
> searches.
>
> So far the way I see it is to do my own post-processing on the results.
> The issue is that text fields will need to be untokenized, or otherwise it
> would be difficult to work on the result, and determine what matches.
>
> Thank you.
> tjk :)