The issue is that the real world document has more than 2 fields. Me giving an example of two was a bit misleading. I can't really "pair" them. Here's a better example:
Resume_a Exp_1 Language:Java, Years:5, Certification:Sun, Area:Web Exp_2 Language:C, Years:3, Certification:None, Area:Desktop And yes, I need to be able to query for ranges on the Years field. So for now, I'll do as much in Lucene, and then post-filter on my own any extra documents in the search result. Thank you. tjk :) On Fri, Jan 15, 2010 at 11:31 AM, Erick Erickson <erickerick...@gmail.com>wrote: > Well, a variant on the easy solution might. What would > happen if you indexed the un-split pairs in the same > field? I.e. > > "java:5", "c:3", "php:2" all indexed as *single* tokens > in the *same* field? > > But I think you should look at Digy's suggestion again. > 6-10 fields is absolutely no problem at all. Documents > in Lucene can have any number of fields without > any relation to any other document. So having > "sparse" documents has no penalty (unlike a RDBMS). > > In particular, with Digy's scheme you can answer questions > like "has more than 3 years of C experience" whereas > with my scheme it would be harder. > > > FWIW > Erick > > On Fri, Jan 15, 2010 at 11:19 AM, TJ Kolev <tjko...@gmail.com> wrote: > > > Hi! > > > > I don't think the easy solution will work for me, because I'll have more > > than two fields in a group - perhaps 6 - 10. > > > > However using span queries looks very promising. I'll investigate that. > > > > I see setPositionIncrement() only on the Token object. Is there a way to > > set > > this when adding a field to the document, so that the first token get its > > position pushed away. I would prefer not to modify my analyzer if > possible. > > > > Thank you. > > tjk :) > > > > On Wed, Jan 13, 2010 at 3:52 PM, Erick Erickson <erickerick...@gmail.com > > >wrote: > > > > > Ooooh, isn't that easier. You just prompted me to think > > > that you don't even have to do that, just index the pairs as single > > > tokens (KeywordAnalyzer? but watch out for no case folding)... > > > > > > On Wed, Jan 13, 2010 at 4:30 PM, Digy <digyd...@gmail.com> wrote: > > > > > > > How about using languages as fieldnames? > > > > Doc1(Ra): > > > > Java:5 > > > > C:2 > > > > PHP:3 > > > > > > > > Doc2(Rb) > > > > Java:2 > > > > C:5 > > > > VB:1 > > > > > > > > Query:Java:5 AND C:2 > > > > > > > > DIGY > > > > > > > > -----Original Message----- > > > > From: TJ Kolev [mailto:tjko...@gmail.com] > > > > Sent: Wednesday, January 13, 2010 11:00 PM > > > > To: java-user@lucene.apache.org > > > > Subject: Problem: Indexing and searching repeating groups of fields > > > > > > > > Greetings, > > > > > > > > Let's assume I have to index and search "resume" documents. Two > fields > > > are > > > > defined: Language and Years. The fields are associated together in a > > > group > > > > called Experience. A resume document may have 0 or more Experience > > > groups: > > > > > > > > Ra{ E1(Java,5); E2(C,2); E3(PHP,3);} > > > > Rb{ E1(Java,2); E2(C,5); E3(VB,1);} > > > > > > > > How do I index such documents, and how do I search, so I can > formulate > > a > > > > query like this "Resumes which have (Java,5) and (C,2)" and get back > > Ra. > > > I > > > > know I can index multiple fields of the same name, and do > > "(Language:Java > > > > AND Years:5) AND (Language:C AND Years:2)", but in addition to Ra > that > > > > would > > > > also return Rb, which I don't want. The problem here is that the > > > "grouping" > > > > is lost. I can create fields with compound names (E1Language, > E1Years, > > > > E2Language, E2Years, etc), but that helps me none, as I don't know > > which > > > > group to search. I'd also like to query for "(Language:Java AND > > Years:5) > > > OR > > > > (Language:C AND Years:2)" > > > > > > > > This is a simplified example. Real documents may have 30 - 40 groups, > > > each > > > > one with several fields. Putting all the fields in a group in one > index > > > > field won't work as the numeric/date ones should be available for > range > > > > searchers. > > > > > > > > So far the way I see it is to do my own post processing on the > results. > > > The > > > > issue is that text fields will need to be untokenized, or otherwise > it > > > > would > > > > be difficult to work on the result, and determine what matches. > > > > > > > > Thank you. > > > > tjk :) > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > > > > > >