What you propose works if you only want to index the POS of a token. But I also want to index extra information, such as the "lemma" of a token, its phonetic encoding, etc. Sorry, my previous post was not general enough. Imagine you want to ask this:
an adj whose lemma is "quick", followed by "brown", followed by a noun whose phonetic encoding is "fots".

So the main problem is that you cannot ask whether several "synonyms" exist at the same position.

Thank you, Michael, for your answer.

2015-03-03 20:52 GMT+01:00 Michael Sokolov <msoko...@safaribooksonline.com>:

> What if you indexed every word with two synonyms: the plain unadorned word
> and a token formed by concatenating the pos and the word with some unusual
> separator character?
>
> For example, "the quick brown fox" would be:
>
> { the | article:the } { quick | adj:quick } { brown | adj:brown } { fox | noun:fox }
>
> with punctuation to suggest the token graph
>
> -Mike
>
>
> On 03/03/2015 01:21 PM, David Villarejo wrote:
>
>> After many Google searches I decided to post my problem here, hoping that
>> someone can help me. What I want to achieve is to perform queries as follows
>> (don't worry about the query format):
>>
>> q1: (adjective) "jumps" (preposition) // any adj followed by "jumps"
>>     followed by any prep.
>> q2: (adjective:brown) "jumps" (preposition) // brown as adj followed by
>>     "jumps" followed by any prep.
>> q3: (adjective:brown) (verb:jumps) (preposition) // brown as adj followed
>>     by jumps as verb followed by any prep.
>>
>> In a more general form, what I want is
>> (POS[:specific_word]) (POS[:specific_word]) (POS[:specific_word])
>>
>> For that, I have the text tagged as follows:
>>
>> the|[pos:DT][lemma:the] quick|[pos:JJ][lemma:quick]
>> brown|[pos:JJ][lemma:brown] fox|[pos:NN][lemma:fox]
>> jumps|[pos:NNS][lemma:jump] over|[pos:IN][lemma:over]
>> the|[pos:DT][lemma:the] lazy|[pos:JJ][lemma:lazy] dog|[pos:NN][lemma:dog]
>>
>> The first thing I thought of was to index the extra info of each term as a
>> payload and then use PayloadNearQuery to access the payload of each span.
>> The problem is that PayloadNearQuery matches the terms first and only then
>> accesses their payloads, so none of the 3 queries above will work (correct
>> me if I'm wrong).
>>
>> The second thing I thought of was to index the extra info as synonyms of
>> the term but, this way, the second query won't work, since I can't ask
>> whether the first term is an adj and the specific word "brown"
>> simultaneously.
>>
>> Any way to address this problem, suggestions, etc. will be appreciated.
>>
>> David.
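
To make the requirement concrete, here is a minimal in-memory sketch of the matching semantics discussed in this thread: every position holds a stack of tokens (the word plus its annotations), and each query slot lists the tokens that must all be present at one position. This is plain Java, not Lucene API code. The class name StackedTokens, its methods, and the "word:" prefix are hypothetical names chosen for illustration; only the tagged input format comes from the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StackedTokens {
    // Matches one annotation such as [pos:NN] or [lemma:fox].
    private static final Pattern ANNOTATION =
            Pattern.compile("\\[([^\\]:]+):([^\\]]*)\\]");

    /** Turns "fox|[pos:NN][lemma:fox]" into {word:fox, pos:NN, lemma:fox}. */
    static Set<String> stack(String tagged) {
        int bar = tagged.indexOf('|');
        Set<String> tokens = new HashSet<>();
        tokens.add("word:" + (bar < 0 ? tagged : tagged.substring(0, bar)));
        if (bar >= 0) {
            Matcher m = ANNOTATION.matcher(tagged.substring(bar + 1));
            while (m.find()) {
                tokens.add(m.group(1) + ":" + m.group(2));
            }
        }
        return tokens;
    }

    /** One token stack per whitespace-separated tagged word. */
    static List<Set<String>> analyze(String text) {
        List<Set<String>> positions = new ArrayList<>();
        for (String tagged : text.trim().split("\\s+")) {
            positions.add(stack(tagged));
        }
        return positions;
    }

    /**
     * Returns the first start position at which every query slot is fully
     * contained in the token stack of the corresponding consecutive
     * position, or -1 if there is no such position.
     */
    static int find(List<Set<String>> positions, List<Set<String>> query) {
        for (int start = 0; start + query.size() <= positions.size(); start++) {
            boolean matches = true;
            for (int i = 0; i < query.size() && matches; i++) {
                matches = positions.get(start + i).containsAll(query.get(i));
            }
            if (matches) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Set<String>> doc = analyze(
                "the|[pos:DT][lemma:the] quick|[pos:JJ][lemma:quick] "
              + "brown|[pos:JJ][lemma:brown] fox|[pos:NN][lemma:fox] "
              + "jumps|[pos:NNS][lemma:jump] over|[pos:IN][lemma:over]");

        // An adj whose lemma is "quick", then "brown", then a noun with lemma "fox".
        System.out.println(find(doc, List.of(
                Set.of("pos:JJ", "lemma:quick"),
                Set.of("word:brown"),
                Set.of("pos:NN", "lemma:fox")))); // prints 1

        // q2 from the thread: "brown" is followed by "fox" here, so no match.
        System.out.println(find(doc, List.of(
                Set.of("pos:JJ", "word:brown"),
                Set.of("word:jumps"),
                Set.of("pos:IN")))); // prints -1
    }
}
```

In an index, this layout would correspond to emitting the extra tokens at the same position as the word. Whether a given query type can then require several of those tokens at one position is exactly the open question here.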