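Tagging does not consume that much space. Remember, you have an inverted index: a tag that appears in many documents is stored once as a term, so repeating the same words across documents does not inflate the index the way it inflates the raw data.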
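The tagging approach can be illustrated with a small application-side helper. Documents carry a tag field, and at query time phrases from a controlled vocabulary are matched in the user's text to build a filter clause, as Jörg suggests further down the thread. This is a minimal sketch that only builds the request body and does not call Elasticsearch. The vocabulary entries and the field names `tags` and `description` are hypothetical. The body uses the `bool` query with a `filter` clause from later Elasticsearch versions; on the 1.x releases current when this thread was written, the same idea was expressed with a `filtered` query.

```python
import json
import re

# Hypothetical controlled vocabulary: phrase found in a user query -> tag
# stored on documents at index time. Entries are illustrative only.
VOCABULARY = {
    "rent": "rent",
    "hire": "rent",
    "lease": "rent",
    "buy": "buy",
    "purchase": "buy",
    "for sale": "buy",
}


def match_tags(query_text: str) -> list[str]:
    """Return the sorted tags whose vocabulary phrases occur in the query."""
    # Lowercase and keep only word characters so punctuation does not block
    # a match; pad with spaces so phrases only match on whole words.
    normalized = " " + " ".join(re.findall(r"\w+", query_text.lower())) + " "
    return sorted({tag for phrase, tag in VOCABULARY.items()
                   if f" {phrase} " in normalized})


def build_search_body(query_text: str) -> dict:
    """Build a search body: full-text match plus an optional tag filter.

    The field names "description" and "tags" are hypothetical.
    """
    full_text = {"match": {"description": query_text}}
    tags = match_tags(query_text)
    if not tags:
        # No vocabulary hit: fall back to a plain full-text search.
        return {"query": full_text}
    return {
        "query": {
            "bool": {
                "must": full_text,
                # A filter clause restricts matches without affecting scoring.
                "filter": {"terms": {"tags": tags}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("hire a flat in Rome"), indent=2))
```

For example, `build_search_body("hire a flat in Rome")` yields a `bool` query whose filter is `{"terms": {"tags": ["rent"]}}`, while a query containing no vocabulary phrase falls back to a plain `match`.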
Tagging is the easiest method to classify documents; see folksonomies. Be careful with synonym files. They are inefficient and slow, and they come with an extra price: you have to restart the cluster each time you modify the synonyms, and if you apply synonyms at index time, you have to reindex. Maybe you do not want that overhead. Stemming cannot help with document classification either. If you want to process natural language queries, examine the sentence for its meaning, and express that meaning in useful tags, you can try plugins for POS tagging, e.g.

https://github.com/richardwilly98/elasticsearch-auto-tagging

There are plenty of approaches in the natural language processing field; most of them work in front of ES, not as plugins.

Jörg

On Wed, Mar 4, 2015 at 4:02 PM, Jean-Marc F. wrote:

> Thank you Jörg! :-)
>
> I did think of the tag approach: it is very close to the first scheme I described in my question, that is, querying over the two types and then filtering. It still seems to me that this is an overhead that could be avoided (not critical with a few documents, but it might become so as both types grow).
>
> I discarded the tag approach for another reason too: the need to tag each "rent" or "buy" document with always the same words/expressions, which would inflate the data size and would not leverage ES's intrinsic full-text abilities (such as stemming, synonym handling, etc.). I do think that, in that context, working on a simple field/tag ("type", or even "_type" if feasible?) with the proper analyzer and synonym file would be more efficient and less error-prone.
>
> But thank you again for your feedback on this topic. It makes me feel more confident, since I had envisaged this approach, which tells me I am not totally lost ^^
>
> Cheers,
> JM
>
> On Wednesday, March 4, 2015 12:19:17 UTC+1, Jörg Prante wrote:
>>
>> My suggestion is: instead of selecting a single type, tag documents in the index with a given vocabulary, and at query time, match certain phrases in the query text against that vocabulary in order to build a filter clause.
>>
>> Jörg
>>
>>
>> On Wed, Mar 4, 2015 at 11:10 AM, Jean-Marc F. wrote:
>>
>>> Now that I have written my question: would it be a two-pass job? First pass: send an "analyze" request to get the proper term, "rent" or "buy" (or both if none); then second pass: query the proper type?
>>>
>>>
>>> On Wednesday, March 4, 2015 11:07:43 UTC+1, Jean-Marc F. wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am pretty new to ES and need some advice for the following use case: I have a single input field for user search (Google-like). In my test index, I have two different types; let's call them "rent" and "buy". What I would like to achieve is to leverage ES's powerful full-text features to determine which index type to query based on (part of) the query.
>>>>
>>>> For instance, for a query such as "rent a motorcycle in Paris" or "hire a flat in Rome", is there a way to have ES "know" it should look in the "rent" type?
>>>>
>>>> I thought of a first possibility: query both types (/rent,buy/_search), then filter on a (quite redundant) "type" field created each time a document is indexed, with the proper analyzers/synonyms applied to this "type" field so that things always reduce to "rent" or "buy". (Or, more directly, the "_type" field, but I don't think you can apply analysis to it, can you?)
>>>>
>>>> The "con" of this approach is that I have to query both the rent and the buy types, then filter to narrow the results to the expected type of documents. The "pro" is that it should not be complicated to make it work properly.
>>>>
>>>> Now, I am wondering whether it would be possible to have ES "figure out" which index type to query right after analysis, in a process like: query => analysis => "rent" or "buy" term identified => search on the right index type.
>>>> The pros would be that you obviously query a single index type and thus don't need to filter afterwards: smaller data set + no filtering, which should be lighter/faster.
>>>> The cons: I do not think that ES can do it.
>>>>
>>>> Another scenario would be to handle a first, app-specific analysis step before querying ES, just to determine "rent" or "buy". With this example it would not be that tough (two types, a few synonyms/a bit of stemming to take into account, etc.), but with a more complex setup it would become a real nightmare, not to mention that not using ES's abilities would be quite a pity, actually...
>>>>
>>>> I would really appreciate your thoughts on this, all of you :-)
>>>>
>>>> Thanks
>>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f5de1e5b-c2e6-4cd2-9019-8e520979b6a2%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
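Jean-Marc's two-pass idea (first send the query text to the `_analyze` API, then search only the type whose routing term comes back) can also be sketched on the application side. This is a minimal sketch under several assumptions: the first-pass response has already been fetched and parsed into a dict with the `tokens` list that `_analyze` returns; the analyzer's synonym filter already reduces words such as "hire" to "rent"; and the index name `listings` and the `TYPE_TERMS` table are hypothetical. The `/index/type1,type2/_search` path form only applies to Elasticsearch versions that still had mapping types, which were later deprecated and removed.

```python
# Hypothetical routing table: index type -> analyzed terms that select it.
# Assumption: a synonym-aware analyzer has already reduced "hire", "lease",
# etc. to "rent" (and "purchase" etc. to "buy"), so only canonical terms appear.
TYPE_TERMS = {
    "rent": {"rent"},
    "buy": {"buy"},
}


def pick_types(analyze_response: dict) -> list[str]:
    """First pass: pick the types to search from a parsed _analyze response."""
    tokens = {t["token"] for t in analyze_response.get("tokens", [])}
    hits = [name for name, terms in TYPE_TERMS.items() if tokens & terms]
    # No routing term found: search both types, as proposed in the thread.
    return hits or list(TYPE_TERMS)


def search_path(index: str, analyze_response: dict) -> str:
    """Second pass: build the search URL path for the selected type(s)."""
    return f"/{index}/{','.join(pick_types(analyze_response))}/_search"


if __name__ == "__main__":
    # Simulated first-pass response for "hire a flat in Rome".
    response = {"tokens": [{"token": "rent"}, {"token": "flat"}, {"token": "rome"}]}
    print(search_path("listings", response))
```

The cost of this design is an extra round trip per search; the tag-filter approach Jörg proposes keeps it to a single request, at the price of tagging every document.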
