A bit off-topic, but what I'd really like to see is the ability to perform highlighting asynchronously: first get the search results from Elasticsearch and process them, then get the highlighted snippets in a second wave, asynchronously.
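For reference, doing the two waves by hand today looks roughly like the sketch below. This is only an illustration: the function names are made up, nothing here talks to a cluster, and the request bodies assume the 1.x query DSL used further down in this thread (a `filtered` query with an `ids` filter).

```python
import json


def build_results_request(query, size=50):
    """First wave: fetch the hits only, with no highlight section."""
    return {"query": query, "size": size}


def build_highlight_request(query, hit_ids, highlight):
    """Second wave: re-run the same query restricted to the hits already
    shown, asking only for highlights (no _source).

    Assumption: the 1.x `filtered` query and `ids` filter are available.
    """
    hit_ids = list(hit_ids)
    return {
        "_source": False,
        "size": len(hit_ids),
        "query": {
            "filtered": {
                "query": query,
                "filter": {"ids": {"values": hit_ids}},
            }
        },
        "highlight": highlight,
    }


if __name__ == "__main__":
    # Hypothetical query and highlight settings, for illustration only.
    query = {"match": {"headline": "rights agreement"}}
    highlight = {"fields": {"headline": {"type": "plain"}}}

    first = build_results_request(query)
    # In a real client, the ids would come from the first response.
    second = build_highlight_request(query, ["1", "2"], highlight)

    print(json.dumps(first, indent=2))
    print(json.dumps(second, indent=2))
```

The UI can render the first response immediately and fill in the snippets when the second one arrives. The cost is that the query runs twice, which is exactly the part I would like ES to take care of.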
The main problem with highlighting currently is that it is slow, because of hackish recursive algorithms and mandatory I/O access. I'd like to avoid doing 2-step searches (one search for the results, and another to artificially propagate the highlights to the UI in a "second wave"). I wonder if we can come up with a way to have ES propagate them asynchronously for us?

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>

On Wed, Dec 31, 2014 at 5:38 PM, Nikolas Everett <[email protected]> wrote:

> Highlighting isn't a nice pretty thing; it's kind of hacky. There are
> three highlighters built in
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html>
> to Elasticsearch and they all work differently. You should try all of them
> and see if they do what you want. They all come at the problem from a
> different perspective and have their own idiosyncrasies. I maintain a
> highlighter
> plugin <https://github.com/wikimedia/search-highlighter> as well that you
> can use as a fourth option. It merges lots of the implementation strategies
> that the other ones use and attempts to give you more options, and
> it might do what you need.
>
> Nik
>
> On Tue, Dec 23, 2014 at 12:44 PM, Yang Liu <[email protected]> wrote:
>
>> Does no one know anything about this? I would really appreciate any help
>> you can offer.
>>
>>
>> On Monday, December 22, 2014 5:27:57 PM UTC-5, Yang Liu wrote:
>>>
>>> Hi, guys,
>>> I have a question about the highlight query in ES.
>>> *Below is my query,*
>>> { >>> "_source": [ >>> >>> ..... 
>>> ], >>> "highlight": { >>> "fields": { >>> "FDS_ATTACHMENTS": { >>> "type": "plain" >>> }, >>> "FDS_ATTACHMENTS.no_stem": { >>> "type": "plain" >>> }, >>> "FDS_ATTACHMENTS.with_case": { >>> "type": "plain" >>> }, >>> "headline": { >>> "type": "plain" >>> }, >>> "headline.no_stem": { >>> "type": "plain" >>> }, >>> "headline.with_case": { >>> "type": "plain" >>> } >>> }, >>> "fragment_size": 500, >>> "highlight_query": { >>> "bool": { >>> "must": [ >>> { >>> "bool": { >>> "minimum_should_match": 1, >>> "should": [ >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "agreement" >>> } >>> } >>> ], >>> "in_order": true, >>> "slop": 0 >>> } >>> } >>> ] >>> } >>> }, >>> { >>> "bool": { >>> "minimum_should_match": 1, >>> "should": [ >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "agreement" >>> } >>> }, >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "merger" >>> } >>> } >>> ], >>> "in_order": false, >>> "slop": 5 >>> } >>> } >>> ] >>> } >>> } >>> ] >>> } >>> }, >>> "number_of_fragments": 50, >>> "post_tags": [ >>> "</font>" >>> ], >>> "pre_tags": [ >>> "<font color=red>" >>> ], >>> "require_field_match": true >>> }, >>> "query": { >>> "filtered": { >>> "filter": { >>> "range": { >>> "story_datetime": { >>> "gte": "20141221t000000", >>> "lte": "20141222t235959" >>> } >>> } >>> }, >>> "query": { >>> "bool": { >>> "must": [ >>> { >>> "bool": { >>> "minimum_should_match": 1, >>> "should": [ >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "agreement" >>> } >>> } >>> ], >>> "in_order": true, >>> "slop": 0 >>> } >>> }, >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> 
"headline.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "headline.no_stem": "agreement" >>> } >>> } >>> ], >>> "in_order": true, >>> "slop": 0 >>> } >>> }, >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "headline2.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "headline2.no_stem": "agreement" >>> } >>> } >>> ], >>> "in_order": true, >>> "slop": 0 >>> } >>> } >>> ] >>> } >>> }, >>> { >>> "bool": { >>> "minimum_should_match": 1, >>> "should": [ >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "agreement" >>> } >>> }, >>> { >>> "span_term": { >>> "FDS_ATTACHMENTS.no_stem": "merger" >>> } >>> } >>> ], >>> "in_order": false, >>> "slop": 5 >>> } >>> }, >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "headline.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "headline.no_stem": "agreement" >>> } >>> }, >>> { >>> "span_term": { >>> "headline.no_stem": "merger" >>> } >>> } >>> ], >>> "in_order": false, >>> "slop": 5 >>> } >>> }, >>> { >>> "span_near": { >>> "clauses": [ >>> { >>> "span_term": { >>> "headline2.no_stem": "rights" >>> } >>> }, >>> { >>> "span_term": { >>> "headline2.no_stem": "agreement" >>> } >>> }, >>> { >>> "span_term": { >>> "headline2.no_stem": "merger" >>> } >>> } >>> ], >>> "in_order": false, >>> "slop": 5 >>> } >>> } >>> ] >>> } >>> } >>> ] >>> } >>> } >>> } >>> }, >>> "size": 50, >>> "sort": [ >>> { >>> "_score": { >>> "ignore_unmapped": true, >>> "order": "desc" >>> } >>> }, >>> { >>> "story_datetime": { >>> "order": "desc" >>> } >>> } >>> ] >>> } >>> >>> And here is a response I got, >>> >>> - of the Transactions set forth in the Offering Memorandum, and >>> redeeming the Notes, if applicable and (d) conducting such other >>> activities >>> as are necessary or appropriate to carry out the activities described >>> above. 
Prior to the Merger Date, the Company shall not own, hold or
>>> otherwise have any interest in any material assets other than cash and cash
>>> equivalents and its <font color=red>rights</font> and obligations under the
>>> <font color=red>Merger</font> <font color=red>Agreement</font>.
>>> ARTICLE 5. SUCCESSORS Section 5.01
>>>
>>> As you can see, <font color=red>rights</font> and <font
>>> color=red>Agreement</font> are not adjacent at all, so the slop between
>>> them is definitely more than 0!
>>> Could someone suggest how I can change the query to make sure that
>>> rights and agreement are adjacent in all the segments?
>>> I have set the slop to 0 in the highlight query, and I don't know why
>>> ES does not skip this segment, since it does not match the criteria.
>>>
>>> Thank you very much!
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/e35c12fd-a8bb-4edd-825d-f846eaeb02c4%40googlegroups.com
>> <https://groups.google.com/d/msgid/elasticsearch/e35c12fd-a8bb-4edd-825d-f846eaeb02c4%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
