Would it be possible to compute some sort of numerical value from the discriminating/significant text at index time, and then sort the documents by that value?
You can index the documents with term vectors, which will allow you to access the term frequency values:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html

I am not sure whether those values can be used in a script or to sort by. Using scripts, you can get access to the fields. It would be time-consuming, but you can iterate through each term of a field and use the text scoring features to get the appropriate values.

Cheers,

Ivan

On Mon, Apr 28, 2014 at 6:48 AM, Ramdev Wudali <[email protected]> wrote:

> Ivan:
> I filter the index for documents containing AAPL (the ticker symbol) as
> part of a field that is filterable.
> I get back 1000 documents in no particular order, as the request was just a
> filter. To this filter, I would like to add "discriminating/significant"
> text that would be found in the 1000 documents, so that the documents
> returned are, in a sense, only those that are significant.
>
> I do not want the terms to be significant against the whole index, but
> only against the documents that are returned for the query. Hence I would
> like to run some extra analysis against this filter request result to
> identify these "discriminating/significant" terms.
>
> I was wondering if I can access the Elasticsearch API / underlying
> implementation to do the calculations.
>
> Ramdev
>
> On Friday, 25 April 2014 13:09:35 UTC-5, Ivan Brusic wrote:
>
>> Can you provide a small example of what you are trying to achieve? Are
>> the discriminating terms known beforehand, or do they depend on the
>> document? Have you looked into the new text scoring features which have
>> been released since the original post? It is worth looking into:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>>
>> You can probably calculate the TF values during indexing, but not the IDF,
>> since that value is based on all of the documents in a shard.
>>
>> Cheers,
>>
>> Ivan
>>
>> On Fri, Apr 25, 2014 at 8:46 AM, Ramdev Wudali <[email protected]> wrote:
>>
>>> A variant on this particular request:
>>>
>>> I would like to get the tf-idf for an indexed field (the field is the
>>> body of a news document). I would like to find discriminating terms in the
>>> document set (the document set is the result of executing a filter on the
>>> search index).
>>> The discriminating terms are to help with improving the query, as the
>>> number of documents returned is too large and relevant documents are
>>> getting lost in the search result (of executing a filter).
>>>
>>> Is it possible to run the tf-idf calculations that Elasticsearch does
>>> while indexing the document? (That is, is there an API to access the
>>> TF-IDF calculations?)
>>>
>>> Thanks
>>>
>>> Ramdev

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQDgHyiEDcs1zLdAMqVuQV6SO9nOk9SZHNLSyXjC3tHDSQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
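For illustration only, here is a rough client-side sketch of the idea discussed in this thread: computing TF-IDF against just the filtered result set rather than the whole index. It does not call Elasticsearch at all. It assumes you have already fetched the filtered documents and tokenized the body field yourself (for example from term vectors); the function names `subset_tf_idf`, `top_terms`, and `rank_docs` and the sample data are hypothetical, and the weighting is plain `tf * log(N / df)`, not necessarily the formula Lucene uses internally.

```python
import math
from collections import Counter


def subset_tf_idf(docs):
    """Compute TF-IDF per document, with IDF based only on `docs`.

    `docs` maps a document id to its list of tokens (assumed to be
    pre-tokenized by the caller). Returns {doc_id: {term: score}}.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents of the subset a term occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # A term present in every document gets log(1) = 0, i.e. it does
        # not discriminate within this subset.
        scores[doc_id] = {
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k terms with the highest summed score across the subset."""
    totals = Counter()
    for per_doc in scores.values():
        for term, score in per_doc.items():
            totals[term] += score
    return [term for term, _ in totals.most_common(k)]


def rank_docs(scores, terms):
    """Sort document ids by their summed score for the given terms."""
    def doc_score(doc_id):
        return sum(scores[doc_id].get(term, 0.0) for term in terms)
    return sorted(scores, key=doc_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical filtered result: every document mentions "aapl".
    filtered = {
        "1": ["aapl", "earnings", "beat", "earnings"],
        "2": ["aapl", "lawsuit", "patent"],
        "3": ["aapl", "earnings", "guidance"],
    }
    scores = subset_tf_idf(filtered)
    terms = top_terms(scores, k=1)
    print(terms, rank_docs(scores, terms))
```

Because the filter term itself appears in every returned document, it scores zero here, which matches the goal of finding terms that are significant within the result set and not against the whole index. For 1000 documents this is cheap to run on the client, though it does require pulling the tokens back from the cluster.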
