Would it be possible to create some sort of numerical value from the
discriminating/significant text at index time in order to sort the
documents by?

You can index the documents with term vectors, which will allow you to
access the term frequency values:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
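
As a rough sketch of what you could do with those values once you have
them: the snippet below reads term_freq and doc_freq out of a term vectors
response and combines them into a simple tf-idf weight. The response here is
a hand-written sample shaped like what the API returns when term statistics
are requested; the field name "body" and all the numbers are made up for
illustration, and term_weights is a hypothetical helper, not part of any
client library. As far as I know the statistics reflect the shard holding
the document, not a filtered subset.

```python
import math

# Hand-written sample shaped like a term vectors response with term
# statistics enabled. The field name and every number are illustrative.
sample_response = {
    "term_vectors": {
        "body": {
            "field_statistics": {
                "doc_count": 1000,
                "sum_doc_freq": 52000,
                "sum_ttf": 91000,
            },
            "terms": {
                "aapl": {"term_freq": 4, "doc_freq": 1000, "ttf": 3100},
                "earnings": {"term_freq": 3, "doc_freq": 120, "ttf": 410},
                "recall": {"term_freq": 2, "doc_freq": 5, "ttf": 9},
            },
        }
    }
}


def term_weights(response, field):
    """Return {term: tf * ln(doc_count / doc_freq)} for one field."""
    vector = response["term_vectors"][field]
    doc_count = vector["field_statistics"]["doc_count"]
    weights = {}
    for term, stats in vector["terms"].items():
        # Plain textbook idf; Lucene's own similarity uses a different formula.
        idf = math.log(doc_count / stats["doc_freq"])
        weights[term] = stats["term_freq"] * idf
    return weights


weights = term_weights(sample_response, "body")
# Terms ordered from most to least discriminating for this document.
top_terms = sorted(weights, key=weights.get, reverse=True)
```

A term that occurs in every document (like the ticker symbol here) ends up
with a weight of zero, which is the behaviour you want for this use case.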

I am not sure whether those values can be used in a script, or to sort by.
Scripts do give you access to the fields, though: it would be slow, but you
could iterate through each term of a field and use the text scoring
features to get the appropriate values.
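
If you want the weights computed only against the documents returned by the
filter, one option is to do the arithmetic on the client side. The following
is a minimal sketch under some loud assumptions: the documents are already
fetched as plain strings, tokenization is a naive lowercase/whitespace split
(not what an Elasticsearch analyzer would do), and subset_tfidf is a
hypothetical helper name. It is not how Elasticsearch scores internally.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer for illustration; a real analyzer would differ.
    return text.lower().split()


def subset_tfidf(docs):
    """Return one {term: tf-idf} dict per document.

    The idf part is computed over `docs` only, so a term is weighted
    against the filtered set rather than against the whole index.
    """
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    doc_freq = Counter()
    for counts in term_counts:
        # Count each term once per document it appears in.
        doc_freq.update(counts.keys())
    n = len(docs)
    return [
        {term: tf * math.log(n / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_counts
    ]


# Stand-in for the bodies of the documents matched by the filter.
filtered_docs = [
    "aapl earnings beat estimates",
    "aapl announces product recall",
    "aapl earnings call scheduled",
]

scores = subset_tfidf(filtered_docs)
# One possible per-document sort value: the sum of its term weights.
doc_scores = [sum(s.values()) for s in scores]
ranked = sorted(range(len(filtered_docs)), key=lambda i: doc_scores[i], reverse=True)
```

Summing the weights is only one way to get a single number per document;
taking the top few terms and feeding them back into the query is another.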

Cheers,

Ivan


On Mon, Apr 28, 2014 at 6:48 AM, Ramdev Wudali <[email protected]> wrote:

> Ivan:
>    I filter the index for documents containing AAPL (the ticker symbol),
> which is part of a filterable field.
> I get back 1000 documents in no particular order, as the request was just a
> filter. To this filter, I would like to add "discriminating/significant"
> text found in those 1000 documents, so that the documents returned are, in
> a sense, only those that are significant.
>
> I do not want the terms to be significant against the whole index, but
> only against the documents that are returned for the query. Hence I would
> like to run some extra analysis against this filter request result to
> identify these "discriminating/significant" terms.
>
> I was wondering if I can access the Elasticsearch API/underlying
> implementation to do the calculations.
>
> Ramdev
>
>
>
>
>
> On Friday, 25 April 2014 13:09:35 UTC-5, Ivan Brusic wrote:
>
>> Can you provide a small example of what you are trying to achieve? Are
>> the discriminating terms known beforehand or is it dependent on the
>> document? Have you looked into the new text scoring features which have
>> been released since the original post? It is worth looking into:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>>
>> You can probably calculate the TF values during indexing, but not the IDF
>> since that value is based on all of the documents in a shard.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>>
>> On Fri, Apr 25, 2014 at 8:46 AM, Ramdev Wudali <[email protected]> wrote:
>>
>>> A variant on this particular request:
>>>
>>> I would like to get the tf-idf for an indexed field (the field is the
>>> body of a news document). I would like to find discriminating terms in the
>>> document set (the document set is the result of executing a filter on the
>>> search index).
>>> The discriminating terms are meant to help improve the query, as the
>>> number of documents returned is too large and relevant documents are
>>> getting lost in the search result (of executing a filter).
>>>
>>>
>>> Is it possible to run the tf-idf calculations that Elasticsearch does
>>> while indexing the document (that is, an API to access the TF-IDF
>>> calculations)?
>>>
>>> Thanks
>>>
>>> Ramdev
>>>
>>>
>>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/81a1726e-3b08-4de8-b9ea-28b159516e40%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
>

