Hi Mark,
That was just one example. The documents are news articles, hence the
broad coverage rather than specific, on-topic documents. Since this is news
from third-party sources, I do not have control over what comes into the
index (i.e. I cannot separate the machine-generated content from the
manually edited/curated content). That said, I could perhaps whittle the
content down by making sure that the documents processed are indeed worthy
news articles and not random blog posts and non-relevant docs.
I do agree with your earlier comment that the query may be too broad. As I
have already mentioned, these are news articles. If these news articles
(which are provided by various sources) come with boilerplate text, there
is not much I can do other than process the documents to remove it. (For
now we are not looking into removing the boilerplate text, as it might
provide us with some insight into other information.)
The initial investigative exercise in using significant terms was to
identify terms that could perhaps enhance the content returned. There is,
of course, some manual editing of the significant terms to remove
nonsensical terms (in context, of course) before arriving at the final list
of terms to be added to my query.
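
A rough sketch of that workflow in Python, under stated assumptions: it
only builds request bodies and filters a response, so no client library is
needed. The field name "body", the aggregation name "sig_terms", the
helper names and the sample response are all hypothetical placeholders,
and folding the curated terms in as bool "should" clauses is just one
possible way to add them to the query.

```python
# Sketch only: "body" and "sig_terms" are hypothetical names, not taken
# from any real index or mapping.

def build_significant_terms_request(query_text, field="body", size=20):
    """Build a search body asking for significant terms in the matches."""
    return {
        "query": {"match": {field: query_text}},
        "aggs": {
            "sig_terms": {"significant_terms": {"field": field, "size": size}}
        },
    }


def curate_terms(response, rejected):
    """Keep bucket keys that were not rejected during manual review."""
    buckets = response["aggregations"]["sig_terms"]["buckets"]
    return [b["key"] for b in buckets if b["key"] not in rejected]


def build_expanded_query(query_text, extra_terms, field="body"):
    """Require the original query; curated terms only boost the score."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: query_text}}],
                "should": [{"match": {field: t}} for t in extra_terms],
            }
        }
    }


# Made-up response fragment, shaped like the aggregation's bucket list.
sample_response = {
    "aggregations": {
        "sig_terms": {
            "buckets": [
                {"key": "neural", "doc_count": 42},
                {"key": "patented", "doc_count": 17},
                {"key": "resistance", "doc_count": 9},
            ]
        }
    }
}

# Terms judged nonsensical in context are dropped by hand.
kept = curate_terms(sample_response, rejected={"patented", "resistance"})
expanded = build_expanded_query("artificial intelligence", kept)
```

Keeping the original query under "must" and the curated terms under
"should" means the extra terms can reorder results without pulling in
documents that never matched the original query.
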
Is there other functionality (experimental or otherwise) within ES that can
help me do this?
On Friday, 2 May 2014 18:17:41 UTC-5, Mark Harwood wrote:
>
> Pages like this suggest where the terms "patented", "resistance" and
> "marketintelligence.com's" are being picked up:
> http://www.marketintelligencecenter.com/artificialintelligence.aspx?p=4
> Much of it looks machine-generated.
>
> Too much repetition of stock phrases mixed in with diverse topics make it
> hard to pick up any kind of signal if this is the content you are including
> in your searches.
>
> Cheers,
> Mark
>