Hi,

Consider adding a dedicated non-data master node; as I understand it, this 
can improve data handling and search speed considerably.
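A minimal sketch of what that looks like in elasticsearch.yml, assuming the 
1.x-era node settings `node.master` and `node.data`. The master node holds no 
shards, so it handles only cluster management and no indexing or search work. 
How much this helps search speed will depend on the cluster.

```yaml
# elasticsearch.yml on the dedicated master node (Elasticsearch 1.x setting names)
node.master: true   # eligible to be elected master
node.data: false    # holds no shards, so it does no indexing or search work
```

On the existing data nodes you would typically set `node.master: false` and 
`node.data: true`, so that only the dedicated node can be elected master.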


On Friday, December 12, 2014 6:04:46 AM UTC+1, Ramchandra Phadake wrote:
>
> Hi,
>
> We are storing lots of mail messages in ES with multiple fields: 600 
> million+ messages across 3 ES nodes.
>
> There is a custom algorithm that works on a batch of messages to correlate 
> them based on fields and other message semantics. 
> The final result consists of groups of messages, similar to 
> field-collapsing results. 
>
> Currently we fetch 100K+ messages from ES and apply this logic to return 
> the final results to the user. The algorithm can't be modeled using aggregations. 
>
> Obviously this is not a scalable approach if, say, we want to process 100M 
> messages as part of this processing and return results within a few minutes. 
> The messages are large and partitioned across a few ES nodes. We want to 
> maintain data locality while processing, so as not to download lots of data 
> from ES over the network.
>
> Is there any way to execute custom code over shards from within ES? It is 
> fine if this is done as part of a postFilter. What options are available 
> before we start thinking about Hadoop/Spark with the es-hadoop library? 
>
> Solr seems to have such a plugin hook (experimental) for custom 
> processing: 
> https://cwiki.apache.org/confluence/display/solr/AnalyticsQuery+API
>
> Thanks,
> Ram
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5094ea6e-2b96-4fb2-a2ba-e542db009865%40googlegroups.com.