Hi,

Consider adding a dedicated master node that holds no data. As I understand it, this can noticeably improve data handling and search speed.
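In case it helps, here is a minimal sketch of what that might look like in `elasticsearch.yml`. It assumes the `node.master` / `node.data` settings used by the 1.x releases current at the time of writing; the split into one master-only node and three data nodes is only an illustration, not something taken from your setup.

```yaml
# elasticsearch.yml on the dedicated master node (assumed 1.x setting names)
node.master: true    # eligible to be elected master
node.data: false     # holds no shards, so it does no indexing or search work

# elasticsearch.yml on each of the existing data nodes (illustrative)
# node.master: false # never elected master
# node.data: true    # holds shards and serves indexing and search
```

With a single master-eligible node the cluster has no failover for the master role, so this is a trade-off to weigh rather than a recommendation.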
On Friday, December 12, 2014 6:04:46 AM UTC+1, Ramchandra Phadake wrote:
> Hi,
>
> We are storing lots of mail messages in ES with multiple fields: 600 million+ messages across 3 ES nodes.
>
> There is a custom algorithm which works on a batch of messages to correlate them based on fields & other message semantics. The final result involves groups of messages, similar to, say, field-collapsing type results.
>
> Currently we fetch 100K+ messages from ES & apply this logic to return final results to the user. The algo can't be modeled using aggregations.
>
> Obviously this is not a scalable approach if, say, we want to process 100 M messages as part of this processing & return results in a few mins. The messages are large & partitioned across a few ES nodes. We want to maintain data locality while processing so as not to download lots of data from ES over the network.
>
> Is there any way to execute some code over shards from within ES? It is fine if it is done as part of postFilter as well. What options are available before thinking about Hadoop/Spark using the es-hadoop library?
>
> Solr seems to have such a plugin hook (experimental) for custom processing:
> https://cwiki.apache.org/confluence/display/solr/AnalyticsQuery+API
>
> Thanks,
> Ram
