If you're going to be shuffling data to multiple worker nodes, then data
will be crossing the network. Shuffling provides the foundation for certain
parallel computing tasks, such as performing large-scale parallel
relational algebra.
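As a rough illustration of the idea (this is not Solr code, just a sketch): a parallel join hash-partitions both inputs on the join key so that matching rows land on the same worker, which is exactly the point where data crosses the network.

```python
# Illustrative sketch: hash-partition rows by join key so matching rows
# from both relations land on the same "worker" partition.
def shuffle(rows, key, n_workers):
    """Assign each row to a partition by hashing its join key."""
    partitions = [[] for _ in range(n_workers)]
    for row in rows:
        partitions[hash(row[key]) % n_workers].append(row)
    return partitions

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "z"}, {"id": 3, "b": "w"}]

# Rows with the same id hash to the same partition on both sides,
# so each worker can join its own pair of partitions locally.
lp = shuffle(left, "id", 4)
rp = shuffle(right, "id", 4)
joined = [dict(l, **r)
          for l_part, r_part in zip(lp, rp)
          for l in l_part for r in r_part
          if l["id"] == r["id"]]
```

The shuffle step is where the network cost lives; once partitioning is done, the join itself is embarrassingly parallel.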

For machine learning algorithms, we'll likely need a parallel, iterative
design that leaves the data in place.
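A minimal sketch of that data-local pattern (again, not Solr code): each worker keeps its shard and, per iteration, ships back only a small summary; here a toy gradient step toward the global mean, where only scalars ever cross the "network".

```python
# Illustrative sketch: iterative, data-local computation. The shards
# never move; each iteration exchanges only small per-shard summaries.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]  # data stays on each worker

def local_gradient(shard, theta):
    """Per-worker contribution: gradient of squared error plus shard size."""
    return sum(theta - x for x in shard), len(shard)

theta = 0.0
for _ in range(100):                   # iterate; the data never moves
    parts = [local_gradient(s, theta) for s in shards]
    grad = sum(g for g, _ in parts) / sum(n for _, n in parts)
    theta -= 0.5 * grad                # only scalars crossed the network
```

With this update rule theta converges to the global mean (3.5), while the per-iteration network traffic is a couple of numbers per worker rather than the data itself.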

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 20, 2015 at 4:11 PM, Yonik Seeley <[email protected]> wrote:

> On Wed, May 20, 2015 at 11:06 AM, Noble Paul <[email protected]> wrote:
> > The problem with streaming is data locality. Data needs to be transferred
> > across network to do the processing
>
> Nothing saying that you can't process data before it's streamed out, right?
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
