Hello,

In general this is not a good idea in the Hadoop world, where jobs should be 
idempotent. Nutch will probably never have a distributed cache of its own, 
because it is so easy to implement yourself in a plugin, and because it would 
require a second set of daemons to manage the cache. Daemons like Redis or 
Memcached already provide such a feature and are good at it.

Implementing it in a plugin is always custom work, whether Nutch supports it 
out-of-the-box or not. Just connect a client, set or get an item, and you 
are good to go. Memcached and Redis (and many others) are a perfect fit for 
this job.
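
As a rough sketch of what "connect a client, set or get an item" could look 
like, here is a bare-bones client that speaks the memcached text protocol 
using only the JDK. The class name MemcacheTextClient and its methods are made 
up for illustration; they are not part of Nutch or of any memcached library. 
In a real plugin you would more likely use an existing Java client and add 
timeouts and error handling.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal client for the memcached text protocol (set/get only). */
final class MemcacheTextClient implements Closeable {
    private final InputStream in;
    private final OutputStream out;
    private final Socket socket; // null when constructed from raw streams

    /** Connects to a memcached daemon, e.g. localhost:11211 (the default port). */
    MemcacheTextClient(String host, int port) throws IOException {
        this.socket = new Socket(host, port);
        this.in = new BufferedInputStream(socket.getInputStream());
        this.out = socket.getOutputStream();
    }

    /** Uses existing streams; handy for trying the protocol without a server. */
    MemcacheTextClient(InputStream in, OutputStream out) {
        this.socket = null;
        this.in = in;
        this.out = out;
    }

    /** Sends "set <key> <flags> <exptime> <bytes>" followed by the data block.
     *  Keys must not contain whitespace or control characters. */
    boolean set(String key, String value, int ttlSeconds) throws IOException {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        String header = "set " + key + " 0 " + ttlSeconds + " " + data.length + "\r\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(data);
        out.write("\r\n".getBytes(StandardCharsets.UTF_8));
        out.flush();
        return "STORED".equals(readLine());
    }

    /** Sends "get <key>" and returns the value, or null on a cache miss. */
    String get(String key) throws IOException {
        out.write(("get " + key + "\r\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
        String header = readLine(); // "VALUE <key> <flags> <bytes>" or "END"
        if (header == null || !header.startsWith("VALUE ")) {
            return null;
        }
        int length = Integer.parseInt(header.split(" ")[3]);
        byte[] data = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(data, offset, length - offset);
            if (n < 0) {
                throw new EOFException("connection closed while reading value");
            }
            offset += n;
        }
        readLine(); // consume the "\r\n" that terminates the data block
        readLine(); // consume the closing "END"
        return new String(data, StandardCharsets.UTF_8);
    }

    /** Reads one CRLF-terminated line; returns null at end of stream. */
    private String readLine() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                buffer.write(b);
            }
        }
        if (b < 0 && buffer.size() == 0) {
            return null;
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        if (socket != null) {
            socket.close();
        }
    }
}
```

In a plugin you would create the client once (for example with 
new MemcacheTextClient("localhost", 11211), assuming a memcached daemon is 
listening there), call set() in one task and get() in another, and close it 
when the task finishes.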

Hadoop itself offers a distributed cache, but it does nothing more than ship 
the code and data to all nodes. Jobs can still be idempotent because each 
node has the same data and code. The Nutch code itself cannot run (correct me 
if I am wrong) without Hadoop's distributed cache.

Markus
 
 
-----Original message-----
> From:Roannel Fernández Hernández <[email protected]>
> Sent: Wednesday 11th January 2017 15:58
> To: [email protected]
> Subject: Re: [MASSMAIL]RE: Distributed cache
> 
> Hi
> 
> Thanks for your advice. But what would you think of Nutch having a 
> distributed cache out-of-the-box? I mean, I think what you propose could be 
> native to Nutch.
> 
> Regards
> 
> 
> ----- Original Message -----
> > From: "Markus Jelsma" <[email protected]>
> > To: [email protected]
> > Sent: Wednesday, January 11, 2017 10:32:51 AM
> > Subject: RE: [MASSMAIL]RE: Distributed cache
> > 
> > Hi - just implement it in Java code in the plugins where you need it. The
> > protocol is simple plain text that you can even speak over telnet, and
> > there are plenty of memcached clients for Java.
> > 
> > Markus
> >  
> > -----Original message-----
> > > From:Roannel Fernández Hernández <[email protected]>
> > > Sent: Wednesday 11th January 2017 13:38
> > > To: [email protected]
> > > Subject: Re: [MASSMAIL]RE: Distributed cache
> > > 
> > > Hi Markus:
> > > 
> > > How can I use Memcached with Nutch? Do you know any way to do that?
> > > 
> > > Thanks a lot.
> > > 
> > > ----- Original Message -----
> > > > From: "Markus Jelsma" <[email protected]>
> > > > To: [email protected]
> > > > Sent: Wednesday, January 11, 2017 7:50:31 AM
> > > > Subject: [MASSMAIL]RE: Distributed cache
> > > > 
> > > > Hello - I think memcached is the easiest way to have something cached
> > > > and available everywhere.
> > > > 
> > > > Markus
> > > > 
> > > >  
> > > >  
> > > > -----Original message-----
> > > > > From:Roannel Fernández Hernández <[email protected]>
> > > > > Sent: Wednesday 11th January 2017 3:03
> > > > > To: [email protected]
> > > > > Subject: Distributed cache
> > > > > 
> > > > > Hi folks:
> > > > > 
> > > > > I need to share resources (values in variables) between different
> > > > > jobs, and it could even be useful between tasks on different nodes
> > > > > of a Hadoop cluster. Is there something like a distributed cache in
> > > > > Nutch right now?
> > > > > 
> > > > > To sum up, I need something to share data between different tasks,
> > > > > jobs, or even hosts.
> > > > > 
> > > > > Regards
> > > > > 
> > > > > -----------
> > > > > The @universidad_uci is Fidel. We young people will not fail.
> > > > > #HastaSiempreComandante
> > > > > #HastalaVictoriaSiempre
> > > > > 
> > > > 
> > > 
> > > 
> > 
> 
> 
