These days, in my industry experience, Spark, rather than Hadoop, is the
predominant choice for distributed work.

Since many of you have your catalog in Solr (and Blacklight, blush, thank you), 
you could straightforwardly leverage our open source spark-solr library.  
https://github.com/LucidWorks/spark-solr

Slicing, dicing, and processing what you have in Solr has never been more
powerful. Spark can also connect to RDBMS back ends and other
repositories, so the same approach would adapt to those as well.
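For a sense of what that looks like, reading a Solr collection into a Spark DataFrame with spark-solr is roughly the following sketch. The ZooKeeper host and the collection name ("catalog") are placeholder assumptions, not details from this thread; you'd substitute your own.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch: assumes a Solr cluster whose ZooKeeper ensemble is
// at localhost:9983 and a collection named "catalog" (both placeholders).
val spark = SparkSession.builder.appName("solr-demo").getOrCreate()

val df = spark.read
  .format("solr")
  .option("zkhost", "localhost:9983")  // ZooKeeper for the Solr cluster
  .option("collection", "catalog")     // Solr collection to read
  .load()

// From here, ordinary Spark SQL applies: filter, aggregate, join, etc.
// (field names below are illustrative)
df.filter("format = 'Book'").groupBy("language").count().show()
```

Because the read is lazy and pushes filters down to Solr where it can, you only pull the documents and fields you actually process.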

    Erik

> On Dec 17, 2018, at 12:52, Eric Lease Morgan <[email protected]> wrote:
> 
> What is your experience with Apache Hadoop?
> 
> I have very recently been granted root privileges on as many as three virtual 
> machines. Each machine has forty-four cores, and more hard disk space & RAM 
> than I really know how to exploit. I got access to these machines to work on 
> a project I call The Distant Reader, and The Distant Reader implements a lot 
> of map/reduce computing.†
> 
> Can I use Apache Hadoop to accept jobs on one machine, send them to either 
> of the other two machines, and then save the results in some sort of 
> common/shared file system?
> 
> † In reality, The Distant Reader is ultimately intended to be an XSEDE 
> science gateway --> https://www.xsede.org. The code for the Reader is 
> available on GitHub --> https://github.com/ericleasemorgan/reader
> 
> --
> Eric Morgan
> University of Notre Dame
