Thank you for the replies, and now I am aware of three different tools for 
parallel/distributed computing:

  1. GNU Parallel (https://www.gnu.org/software/parallel/) - 'Works great for 
me. Easy to install. Easy to use. Given well-written programs, can be used to 
take advantage of all the cores on your computer.

  2. Apache Hadoop (https://hadoop.apache.org) - 'Seems to be optimized for 
map/reduce computing, but can take advantage of many computers with many cores.

  3. Apache Spark (http://spark.apache.org) - 'Seems to have learned lessons 
from Hadoop. 'Can also take advantage of many computers with many cores, comes 
with a few programming interfaces such as Python & Java (a minimal sketch 
follows this list), and seems a bit easier to install than Hadoop.

If I had one computer with 132 cores, then I'd probably stick with Parallel. If 
I had multiple computers, each with multiple cores, then I'd probably look more 
closely at Spark. Both Hadoop & Spark are "heavy" frameworks compared to 
Parallel, which is a single Perl script.

--
Eric Morgan
