On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld <jo...@antarean.org> wrote:
> On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
>> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
>> > Hadoop is a very specialized tool.  It does what it does very well,
>> > but if you want to use it for something other than map/reduce then
>> > consider carefully whether it is the right tool for the job.
>>
>> Agreed; unless you have decent hardware and can comfortably measure
>> your data in TB, it'll be quicker to use something else once you factor
>> in the administration time and learning curve.
>
> The benefit of clustering technologies is that you don't need high-end
> hardware to start with. You can use the old hardware you found collecting dust
> in the basement.
>
> The learning curve isn't as steep as it used to be. There are plenty of tools
> to make it easier to start using Hadoop.
>

As long as you're counting words and don't mind coding everything in Java.  :)

I found that if you want to avoid Java, the available documentation
plummets, and I'm pretty sure the version I was attempting to use was
buggy - it was losing records in the sort/reduce phase, I believe.  Or
perhaps I was just using it incorrectly, but the exact same code worked
fine when I ran it on a single host with a smaller dataset and just
piped map | sort | reduce without Hadoop.  The documentation on getting
Hadoop to work via stdin/stdout with non-Java code was pretty sparse,
so it's quite possible I wasn't doing things right.  In the end my
problem wasn't big enough to necessitate Hadoop, and I used GNU
parallel instead.
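For reference, the single-host pipeline I mean is roughly the
following sketch (word count as a stand-in problem; the actual
mapper/reducer scripts are hypothetical - Hadoop Streaming wires the
same kind of stdin/stdout programs across a cluster, with sort
standing in for its shuffle/sort phase):

```shell
#!/bin/sh
# map:    split input lines into words, emit "word 1" per word
# sort:   group identical keys together (what Hadoop's shuffle/sort does)
# reduce: sum the counts for each word
printf 'a b a\nb a\n' \
  | tr ' ' '\n' \
  | awk 'NF {print $1, 1}' \
  | sort \
  | awk '{c[$1] += $2} END {for (w in c) print w, c[w]}' \
  | sort
```

If each stage is its own executable, GNU parallel can fan the map
stage out over input chunks on one machine, which was enough for my
dataset.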

--
Rich
