2012/3/2 Kunaal wrote:
> I am doing a general poll on what are the most prevalent pain points that
> people run into with Hadoop? These could be performance related (memory
> usage, IO latencies), usage related or anything really.
>
My biggest frustration with core Hadoop over the last year or so has been that there is no efficient way to implement the so-called "analytic functions" in general with MapReduce. These are not what you would guess from the name, by the way; see Oracle Analytics for an example of what I mean. The big advantage is that they often let you avoid expensive self-joins, which can make a huge difference performance-wise. (I would say that 80% of the analytic functions can be implemented with a UDF or a UDA in Hive -- things like lead(), lag(), first() or rank() -- but it is the other 20% that would knock the ball out of the park.)
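
To make the self-join point concrete, here is a rough sketch in plain Python (not Hadoop or Hive code) of what a lag()-style function buys you. The sample rows, the column meanings and both function names are made up for illustration, and the self-join version assumes the days within each user are consecutive, which is what makes "join on day - 1" equivalent to lag().

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (user, day, amount). Days are consecutive per user.
ROWS = [
    ("a", 1, 10),
    ("a", 2, 15),
    ("b", 1, 7),
    ("a", 3, 12),
    ("b", 2, 9),
]


def lag_via_self_join(rows):
    """Find each row's previous amount by comparing the data against itself.

    This mirrors a self-join on (user, day - 1): every row is matched
    against every other row, so the work grows quadratically.
    """
    out = []
    for user, day, amount in rows:
        prev = None
        for other_user, other_day, other_amount in rows:
            if other_user == user and other_day == day - 1:
                prev = other_amount
        out.append((user, day, amount, prev))
    return sorted(out, key=itemgetter(0, 1))


def lag_via_window(rows):
    """Same result with one sort and one pass per partition.

    This is roughly what an analytic lag() does: partition by user,
    order by day, and carry the previous value along.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for user, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for _, day, amount in group:
            out.append((user, day, amount, prev))
            prev = amount
    return out


if __name__ == "__main__":
    for row in lag_via_window(ROWS):
        print(row)
```

Both functions return the same rows; the difference is that the second one needs only a sort by (user, day) plus a single scan, which is the shape of work that maps well onto a shuffle followed by a reducer.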