Another case is augmenting data.  This is sometimes done outside of MR
as part of an ETL flow, but it can also be done as an MR job.  Doing it
this way uses Hadoop to handle the scaling issues, though it isn't
really what MR is intended for.

A real example of this is:

* Input: standard apache weblog
* Data added...
  - Geolocation of IP
  - Decoding URL
  - Adding information based on visited URL / Ref URL ...
  - Adding information based on the user
* Output: complex binary object written to a sequence file
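
To make the shape of such a job concrete, here is a rough sketch of the
per-record augmentation step in plain Python.  It is not Hadoop code and
not the actual job described above: GEO_TABLE is a made-up stand-in for
a real geolocation database, the field names are my own, and JSON
replaces the binary sequence-file output to keep the example
self-contained.

```python
import json
import re
from urllib.parse import parse_qs, unquote, urlsplit

# Apache common/combined log format:
# host ident user [time] "method url proto" status bytes ["referer" "agent"]
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical lookup table; a real job would query a geolocation database.
GEO_TABLE = {"203.0.113.7": {"country": "AU", "city": "Sydney"}}


def augment(line):
    """Parse one log line and attach derived fields; None if unparseable."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Geolocation of the client IP (empty dict when the IP is unknown).
    rec["geo"] = GEO_TABLE.get(rec["ip"], {})
    # Decode the URL: split first, then percent-decode each part.
    parts = urlsplit(rec["url"])
    rec["path"] = unquote(parts.path)
    rec["query"] = parse_qs(parts.query)
    return rec


def map_lines(lines):
    """Mapper-style loop: one augmented output record per valid input line."""
    for line in lines:
        rec = augment(line.rstrip("\n"))
        if rec is not None:
            yield json.dumps(rec, sort_keys=True)


if __name__ == "__main__":
    sample = ('203.0.113.7 - alice [29/Apr/2011:08:02:00 +1000] '
              '"GET /search?q=hadoop%20sort HTTP/1.1" 200 512 '
              '"http://example.com/" "Mozilla/5.0"')
    for out in map_lines([sample]):
        print(len(sample), "bytes in,", len(out), "bytes out")
        print(out)
```

Every input line yields one output record carrying the original fields
plus the derived ones, so the output is larger than the input.  There is
no aggregation step, so in Hadoop this kind of work would presumably run
as a map-only job.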


On Fri, Apr 29, 2011 at 08:02, elton sky <[email protected]> wrote:
> One of assumptions map reduce made, I think, is that size of map's output is
> smaller than input. Although we can see many applications have the same size
> of output with input, like, sort, merge,etc.
> For my benchmark purpose, I am looking for some non-trivial, real life
> applications which creates *bigger* output than its input. Trivial example I
> can think about is cross join...
>
> I really appreciate if you share your knowledge with me.
>
