On Fri, Apr 29, 2011 at 5:02 AM, elton sky wrote:

> For my benchmark purposes, I am looking for some non-trivial, real-life
> applications that create *bigger* output than their input. A trivial example
> I can think of is a cross join...
>

As you say, almost all cross join jobs have that property. The other case
that almost always fits into that category is generating an inverted index. For
example, if your input is a corpus of documents and you want to generate the
list of documents that contain each word, the output (and especially the
shuffle data) is much larger than the input.
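To see why, here is a minimal in-memory sketch of the map, shuffle, and reduce steps of such an index job. It is not Hadoop code. The function names, the sample documents, and the size metric (character counts of keys plus values, ignoring any serialization overhead) are all hypothetical, chosen only to illustrate the effect. The mapper emits one (word, doc_id) pair per token, so every word occurrence carries a copy of the document ID into the shuffle.

```python
from collections import defaultdict

# Hypothetical sample corpus: document ID -> document text.
SAMPLE_DOCS = {
    "doc-0001": "the cat sat on the mat",
    "doc-0002": "the dog sat on the log",
    "doc-0003": "the cat chased the dog",
}


def map_phase(docs):
    """Emit one (word, doc_id) pair per token, with no map-side combining."""
    for doc_id, text in docs.items():
        for word in text.lower().split():
            yield word, doc_id


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle/sort."""
    groups = defaultdict(list)
    for word, doc_id in pairs:
        groups[word].append(doc_id)
    return groups


def reduce_phase(groups):
    """Build each word's posting list: the sorted, de-duplicated documents containing it."""
    return {word: sorted(set(ids)) for word, ids in sorted(groups.items())}


def build_index(docs):
    return reduce_phase(shuffle(map_phase(docs)))


def measure(docs):
    """Return approximate (input, shuffle, output) sizes as character counts."""
    pairs = list(map_phase(docs))
    index = reduce_phase(shuffle(pairs))
    input_size = sum(len(doc_id) + len(text) for doc_id, text in docs.items())
    shuffle_size = sum(len(word) + len(doc_id) for word, doc_id in pairs)
    output_size = sum(len(word) + sum(len(d) for d in ids) for word, ids in index.items())
    return input_size, shuffle_size, output_size


if __name__ == "__main__":
    inp, shuf, out = measure(SAMPLE_DOCS)
    print(f"input={inp} shuffle={shuf} output={out}")  # input=90 shuffle=188 output=138
```

Even on this tiny corpus the shuffle is roughly twice the input, because every token repeats its document ID. Deduplicating (word, doc_id) pairs on the map side would shrink the shuffle, but each distinct word in each document would still carry its own copy of the ID.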

-- Owen
