On Fri, Apr 29, 2011 at 5:02 AM, elton sky <[email protected]> wrote:
> For my benchmark purpose, I am looking for some non-trivial, real life
> applications which creates *bigger* output than its input. Trivial example I
> can think about is cross join...

As you say, almost all cross join jobs have that property. The other case that
almost always fits into that category is generating an index. For example, if
your input is a corpus of documents and you want to generate the list of
documents that contain each word, the output (and especially the shuffle data)
is much larger than the input.

-- Owen
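
To make the index example concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code. The helper names (map_doc, build_index, size_in_bytes), the three toy documents, and the tab-separated record layout are all assumptions made for illustration. It emits one (word, document id) pair per word occurrence, groups the pairs by word, and compares byte counts for the input text, the shuffle records, and the final index.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word occurrence."""
    return [(word, doc_id) for word in text.lower().split()]


def build_index(docs):
    """Run the map step over every document, then group pairs by word."""
    shuffle = []
    for doc_id, text in docs.items():
        shuffle.extend(map_doc(doc_id, text))

    # Reduce step: collect the distinct document ids for each word.
    postings = defaultdict(set)
    for word, doc_id in shuffle:
        postings[word].add(doc_id)
    index = {word: sorted(ids) for word, ids in postings.items()}
    return shuffle, index


def size_in_bytes(lines):
    """Size of the records as newline-terminated UTF-8 text."""
    return sum(len(line.encode("utf-8")) + 1 for line in lines)


# Hypothetical toy corpus; the document ids are made up.
docs = {
    "doc-0001": "the quick brown fox",
    "doc-0002": "the lazy dog",
    "doc-0003": "the quick dog",
}

shuffle, index = build_index(docs)

input_size = size_in_bytes(docs.values())
shuffle_size = size_in_bytes(f"{w}\t{d}" for w, d in shuffle)
output_size = size_in_bytes(f"{w}\t{','.join(ids)}" for w, ids in index.items())

print("input bytes:  ", input_size)
print("shuffle bytes:", shuffle_size)
print("output bytes: ", output_size)
```

In this sketch the shuffle is the largest of the three because every word occurrence carries a full document id, and the index is still larger than the input text. How large the gap is on a real corpus depends on the length of the document ids relative to the words, and on whether a combiner or compression is used.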
