Thanks, Raj. Unfortunately I have to tear down Hadoop completely between runs, including the backing data store, so if possible I need a way to generate the same data repeatedly by providing a single seed, or something similar.
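One way to get repeatable data from a single seed is to make every record a pure function of (seed, row number). The output then does not depend on how rows are split across map tasks or on anything left over from a previous run. Below is a minimal Python sketch of that idea. It is not TeraGen's code. The record size, key length, and function names (`make_record`, `generate_range`, `key_of`) are hypothetical choices for illustration.

```python
import hashlib

# Hypothetical layout, loosely modeled on fixed-size sort-benchmark rows.
RECORD_SIZE = 100  # bytes per record (assumption)
KEY_SIZE = 10      # leading bytes used as the sort key (assumption)


def make_record(seed: int, row: int) -> bytes:
    """Derive one fixed-size record purely from (seed, row).

    No state is carried between rows, so any worker can generate any
    row range and get byte-identical output for the same seed.
    """
    out = b""
    counter = 0
    while len(out) < RECORD_SIZE:
        # Each SHA-256 digest contributes 32 bytes; chain a counter for more.
        msg = f"{seed}:{row}:{counter}".encode()
        out += hashlib.sha256(msg).digest()
        counter += 1
    return out[:RECORD_SIZE]


def generate_range(seed: int, start: int, end: int):
    """Yield records for rows [start, end), e.g. one map task's split."""
    for row in range(start, end):
        yield make_record(seed, row)


def key_of(record: bytes) -> bytes:
    """Return the sort key portion of a record."""
    return record[:KEY_SIZE]
```

Because each row is derived independently, generating rows [0, 1000) in one task or as [0, 400) and [400, 1000) in two tasks yields the same bytes. That is the property a seeded replacement for the data-generation step would need.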
On Sat, Apr 14, 2012 at 2:15 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
> David
>
> Since the data generation and sorting are different Hadoop jobs, you can
> generate the data once and sort the same data as many times as you want.
>
> I don't think TeraGen is deterministic (or rather, the keys are random but
> the text is deterministic, if I remember correctly).
>
> Raj
>
>>________________________________
>> From: David Erickson <halcyon1...@gmail.com>
>>To: common-user@hadoop.apache.org
>>Sent: Saturday, April 14, 2012 1:53 PM
>>Subject: Is TeraGen's generated data deterministic?
>>
>>Hi, we are doing some benchmarking of our infrastructure and
>>are using TeraGen/TeraSort to do it. I am wondering whether
>>the data generated by TeraGen is deterministic, i.e. if I repeat
>>the same experiment multiple times with the same configuration options,
>>will it continue to generate and sort the exact same data? And if
>>not, is there an easy mod to make this happen?
>>
>>Thanks!
>>David