Thanks, Raj.  Unfortunately I have to tear down Hadoop completely
between runs, including the backing data store, so if possible I need
to figure out a way to generate the same data on every run from
a single seed, or something similar.
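
To make the idea concrete, here is a minimal sketch of seed-based
generation. It is not TeraGen's actual code: the names make_row and
generate, and the use of SHA-256 as the byte source, are assumptions
chosen for illustration. Each row is derived only from the seed and the
row number, so the output does not depend on how rows are split across
tasks or on whether an earlier run's data still exists.

```python
import hashlib

# Assumed layout for illustration: 100-byte rows whose first 10 bytes
# act as the sort key, similar in shape to TeraGen-style records.
KEY_LEN = 10
ROW_LEN = 100


def make_row(seed: int, row_id: int) -> bytes:
    """Derive one row purely from (seed, row_id); no shared RNG state."""
    out = b""
    counter = 0
    while len(out) < ROW_LEN:
        # Hash seed, row number, and a block counter to get more bytes.
        material = f"{seed}:{row_id}:{counter}".encode("ascii")
        out += hashlib.sha256(material).digest()
        counter += 1
    return out[:ROW_LEN]


def generate(seed: int, start: int, count: int) -> list:
    """Generate rows start .. start+count-1, as one task might."""
    return [make_row(seed, r) for r in range(start, start + count)]


if __name__ == "__main__":
    # Two "tasks" covering rows 0-1 and 2-3 produce the same rows as one
    # task covering rows 0-3, and a rerun with the same seed is identical.
    whole = generate(42, 0, 4)
    split = generate(42, 0, 2) + generate(42, 2, 2)
    print(whole == split)
    print(whole[0][:KEY_LEN].hex())
```

Because every row is a pure function of the seed and its row number, the
data can be regenerated after a full teardown by passing the same seed.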

On Sat, Apr 14, 2012 at 2:15 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
> David
>
> Since the data generation and sorting are different Hadoop jobs, you can
> generate the data once and sort the same data as many times as you want.
>
> I don't think TeraGen is deterministic (or rather, the keys are random but
> the text is deterministic, if I remember correctly).
>
>
>
> Raj
>
>
>
>>________________________________
>> From: David Erickson <halcyon1...@gmail.com>
>>To: common-user@hadoop.apache.org
>>Sent: Saturday, April 14, 2012 1:53 PM
>>Subject: Is TeraGen's generated data deterministic?
>>
>>Hi, we are benchmarking some of our infrastructure using
>>TeraGen/TeraSort.  I am wondering whether the data generated by
>>TeraGen is deterministic: if I repeat the same experiment multiple
>>times with the same configuration options, will it generate and sort
>>exactly the same data each time?  And if not, is there an easy mod to
>>make this happen?
>>
>>Thanks!
>>David
>>
>>
>>
