On Tue, Jun 8, 2021 at 5:55 AM Stefan Sperling <s...@elego.de> wrote:
>
> On Tue, Jun 08, 2021 at 01:45:00AM -0400, Nathan Hartman wrote:
> > In order to do some testing, I needed some test data that reproduces
> > the issue; since stsp can't share the customer's 100MB XML file, and
> > we'd probably want other inputs or sizes anyway, I wrote a program
> > that attempts to generate such a thing. I'm attaching that program...
> >
> > To build, rename it with a .c extension and compile it, e.g.:
> > $ gcc gen_diff_test_data.c -o gen_diff_test_data
> >
> > To run it, provide two parameters:
> >
> > The first is a 'seed' value like you'd provide to a pseudo random
> > number generator at init time.
> >
> > The second is a 'length' parameter that says how long (approximately)
> > you want the output data to be. (The program nearly always overshoots
> > this by a small amount.)
> >
> > Rather than using the system's pseudo random number generator, this
> > program includes its own implementation to ensure that users on any
> > system can get the same results when using the same parameters. So if
> > different people want to test with the same sets of input, you only
> > have to share 2 numbers, rather than send each other files >100MB of
> > useless junk.
> >
> > Example: Generate two files of approx 100 MB, containing lots of
> > differences and diff them:
> >
> > $ gen_diff_test_data 98 100m > one.txt
> > $ gen_diff_test_data 99 100m > two.txt
> > $ time diff one.txt two.txt > /dev/null
> >
> > With the above parameters, it takes my system's diff about 50 seconds
> > to come up with something that looks reasonable at a glance; svn's
> > diff has been crunching away for a while now...
>
> Thank you Nathan, this is incredibly useful!
>
> Would you consider committing this tool to our repository, e.g. somewhere
> within the tools/dev/ subtree?


Sure, done in r1890601.

It's in tools/dev/gen-test-data/gen_diff_test_data.c.

I added the gen-test-data directory in case we want to add other
sample data generators in the future.
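For anyone curious how the "share two numbers instead of 100MB files" trick described above can work, here is a minimal sketch in C. It is not the code committed in r1890601. It assumes a xorshift32 generator, a made-up word list, and a hypothetical parse_length() helper that reads k/m/g suffixes as binary multiples. All of these were chosen only for illustration. Because the generator uses fixed-width unsigned arithmetic instead of the platform's rand(), the same seed yields the same bytes on every system. Since only whole lines are written, the output slightly overshoots the requested length.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical portable PRNG (xorshift32). Fixed-width unsigned math
 * gives identical sequences on every platform, unlike rand(). */
typedef struct {
    uint32_t state;
} prng_t;

static void prng_init(prng_t *p, uint32_t seed)
{
    /* xorshift must never start from an all-zero state. */
    p->state = seed ? seed : 0x9E3779B9u;
}

static uint32_t prng_next(prng_t *p)
{
    uint32_t x = p->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    p->state = x;
    return x;
}

/* Hypothetical helper: parse "100m", "64k", "2g" or a plain number
 * (binary multiples assumed). Returns 0 on malformed input. */
static unsigned long long parse_length(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);

    if (end == s)
        return 0;
    if (*end == '\0')
        return n;
    if (end[1] != '\0')
        return 0;               /* junk after the suffix */
    switch (*end) {
    case 'k': case 'K': return n * 1024ULL;
    case 'm': case 'M': return n * 1024ULL * 1024ULL;
    case 'g': case 'G': return n * 1024ULL * 1024ULL * 1024ULL;
    default:            return 0;
    }
}

/* Write whole lines of pseudo-random words until at least 'target'
 * bytes are written; returns the byte count (usually a bit more). */
static unsigned long long generate(FILE *out, uint32_t seed,
                                   unsigned long long target)
{
    /* Made-up vocabulary, loosely XML-flavoured for illustration. */
    static const char *const words[] = {
        "<item>", "</item>", "alpha", "beta", "gamma", "delta"
    };
    const uint32_t nwords = sizeof words / sizeof words[0];
    unsigned long long written = 0;
    prng_t p;

    prng_init(&p, seed);
    while (written < target) {
        uint32_t count = 1 + prng_next(&p) % 8;   /* 1..8 words per line */
        for (uint32_t i = 0; i < count; i++) {
            const char *w = words[prng_next(&p) % nwords];
            int n = fprintf(out, "%s%s", i ? " " : "", w);
            if (n < 0)
                return written;                   /* write error */
            written += (unsigned long long)n;
        }
        if (fputc('\n', out) == EOF)
            return written;
        written++;
    }
    return written;
}
```

A main() for this sketch would simply call generate(stdout, seed, parse_length(argv[2])) with the seed taken from argv[1], so two people running it with the same pair of arguments get byte-identical files.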

Cheers,
Nathan
