On Tue, Jun 8, 2021 at 5:55 AM Stefan Sperling <s...@elego.de> wrote:
>
> On Tue, Jun 08, 2021 at 01:45:00AM -0400, Nathan Hartman wrote:
> > In order to do some testing, I needed some test data that reproduces
> > the issue; since stsp can't share the customer's 100MB XML file, and
> > we'd probably want other inputs or sizes anyway, I wrote a program
> > that attempts to generate such a thing. I'm attaching that program...
> >
> > To build, rename to .c extension and, e.g.,
> > $ gcc gen_diff_test_data.c -o gen_diff_test_data
> >
> > To run it, provide two parameters:
> >
> > The first is a 'seed' value like you'd provide to a pseudo random
> > number generator at init time.
> >
> > The second is a 'length' parameter that says how long (approximately)
> > you want the output data to be. (The program nearly always overshoots
> > this by a small amount.)
> >
> > Rather than using the system's pseudo random number generator, this
> > program includes its own implementation to ensure that users on any
> > system can get the same results when using the same parameters. So if
> > different people want to test with the same sets of input, you only
> > have to share 2 numbers, rather than send each other files >100MB of
> > useless junk.
> >
> > Example: Generate two files of approx 100 MB, containing lots of
> > differences and diff them:
> >
> > $ gen_diff_test_data 98 100m > one.txt
> > $ gen_diff_test_data 99 100m > two.txt
> > $ time diff one.txt two.txt > /dev/null
> >
> > With the above parameters, it takes my system's diff about 50 seconds
> > to come up with something that looks reasonable at a glance; svn's
> > diff has been crunching away for a while now...
>
> Thank you Nathan, this is incredibly useful!
>
> Would you consider committing this tool to our repository, e.g. somewhere
> within the tools/dev/ subtree?

Sure, done in r1890601. It's in
tools/dev/gen-test-data/gen_diff_test_data.c. I added the
gen-test-data directory in case we want to add other sample data
generators in the future.

Cheers,
Nathan
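
P.S. For anyone reading the archive without the source at hand, here is a rough sketch of the idea described in the quoted message: a built-in pseudo random number generator, seeded from the first parameter, so that the same two numbers give the same output on any system. This is NOT the code committed in r1890601. Every name below (sketch_rng, sketch_seed, sketch_next, sketch_parse_length, sketch_generate) is hypothetical, the generator is a plain 32-bit linear congruential generator picked for brevity, and treating the 'k' and 'm' suffixes as powers of 1024 is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical generator state; not taken from gen_diff_test_data.c. */
typedef struct { uint32_t state; } sketch_rng;

static void sketch_seed(sketch_rng *r, uint32_t seed)
{
    r->state = seed;
}

/* One step of a 32-bit linear congruential generator. Fixed-width
 * unsigned arithmetic wraps the same way everywhere, which is what
 * makes the sequence reproducible across systems. */
static uint32_t sketch_next(sketch_rng *r)
{
    r->state = (uint32_t)(r->state * 1664525u + 1013904223u);
    return r->state >> 16; /* the high bits of an LCG are the better ones */
}

/* Parse "100m", "64k" or plain digits into a byte count.
 * Returns 0 on malformed input. ASSUMPTION: k = 1024, m = 1024 * 1024. */
static size_t sketch_parse_length(const char *s)
{
    char *end;
    unsigned long n = strtoul(s, &end, 10);

    if (end == s)
        return 0;
    if (*end == 'k' || *end == 'K') {
        n *= 1024UL;
        end++;
    } else if (*end == 'm' || *end == 'M') {
        n *= 1024UL * 1024UL;
        end++;
    }
    return *end == '\0' ? (size_t)n : 0;
}

/* Write whole lines of pseudo-random lowercase letters until at least
 * `length` bytes have been written. The last line is always finished,
 * so the output overshoots `length` by a small amount. Returns the
 * number of bytes written. */
static size_t sketch_generate(FILE *out, uint32_t seed, size_t length)
{
    sketch_rng r;
    size_t written = 0;

    assert(out != NULL);
    sketch_seed(&r, seed);
    while (written < length) {
        size_t len = 1 + sketch_next(&r) % 72; /* 1..72 letters per line */
        size_t i;

        for (i = 0; i < len; i++)
            fputc('a' + (int)(sketch_next(&r) % 26), out);
        fputc('\n', out);
        written += len + 1;
    }
    return written;
}
```

A main() that passes argv[1] (converted with strtoul) as the seed and sketch_parse_length(argv[2]) as the length to sketch_generate(stdout, ...) would give the same two-parameter command line as in the example above. Two different seeds then produce two files of similar size with different content on every line, which is roughly the kind of input that stresses a diff implementation.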