On 24.12.2010 14:25, Stefan Fuhrmann wrote:
On 23.12.2010 03:44, Gavin Beau Baumanis wrote:
Hi Johan,
I was intrigued by your requirement to create a large file for testing.
I remember from a really long time ago when I learnt C, that we used
a specific algorithm for creating "natural" and "random" text.
With some help from Mr. Google I found out about Markov chains, which
look promising. I can't remember whether that was what I learned about
or not, but it looks like it might prove helpful nonetheless.
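For what it's worth, here is a minimal sketch of the Markov chain idea, assuming a word-level chain of order 2 built from any sample text you have lying around. The names build_chain and generate are made up for illustration and are not taken from the Stack Overflow post; treat it as a starting point rather than a tuned generator.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each `order`-word prefix to the words that follow it in the sample.

    `words` must contain more than `order` words, otherwise the chain is empty.
    """
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain


def generate(chain, n_words, order=2, seed=None):
    """Return n_words of pseudo-natural text by walking the chain at random."""
    rng = random.Random(seed)
    prefixes = list(chain.keys())
    state = rng.choice(prefixes)
    out = list(state)
    while len(out) < n_words:
        followers = chain.get(state)
        if not followers:
            # Dead end (prefix seen only at the very end of the sample):
            # jump to a random known prefix and carry on.
            state = rng.choice(prefixes)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-order:])
    return " ".join(out[:n_words])


if __name__ == "__main__":
    # Hypothetical sample text; any real corpus gives more natural output.
    sample = ("the quick brown fox jumps over the lazy dog "
              "and the quick brown cat").split()
    chain = build_chain(sample)
    print(generate(chain, 30, seed=1))
```

To get a file of a given size, one could call generate repeatedly and append each result to the file until it is large enough, which avoids holding the whole text in memory.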
A little further Googling and I found this specific post on
Stack Overflow.
http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content
No idea if it is going to help you specifically or not, but there
are quite a few ideas in the comments:
* Obtain a copy of the first 100MB from Wikipedia, for example.
You might try some recent Linux tarball (~400MB).
It should be
* mainly but probably not entirely text
* very close to typical real-world data (large config file
sections, lots of source code, maybe some binary /
UTF16 data)
* accessible to everybody for independent testing etc.
Just an idea ;)
-- Stefan^2.
... you may import many versions (including the RCs)
of it to form a deep history.
-- Stefan^2.