Re: Script to create huge sample files

Shlomi Fish Sat, 02 Jan 2010 10:59:11 -0800

Hi Parag!

On Saturday 02 Jan 2010 19:56:02 Parag Kalra wrote:
> Hello All,
> 
> A major part of my Perl scripting goes into processing text files, and most
> of the time I need huge text files (3 MB+) to perform benchmarking tests.
> 
> So I am planning to write a Perl script which will create a huge text file
> from the sample file it receives as its first input parameter. I have the
> following algorithm in mind:
> 
> 1. Provide 2 input parameters to the Perl script: (i) the sample file, (ii)
> the size of the new file.
> E.g., to create a new file of size 3 MB:
> perl Create_Huge_File.pl Sample.txt 3
> 
> 2. Read the input file and store its contents in an array.
>


Why an array? Storing it in a single string would be faster, use less memory
and be more efficient. See:

http://www.perl.com/pub/a/2003/11/21/slurp.html
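A minimal sketch of the slurp idiom described in that article, assuming the
sample file fits comfortably in memory. The sub name slurp_file is
hypothetical, and the :raw layer is an assumption that keeps the bytes
exactly as they are on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole file as one string.
sub slurp_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open '$path': $!";
    local $/;                  # undefined $/ makes <$in> return the whole file
    my $contents = <$in>;
    close $in;
    return defined $contents ? $contents : '';
}

# Usage: my $text = slurp_file('Sample.txt');
```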

> 3. Create a new file.
> 
> 4. Dump the contents of the above array into the new file.
> 

Again, use a string.

> 5. Check the length of the new file. If it is less than the second input
> parameter, repeat step 4; otherwise go to step 6.
> 

You can keep track of the length written so far in a variable, or call tell()
on the output filehandle (see http://perldoc.perl.org/5.8.8/functions/tell.html ).

> 6. Close the new file.
> 

OK.
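Putting steps 1 to 6 together with the suggestions above (a single string and
tell()), a minimal sketch might look like this. The sub name create_huge_file,
the optional third argument for the output name and its default
Huge_Sample.txt are hypothetical; "MB" is assumed to mean 1024 * 1024 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append copies of the sample to $out_path until the
# output is at least $target_bytes long; returns the final size in bytes.
sub create_huge_file {
    my ($sample_path, $target_bytes, $out_path) = @_;

    # Step 2: slurp the sample into a single string instead of an array.
    open my $in, '<:raw', $sample_path or die "Cannot open '$sample_path': $!";
    my $text = do { local $/; <$in> };
    close $in;
    die "Sample file '$sample_path' is empty\n" unless length $text;

    # Step 3: create (or truncate) the new file.
    open my $out, '>:raw', $out_path or die "Cannot create '$out_path': $!";

    # Steps 4 and 5: keep appending while tell() says we are below the target.
    while (tell($out) < $target_bytes) {
        print {$out} $text or die "Write to '$out_path' failed: $!";
    }

    # Step 6: close the new file (close also flushes the buffer, so check it).
    close $out or die "Cannot close '$out_path': $!";
    return -s $out_path;
}

# Usage: perl Create_Huge_File.pl Sample.txt 3 [Output.txt]
if (@ARGV) {
    my ($sample, $megabytes, $out) = @ARGV;
    die "Usage: $0 SAMPLE_FILE SIZE_IN_MB [OUTPUT_FILE]\n" unless defined $megabytes;
    $out = 'Huge_Sample.txt' unless defined $out;
    my $size = create_huge_file($sample, $megabytes * 1024 * 1024, $out);
    print "Wrote $size bytes to $out\n";
}
```

Because the size is only checked after a whole copy has been written, the
result can overshoot the target by up to one copy of the sample; Perl's
built-in truncate() could trim it afterwards if an exact size matters.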

> I have the following questions:
> 
> a.) What do I need to do to make sure that the length of the new file
> increases every time step 4 is executed?

Nothing. As long as you open the file once and keep printing to the same
output filehandle, each print appends to the file's contents and increases
its size.
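A small sketch to confirm this, using a temporary file from the core
File::Temp module (an assumption made just for the demo): the file is opened
once with '>', which truncates it a single time, and every later print lands
after the previous data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Get a temporary file name (removed automatically at program exit).
my (undef, $path) = tempfile(UNLINK => 1);

# Open once with '>': this truncates the file a single time.
open my $out, '>:raw', $path or die "Cannot create '$path': $!";

my @sizes;
for my $round (1 .. 3) {
    print {$out} "sample text\n";    # 12 bytes per print
    push @sizes, tell($out);         # current position = bytes written so far
}
close $out or die "Cannot close '$path': $!";

print "Position after each print: @sizes\n";    # Position after each print: 12 24 36
print "Final size: ", -s $path, "\n";            # Final size: 36
```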

> 
> b.) Since a lot of I/O is involved, is this the most optimised solution? If
> not, does anyone have a better design to meet my requirement?

It should be good enough. Perl's output is buffered, so printing a few
megabytes this way is fast.
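If the number of print calls ever does matter, one possible tweak (an
assumption, not something measured here) is to pre-repeat the sample with
Perl's x operator so each print writes a larger chunk, and to track the length
in a variable instead of calling tell(). The sub names build_chunk and
write_chunks and the 1 MiB default chunk size are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat $text (whole copies only, so lines are not
# cut in half) until the chunk is at least $chunk_bytes long.
sub build_chunk {
    my ($text, $chunk_bytes) = @_;
    $chunk_bytes = 1024 * 1024 unless defined $chunk_bytes;   # assumed default: 1 MiB
    my $len = length $text;
    die "Sample text is empty\n" unless $len;
    my $copies = int(($chunk_bytes + $len - 1) / $len);      # integer ceiling
    $copies = 1 if $copies < 1;
    return $text x $copies;
}

# Hypothetical helper: print whole chunks until at least $target_bytes
# have been written, counting the length in a variable.
sub write_chunks {
    my ($fh, $chunk, $target_bytes) = @_;
    die "Chunk is empty\n" unless length $chunk;
    my $written = 0;
    while ($written < $target_bytes) {
        print {$fh} $chunk or die "Write failed: $!";
        $written += length $chunk;
    }
    return $written;
}
```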

> 
> c.) What are the likely bugs that may creep into this algorithm?
> 

Mainly encoding problems (for example, reading and writing the files through
different I/O layers) and other logistical problems.
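A minimal sketch of one way to avoid the encoding problems: use the same I/O
layer on both handles. With ':raw' the bytes are copied untouched; with
':encoding(UTF-8)' the text is decoded on input and re-encoded on output.
Mixing the two (for example, reading raw bytes and writing through
':encoding(UTF-8)') would double-encode any non-ASCII characters. The helper
name copy_with_layer is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a file using the SAME layer on input and output.
# ':raw' copies bytes untouched; ':encoding(UTF-8)' decodes then re-encodes.
sub copy_with_layer {
    my ($in_path, $out_path, $layer) = @_;
    $layer = ':raw' unless defined $layer;
    open my $in,  "<$layer", $in_path  or die "Cannot open '$in_path': $!";
    open my $out, ">$layer", $out_path or die "Cannot create '$out_path': $!";
    my $text = do { local $/; <$in> };
    close $in;
    print {$out} $text or die "Write to '$out_path' failed: $!";
    close $out or die "Cannot close '$out_path': $!";
}
```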

I should note that, in general, your algorithm will produce repetitive text
with very little entropy (information content):

http://en.wikipedia.org/wiki/Entropy_%28information_theory%29
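Shannon entropy itself is not computed here. As a rough stand-in (a
swapped-in technique, not the measure from the article), the sketch below
checks how well the data compresses using the core Compress::Zlib module:
repetitive output compresses far better than varied text. Note that deflate
only looks back 32 KiB, so repeats of a larger sample will not be detected
this way. The sub name compression_ratio is hypothetical.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress);

# Hypothetical helper: compressed size divided by original size.
# Lower values mean more redundancy (a rough proxy for low entropy).
sub compression_ratio {
    my ($data) = @_;
    return 0 unless length $data;
    my $packed = compress($data);
    die "Compression failed\n" unless defined $packed;
    return length($packed) / length($data);
}

# Usage idea: compare compression_ratio() of the sample and of the generated file.
```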

One alternative is to chain together several different texts from free
online sources such as http://www.gutenberg.org/ or http://wikisource.org/
(see also http://www.google.com/search?q=free%20online%20books ).
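A minimal sketch of the chaining idea, assuming the texts have already been
downloaded as local plain-text files: it cycles through the files in order
until the output reaches the target size. The sub name chain_texts is
hypothetical, and if the texts together are shorter than the target, the
output will still repeat, just with a much longer period.

```perl
use strict;
use warnings;

# Hypothetical helper: append the given source files one after another,
# cycling through the list, until $out_path holds at least $target_bytes.
sub chain_texts {
    my ($out_path, $target_bytes, @sources) = @_;
    die "No source files given\n" unless @sources;

    my @texts;
    for my $path (@sources) {
        open my $in, '<:raw', $path or die "Cannot open '$path': $!";
        push @texts, do { local $/; <$in> };
        close $in;
    }
    die "All source files are empty\n" unless grep { length } @texts;

    open my $out, '>:raw', $out_path or die "Cannot create '$out_path': $!";
    my $i = 0;
    while (tell($out) < $target_bytes) {
        print {$out} $texts[$i % @texts] or die "Write failed: $!";
        $i++;
    }
    close $out or die "Cannot close '$out_path': $!";
    return -s $out_path;
}
```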

Regards,

        Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
What Makes Software Apps High Quality -  http://shlom.in/sw-quality

Bzr is slower than Subversion in combination with Sourceforge. 
( By: http://dazjorz.com/ )
