Thanks, Shlomi, for your expert comments; I must admit you have very keen
insight. :)

Anyway, coming to my first question:

> a.) What do I need to do to make sure that the length of the new file will
> increase every time step 4 is executed?

Although it may have nothing to do with this algorithm, I still thought it was
worth discussing.

Say I have a script that writes some content to an output filehandle inside a
long loop. With such scripts I have noticed two types of behaviour:

1. The size of the output file stays at zero while the script is still running
through the loop, and it increases only once the script finishes.

2. At other times the size of the output file increases every time some data is
written to it, i.e. the size grows in real time while the script is running.

I want to control this behaviour. My guess is that it has something to do with
output buffering.
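
If buffering is indeed the cause, then a minimal sketch along these lines (the
file name, the loop and the sleep are made up purely for illustration) should
show both behaviours, depending on whether autoflush is enabled on the handle:

use strict;
use warnings;
use IO::Handle;    # provides the autoflush() method on filehandles

# Hypothetical output file, just for this example.
open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";

# Comment this line out to get behaviour 1 (size stays 0 until the script
# ends); leave it in to get behaviour 2 (size grows while the loop runs).
$out->autoflush(1);    # same effect as: select($out); $| = 1;

for my $i ( 1 .. 5 ) {
    print {$out} "some sample data, iteration $i\n";

    # With autoflush on, the on-disk size should grow on every iteration.
    my $size = ( -s 'output.txt' ) || 0;
    print "size of output.txt is now $size bytes\n";

    sleep 1;
}

close $out or die "Cannot close output.txt: $!";

At least that is my understanding from perlvar ($|) and IO::Handle; please
correct me if the buffering story is more subtle than that.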

Cheers,
Parag




On Sun, Jan 3, 2010 at 12:28 AM, Shlomi Fish <shlo...@iglu.org.il> wrote:

> Hi Parag!
>
> On Saturday 02 Jan 2010 19:56:02 Parag Kalra wrote:
> > Hello All,
> >
> > A major part of my Perl scripting goes into processing text files, and
> > most of the time I need huge text files (3 MB+) to perform benchmarking
> > tests.
> >
> > So I am planning to write a Perl script which will create a huge text file
> > from the sample file it receives as its first input parameter. I have the
> > following algorithm in mind:
> >
> > 1. Provide 2 input parameters to the Perl script - (i) the sample file,
> > (ii) the size of the new file.
> > E.g., to create a new file of size 3 MB:
> > perl Create_Huge_File.pl Sample.txt 3
> >
> > 2. Read the input file and store the contents into an array.
> >
>
> Why an array? Storing it in a single string would be faster, conserve
> memory and be more efficient. See:
>
> http://www.perl.com/pub/a/2003/11/21/slurp.html
>
> > 3. Create a new file.
> >
> > 4. Dump the contents of the above array into the new file.
> >
>
> Again string.
>
> > 5. Check the length of the new file. If it is less than the second input
> > parameter, repeat step 4; otherwise go to step 6.
> >
>
> You can keep a running total of the length in a variable, or use
> http://perldoc.perl.org/5.8.8/functions/tell.html .
>
> > 6. Close the new file.
> >
>
> OK.
>
> > I have following questions:
> >
> > a.) What do I need to do to make sure that the length of the new file
> > will increase every time step 4 is executed?
>
> Nothing. Just print to the output file-handle and it will append to the
> file's
> contents and will increase its size.
>
> >
> > b.) Since a lot of I/O is involved, is it the most optimised solution? If
> > not, does anyone have a better design to fulfil my requirement?
>
> It should be good enough. Perl does I/O quickly.
>
> >
> > c.) What are the likely bugs that may creep in with this algorithm?
> >
>
> Encoding problems, etc. Logistical problems.
>
> I should note that, in general, your algorithm will produce repetitive text
> with very little entropy:
>
> http://en.wikipedia.org/wiki/Entropy_%28information_theory%29
>
> One option you may wish to take instead is to chain together several
> different texts from free online sources such as http://www.gutenberg.org/ or
> http://wikisource.org/ (and see also
> http://www.google.com/search?q=free%20online%20books ).
>
> Regards,
>
>        Shlomi Fish
>
> --
> -----------------------------------------------------------------
> Shlomi Fish       http://www.shlomifish.org/
> What Makes Software Apps High Quality -  http://shlom.in/sw-quality
>
> Bzr is slower than Subversion in combination with Sourceforge.
> ( By: http://dazjorz.com/ )
>
