[algogeeks] Re: Finding a number from 300million list with constraint on memory.

Dave Thu, 23 Jun 2011 21:06:34 -0700

@Douglas: I was assuming that the file contains the numbers in ASCII,
and that normal file buffering is being used. If you think that I need
to include the buffers in the given storage, then fine: just use half
for buffers and half for the array of counters. The only change to the
algorithm would be to replace 1000000 everywhere with 500000.


I find your description below a bit confusing, where you say that you
will search for the number sequentially. What number are you searching
for? The problem is to find a number that is not in the file, not to
check to see if a given number is not in the file. Suppose that the
file contains account numbers; the problem is to find an unused
account number. So how do you search sequentially through a file to
find a number that is not in the file? If I have misinterpreted what
you wrote, please present your entire algorithm.

Dave

On Jun 23, 10:18 pm, Douglas Diniz <[email protected]> wrote:
> Dave, I think you method is very slow. If you have a array of 1000000
> of 16 bits integers this mean that all the Ram is in use, and you will
> need at least 300x10^6 disk access, because you will need the read the
> numbers one by one from the file, and this will take a LOT of time.
> And your method use the module operation % for every number, which is slow.
> I think the basic algorithm of read blocks of 2MB to Ram and searching
> for the number sequentially until you find the number or reach the end
> of file, is some order of magnitude better, because you read the
> blocks linearly from the disk, which is much, much better than read
> one by one, and you dont have to calculate the %.
>
>
>
>
>
> On Thu, Jun 23, 2011 at 11:24 PM, Dave <[email protected]> wrote:
> > @Atul: Here is one way to do it:
>
> > Treat the array as a million 16-bit integers: short a[1000000].
>
> > Set the array to zero.
>
> > Read through the file, and for each number k, increment a[k%1000000].
> > Since the numbers are distinct, there can be at most 900 numbers in
> > each bin.
>
> > Find j such that a[j] < 900. You now know that there is a missing
> > number with the six-digit number j as the final six digits.
>
> > Set a[100] to a[999] to zero.
>
> > Read through the array again. Ignore any number k whose last six
> > digits are not j. For the remaining numbers, set a[k/1000000] = 1.
>
> > Find i, with 100 <= i <= 999, such that a[i]==0. This gives you the
> > first three digits of a missing number.
>
> > Put the first thee digits together with the last six digits: k =
> > 1000000*i + j to get a number that is not in the file.
>
> > Dave
>
> > On Jun 23, 7:59 am, atul purohit <[email protected]> wrote:
> >> Hi,
>
> >> Can someone explain me how to do this.
>
> >> Given a file containing roughly 300 million 9-digit numbers. Find a 9-digit
> >> number that isn't present in the file.You have unlimited drive space but
> >> only 2MB of RAM available.
>
> >> I suppose we have to use bit-vectors n all but still couldnt figure out a
> >> proper way. A few steps or an algo will be great.
>
> >> Cheers,
> >> Atul
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "Algorithm Geeks" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group 
> > athttp://groups.google.com/group/algogeeks?hl=en.
>
> --
> -------------------------------------------------------------------
> Douglas Gameiro Diniz
> Engenheiro de Computação - 2003 - UNICAMP
>
> Mobile: (19) 92158777
> Gtalk: dgdiniz
> Msn: [email protected] Hide quoted text -
>
> - Show quoted text -

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

[algogeeks] Re: Finding a number from 300million list with constraint on memory.

Reply via email to