@Sankeet: How do you know that you wouldn't do more i/o for swapping
than just reading the data twice? Since the input numbers are not
sorted, it seems that you could be swapping a different block in for
every number.

Dave

On Jun 26, 2:12 pm, Sanket <[email protected]> wrote:
> A small tweak (and possible optimization) to Dave's algorithm mighe be
> to use bit based array. That is, have an array of 125000000 byte
> values where for each byte, the 8 bits represent a flag of whether
> that particular number is present in the file or not.
> So, a[0] would indicate whether numbers 100000000 - 100000007 are
> present in the file or not.
> Now, we scan the file number by number, and for each number scanned,
> we convert the number into an index into the array and set the
> corresponding bit.
> After the entire file has been scanned, the task of finding a number
> not present in the file simply is to scan the array and find a bit
> which has not been set.
> To overcome the RAM limitation, we can keep 500KB of space for a chunk
> of the array which would be swapped in/out based on the values read
> from the file and the remaining 1.5MB would be used for the file
> contents.
>
> On Jun 23, 11:24 pm, Douglas Diniz <[email protected]> wrote:
>
>
>
> > Well, I understood that you have a number (say x) and you want if the
> > number x is not in the file. Atul wrote:
>
> > "The numbers are all distinct. they may vary from 100,000,000 to
> > 999,999,999 and there are roughly 300 million (ie. 300 * 1000,000
> > numbers approx.). So there are 600*1000,000 number which are not in
> > the file and we are asked to return any one these 600*1000,000 numbers
> > given the memory constraints. Moreover the numbers are in random
> > order. "
>
> > And in the first email:
>
> > "Find a 9-digit number that isn't present in the file."
>
> > The description is ambiguous. But I think you are right. My
> > interpretation was wrong.
>
> > On Fri, Jun 24, 2011 at 1:06 AM, Dave <[email protected]> wrote:
> > > @Douglas: I was assuming that the file contains the numbers in ASCII,
> > > and that normal file buffering is being used. If you think that I need
> > > to include the buffers in the given storage, then fine: just use half
> > > for buffers and half for the array of counters. The only change to the
> > > algorithm would be to replace 1000000 everywhere with 500000.
>
> > > I find your description below a bit confusing, where you say that you
> > > will search for the number sequentially. What number are you searching
> > > for? The problem is to find a number that is not in the file, not to
> > > check to see if a given number is not in the file. Suppose that the
> > > file contains account numbers; the problem is to find an unused
> > > account number. So how do you search sequentially through a file to
> > > find a number that is not in the file? If I have misinterpreted what
> > > you wrote, please present your entire algorithm.
>
> > > Dave
>
> > > On Jun 23, 10:18 pm, Douglas Diniz <[email protected]> wrote:
> > >> Dave, I think you method is very slow. If you have a array of 1000000
> > >> of 16 bits integers this mean that all the Ram is in use, and you will
> > >> need at least 300x10^6 disk access, because you will need the read the
> > >> numbers one by one from the file, and this will take a LOT of time.
> > >> And your method use the module operation % for every number, which is 
> > >> slow.
> > >> I think the basic algorithm of read blocks of 2MB to Ram and searching
> > >> for the number sequentially until you find the number or reach the end
> > >> of file, is some order of magnitude better, because you read the
> > >> blocks linearly from the disk, which is much, much better than read
> > >> one by one, and you dont have to calculate the %.
>
> > >> On Thu, Jun 23, 2011 at 11:24 PM, Dave <[email protected]> wrote:
> > >> > @Atul: Here is one way to do it:
>
> > >> > Treat the array as a million 16-bit integers: short a[1000000].
>
> > >> > Set the array to zero.
>
> > >> > Read through the file, and for each number k, increment a[k%1000000].
> > >> > Since the numbers are distinct, there can be at most 900 numbers in
> > >> > each bin.
>
> > >> > Find j such that a[j] < 900. You now know that there is a missing
> > >> > number with the six-digit number j as the final six digits.
>
> > >> > Set a[100] to a[999] to zero.
>
> > >> > Read through the array again. Ignore any number k whose last six
> > >> > digits are not j. For the remaining numbers, set a[k/1000000] = 1.
>
> > >> > Find i, with 100 <= i <= 999, such that a[i]==0. This gives you the
> > >> > first three digits of a missing number.
>
> > >> > Put the first thee digits together with the last six digits: k =
> > >> > 1000000*i + j to get a number that is not in the file.
>
> > >> > Dave
>
> > >> > On Jun 23, 7:59 am, atul purohit <[email protected]> wrote:
> > >> >> Hi,
>
> > >> >> Can someone explain me how to do this.
>
> > >> >> Given a file containing roughly 300 million 9-digit numbers. Find a 
> > >> >> 9-digit
> > >> >> number that isn't present in the file.You have unlimited drive space 
> > >> >> but
> > >> >> only 2MB of RAM available.
>
> > >> >> I suppose we have to use bit-vectors n all but still couldnt figure 
> > >> >> out a
> > >> >> proper way. A few steps or an algo will be great.
>
> > >> >> Cheers,
> > >> >> Atul
>
> > >> > --
> > >> > You received this message because you are subscribed to the Google 
> > >> > Groups "Algorithm Geeks" group.
> > >> > To post to this group, send email to [email protected].
> > >> > To unsubscribe from this group, send email to 
> > >> > [email protected].
> > >> > For more options, visit this group 
> > >> > athttp://groups.google.com/group/algogeeks?hl=en.
>
> > >> --
> > >> -------------------------------------------------------------------
> > >> Douglas Gameiro Diniz
> > >> Engenheiro de Computação - 2003 - UNICAMP
>
> > >> Mobile: (19) 92158777
> > >> Gtalk: dgdiniz
> > >> Msn: [email protected] Hide quoted text -
>
> > >> - Show quoted text -
>
> > > --
> > > You received this message because you are subscribed to the Google Groups 
> > > "Algorithm Geeks" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to 
> > > [email protected].
> > > For more options, visit this group 
> > > athttp://groups.google.com/group/algogeeks?hl=en.
>
> > --
> > -------------------------------------------------------------------
> > Douglas Gameiro Diniz
> > Engenheiro de Computação - 2003 - UNICAMP
>
> > Mobile: (19) 92158777
> > Gtalk: dgdiniz
> > Msn: [email protected] Hide quoted text -
>
> - Show quoted text -

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

Reply via email to