@Douglas: I was assuming that the file contains the numbers in ASCII, and that normal file buffering is being used. If you think that I need to include the buffers in the given storage, then fine: just use half for buffers and half for the array of counters. The only change to the algorithm would be to replace 1000000 everywhere with 500000.
I find your description below a bit confusing, where you say that you will search for the number sequentially. What number are you searching for? The problem is to find a number that is not in the file, not to check to see if a given number is not in the file. Suppose that the file contains account numbers; the problem is to find an unused account number. So how do you search sequentially through a file to find a number that is not in the file? If I have misinterpreted what you wrote, please present your entire algorithm. Dave On Jun 23, 10:18 pm, Douglas Diniz <[email protected]> wrote: > Dave, I think you method is very slow. If you have a array of 1000000 > of 16 bits integers this mean that all the Ram is in use, and you will > need at least 300x10^6 disk access, because you will need the read the > numbers one by one from the file, and this will take a LOT of time. > And your method use the module operation % for every number, which is slow. > I think the basic algorithm of read blocks of 2MB to Ram and searching > for the number sequentially until you find the number or reach the end > of file, is some order of magnitude better, because you read the > blocks linearly from the disk, which is much, much better than read > one by one, and you dont have to calculate the %. > > > > > > On Thu, Jun 23, 2011 at 11:24 PM, Dave <[email protected]> wrote: > > @Atul: Here is one way to do it: > > > Treat the array as a million 16-bit integers: short a[1000000]. > > > Set the array to zero. > > > Read through the file, and for each number k, increment a[k%1000000]. > > Since the numbers are distinct, there can be at most 900 numbers in > > each bin. > > > Find j such that a[j] < 900. You now know that there is a missing > > number with the six-digit number j as the final six digits. > > > Set a[100] to a[999] to zero. > > > Read through the array again. Ignore any number k whose last six > > digits are not j. For the remaining numbers, set a[k/1000000] = 1. > > > Find i, with 100 <= i <= 999, such that a[i]==0. This gives you the > > first three digits of a missing number. > > > Put the first thee digits together with the last six digits: k = > > 1000000*i + j to get a number that is not in the file. > > > Dave > > > On Jun 23, 7:59 am, atul purohit <[email protected]> wrote: > >> Hi, > > >> Can someone explain me how to do this. > > >> Given a file containing roughly 300 million 9-digit numbers. Find a 9-digit > >> number that isn't present in the file.You have unlimited drive space but > >> only 2MB of RAM available. > > >> I suppose we have to use bit-vectors n all but still couldnt figure out a > >> proper way. A few steps or an algo will be great. > > >> Cheers, > >> Atul > > > -- > > You received this message because you are subscribed to the Google Groups > > "Algorithm Geeks" group. > > To post to this group, send email to [email protected]. > > To unsubscribe from this group, send email to > > [email protected]. > > For more options, visit this group > > athttp://groups.google.com/group/algogeeks?hl=en. > > -- > ------------------------------------------------------------------- > Douglas Gameiro Diniz > Engenheiro de Computação - 2003 - UNICAMP > > Mobile: (19) 92158777 > Gtalk: dgdiniz > Msn: [email protected] Hide quoted text - > > - Show quoted text - -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
