Well, I understood that you have a number (say x) and you want if the number x is not in the file. Atul wrote:
"The numbers are all distinct. they may vary from 100,000,000 to 999,999,999 and there are roughly 300 million (ie. 300 * 1000,000 numbers approx.). So there are 600*1000,000 number which are not in the file and we are asked to return any one these 600*1000,000 numbers given the memory constraints. Moreover the numbers are in random order. " And in the first email: "Find a 9-digit number that isn't present in the file." The description is ambiguous. But I think you are right. My interpretation was wrong. On Fri, Jun 24, 2011 at 1:06 AM, Dave <[email protected]> wrote: > @Douglas: I was assuming that the file contains the numbers in ASCII, > and that normal file buffering is being used. If you think that I need > to include the buffers in the given storage, then fine: just use half > for buffers and half for the array of counters. The only change to the > algorithm would be to replace 1000000 everywhere with 500000. > > I find your description below a bit confusing, where you say that you > will search for the number sequentially. What number are you searching > for? The problem is to find a number that is not in the file, not to > check to see if a given number is not in the file. Suppose that the > file contains account numbers; the problem is to find an unused > account number. So how do you search sequentially through a file to > find a number that is not in the file? If I have misinterpreted what > you wrote, please present your entire algorithm. > > Dave > > On Jun 23, 10:18 pm, Douglas Diniz <[email protected]> wrote: >> Dave, I think you method is very slow. If you have a array of 1000000 >> of 16 bits integers this mean that all the Ram is in use, and you will >> need at least 300x10^6 disk access, because you will need the read the >> numbers one by one from the file, and this will take a LOT of time. >> And your method use the module operation % for every number, which is slow. >> I think the basic algorithm of read blocks of 2MB to Ram and searching >> for the number sequentially until you find the number or reach the end >> of file, is some order of magnitude better, because you read the >> blocks linearly from the disk, which is much, much better than read >> one by one, and you dont have to calculate the %. >> >> >> >> >> >> On Thu, Jun 23, 2011 at 11:24 PM, Dave <[email protected]> wrote: >> > @Atul: Here is one way to do it: >> >> > Treat the array as a million 16-bit integers: short a[1000000]. >> >> > Set the array to zero. >> >> > Read through the file, and for each number k, increment a[k%1000000]. >> > Since the numbers are distinct, there can be at most 900 numbers in >> > each bin. >> >> > Find j such that a[j] < 900. You now know that there is a missing >> > number with the six-digit number j as the final six digits. >> >> > Set a[100] to a[999] to zero. >> >> > Read through the array again. Ignore any number k whose last six >> > digits are not j. For the remaining numbers, set a[k/1000000] = 1. >> >> > Find i, with 100 <= i <= 999, such that a[i]==0. This gives you the >> > first three digits of a missing number. >> >> > Put the first thee digits together with the last six digits: k = >> > 1000000*i + j to get a number that is not in the file. >> >> > Dave >> >> > On Jun 23, 7:59 am, atul purohit <[email protected]> wrote: >> >> Hi, >> >> >> Can someone explain me how to do this. >> >> >> Given a file containing roughly 300 million 9-digit numbers. Find a >> >> 9-digit >> >> number that isn't present in the file.You have unlimited drive space but >> >> only 2MB of RAM available. >> >> >> I suppose we have to use bit-vectors n all but still couldnt figure out a >> >> proper way. A few steps or an algo will be great. >> >> >> Cheers, >> >> Atul >> >> > -- >> > You received this message because you are subscribed to the Google Groups >> > "Algorithm Geeks" group. >> > To post to this group, send email to [email protected]. >> > To unsubscribe from this group, send email to >> > [email protected]. >> > For more options, visit this group >> > athttp://groups.google.com/group/algogeeks?hl=en. >> >> -- >> ------------------------------------------------------------------- >> Douglas Gameiro Diniz >> Engenheiro de Computação - 2003 - UNICAMP >> >> Mobile: (19) 92158777 >> Gtalk: dgdiniz >> Msn: [email protected] Hide quoted text - >> >> - Show quoted text - > > -- > You received this message because you are subscribed to the Google Groups > "Algorithm Geeks" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/algogeeks?hl=en. > > -- ------------------------------------------------------------------- Douglas Gameiro Diniz Engenheiro de Computação - 2003 - UNICAMP Mobile: (19) 92158777 Gtalk: dgdiniz Msn: [email protected] -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
