@Sankeet: How do you know that you wouldn't do more i/o for swapping than just reading the data twice? Since the input numbers are not sorted, it seems that you could be swapping a different block in for every number.
Dave On Jun 26, 2:12 pm, Sanket <[email protected]> wrote: > A small tweak (and possible optimization) to Dave's algorithm mighe be > to use bit based array. That is, have an array of 125000000 byte > values where for each byte, the 8 bits represent a flag of whether > that particular number is present in the file or not. > So, a[0] would indicate whether numbers 100000000 - 100000007 are > present in the file or not. > Now, we scan the file number by number, and for each number scanned, > we convert the number into an index into the array and set the > corresponding bit. > After the entire file has been scanned, the task of finding a number > not present in the file simply is to scan the array and find a bit > which has not been set. > To overcome the RAM limitation, we can keep 500KB of space for a chunk > of the array which would be swapped in/out based on the values read > from the file and the remaining 1.5MB would be used for the file > contents. > > On Jun 23, 11:24 pm, Douglas Diniz <[email protected]> wrote: > > > > > Well, I understood that you have a number (say x) and you want if the > > number x is not in the file. Atul wrote: > > > "The numbers are all distinct. they may vary from 100,000,000 to > > 999,999,999 and there are roughly 300 million (ie. 300 * 1000,000 > > numbers approx.). So there are 600*1000,000 number which are not in > > the file and we are asked to return any one these 600*1000,000 numbers > > given the memory constraints. Moreover the numbers are in random > > order. " > > > And in the first email: > > > "Find a 9-digit number that isn't present in the file." > > > The description is ambiguous. But I think you are right. My > > interpretation was wrong. > > > On Fri, Jun 24, 2011 at 1:06 AM, Dave <[email protected]> wrote: > > > @Douglas: I was assuming that the file contains the numbers in ASCII, > > > and that normal file buffering is being used. If you think that I need > > > to include the buffers in the given storage, then fine: just use half > > > for buffers and half for the array of counters. The only change to the > > > algorithm would be to replace 1000000 everywhere with 500000. > > > > I find your description below a bit confusing, where you say that you > > > will search for the number sequentially. What number are you searching > > > for? The problem is to find a number that is not in the file, not to > > > check to see if a given number is not in the file. Suppose that the > > > file contains account numbers; the problem is to find an unused > > > account number. So how do you search sequentially through a file to > > > find a number that is not in the file? If I have misinterpreted what > > > you wrote, please present your entire algorithm. > > > > Dave > > > > On Jun 23, 10:18 pm, Douglas Diniz <[email protected]> wrote: > > >> Dave, I think you method is very slow. If you have a array of 1000000 > > >> of 16 bits integers this mean that all the Ram is in use, and you will > > >> need at least 300x10^6 disk access, because you will need the read the > > >> numbers one by one from the file, and this will take a LOT of time. > > >> And your method use the module operation % for every number, which is > > >> slow. > > >> I think the basic algorithm of read blocks of 2MB to Ram and searching > > >> for the number sequentially until you find the number or reach the end > > >> of file, is some order of magnitude better, because you read the > > >> blocks linearly from the disk, which is much, much better than read > > >> one by one, and you dont have to calculate the %. > > > >> On Thu, Jun 23, 2011 at 11:24 PM, Dave <[email protected]> wrote: > > >> > @Atul: Here is one way to do it: > > > >> > Treat the array as a million 16-bit integers: short a[1000000]. > > > >> > Set the array to zero. > > > >> > Read through the file, and for each number k, increment a[k%1000000]. > > >> > Since the numbers are distinct, there can be at most 900 numbers in > > >> > each bin. > > > >> > Find j such that a[j] < 900. You now know that there is a missing > > >> > number with the six-digit number j as the final six digits. > > > >> > Set a[100] to a[999] to zero. > > > >> > Read through the array again. Ignore any number k whose last six > > >> > digits are not j. For the remaining numbers, set a[k/1000000] = 1. > > > >> > Find i, with 100 <= i <= 999, such that a[i]==0. This gives you the > > >> > first three digits of a missing number. > > > >> > Put the first thee digits together with the last six digits: k = > > >> > 1000000*i + j to get a number that is not in the file. > > > >> > Dave > > > >> > On Jun 23, 7:59 am, atul purohit <[email protected]> wrote: > > >> >> Hi, > > > >> >> Can someone explain me how to do this. > > > >> >> Given a file containing roughly 300 million 9-digit numbers. Find a > > >> >> 9-digit > > >> >> number that isn't present in the file.You have unlimited drive space > > >> >> but > > >> >> only 2MB of RAM available. > > > >> >> I suppose we have to use bit-vectors n all but still couldnt figure > > >> >> out a > > >> >> proper way. A few steps or an algo will be great. > > > >> >> Cheers, > > >> >> Atul > > > >> > -- > > >> > You received this message because you are subscribed to the Google > > >> > Groups "Algorithm Geeks" group. > > >> > To post to this group, send email to [email protected]. > > >> > To unsubscribe from this group, send email to > > >> > [email protected]. > > >> > For more options, visit this group > > >> > athttp://groups.google.com/group/algogeeks?hl=en. > > > >> -- > > >> ------------------------------------------------------------------- > > >> Douglas Gameiro Diniz > > >> Engenheiro de Computação - 2003 - UNICAMP > > > >> Mobile: (19) 92158777 > > >> Gtalk: dgdiniz > > >> Msn: [email protected] Hide quoted text - > > > >> - Show quoted text - > > > > -- > > > You received this message because you are subscribed to the Google Groups > > > "Algorithm Geeks" group. > > > To post to this group, send email to [email protected]. > > > To unsubscribe from this group, send email to > > > [email protected]. > > > For more options, visit this group > > > athttp://groups.google.com/group/algogeeks?hl=en. > > > -- > > ------------------------------------------------------------------- > > Douglas Gameiro Diniz > > Engenheiro de Computação - 2003 - UNICAMP > > > Mobile: (19) 92158777 > > Gtalk: dgdiniz > > Msn: [email protected] Hide quoted text - > > - Show quoted text - -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
