(For very large k case only) Keep as many elements in the hash_map as you can. Once the map capacity is exhausted, sort all the elements in the hash_map and write them to disk and build a bloom filter. Keep this bloom filter<http://en.wikipedia.org/wiki/Bloom_filter>in memory, empty the hash_map and now for every new number in the stream, check in the bloom filter, if it says does not exist, that means this is a new number, insert in the map. If it says the number does exist, then this can be a false positive, then first check in the map, if not found in it, it could be on disk. Use a binary search on the file (this will take O(logn) disk seeks) and verify if its present in the file or not. If present, ignore, else add in the map. Keep repeating the process of flushing the map to disk whenever the map is full. To flush, you could either use any external sorting<http://en.wikipedia.org/wiki/External_sorting>method, or just write out a new file each time. If separate files, you will need to binary search in all of them in case of false positive.
Programmers should realize their critical importance and responsibility in a world gone digital. They are in many ways similar to the priests and monks of Europe's Dark Ages; they are the only ones with the training and insight to read and interpret the "scripture" of this age. On Sat, Jun 1, 2013 at 10:19 PM, MAC <[email protected]> wrote: > Given an infinite stream of integers with only k distinct integers, find > those distinct integers. How would you modify your solution if k is also > very large and cannot use hash table. > > > > > > thanks > --mac > > -- > You received this message because you are subscribed to the Google Groups > "Algorithm Geeks" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > > > -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
