On May 16, 10:33 am, atul anand <[email protected]> wrote:
> @amit : here is the reason :-
>
> Take a url, say http://www.geeksforgeeks.org. You will hash the following urls:
> http://www.geeksforgeeks.org
> http://www.geeksforgeeks.org/archives
> http://www.geeksforgeeks.org/archives/19248
> http://www.geeksforgeeks.org/archives/1111
> http://www.geeksforgeeks.org/archives/19221
> http://www.geeksforgeeks.org/archives/19290
> http://www.geeksforgeeks.org/archives/1876
> http://www.geeksforgeeks.org/archives/1763
>
> "http://www.geeksforgeeks.org" is the redundant part in each url, so it would waste unnecessary memory to save every URL in full.
>
> OK, now say the file has 20 million urls. What would you do then?
I think the trie suggestion was good. Have each domain (with the protocol part) as a node and then have the subsequent directory locations as a hierarchy under it.
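A minimal sketch of that idea in Python, assuming we split each URL into "scheme://domain" plus its path segments (the class names and the splitting rule are my own, not from the thread). The shared prefix "http://www.geeksforgeeks.org" is stored in a single node, and only the differing path segments add new nodes:

class TrieNode:
    """One node per URL component (domain or path segment)."""
    def __init__(self):
        self.children = {}       # segment -> TrieNode
        self.is_url_end = False  # True if a stored URL terminates here


class UrlTrie:
    def __init__(self):
        self.root = TrieNode()

    @staticmethod
    def _segments(url):
        from urllib.parse import urlsplit
        parts = urlsplit(url)
        # scheme + domain form the first node, e.g. "http://www.geeksforgeeks.org";
        # path segments ("archives", "19248", ...) hang beneath it
        head = parts.scheme + "://" + parts.netloc
        return [head] + [s for s in parts.path.split("/") if s]

    def insert(self, url):
        node = self.root
        for seg in self._segments(url):
            node = node.children.setdefault(seg, TrieNode())
        node.is_url_end = True

    def contains(self, url):
        node = self.root
        for seg in self._segments(url):
            node = node.children.get(seg)
            if node is None:
                return False
        return node.is_url_end


# Usage: the domain node is created once and reused by every URL under it.
trie = UrlTrie()
trie.insert("http://www.geeksforgeeks.org/archives/19248")
trie.insert("http://www.geeksforgeeks.org/archives/1111")
print(trie.contains("http://www.geeksforgeeks.org/archives/19248"))  # True
print(trie.contains("http://www.geeksforgeeks.org/archives/9999"))   # False

With 20 million URLs that mostly share a handful of domains and directory prefixes, this kind of prefix sharing is where the memory saving over hashing each full URL string would come from.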
