@amit: here is the reason.

Take a single site, say http://www.geeksforgeeks.org. With hashing you would hash and store each of the following URLs in full:

http://www.geeksforgeeks.org
http://www.geeksforgeeks.org/archives
http://www.geeksforgeeks.org/archives/19248
http://www.geeksforgeeks.org/archives/1111
http://www.geeksforgeeks.org/archives/19221
http://www.geeksforgeeks.org/archives/19290
http://www.geeksforgeeks.org/archives/1876
http://www.geeksforgeeks.org/archives/1763

"http://www.geeksforgeeks.org" is the redundant part of every one of these URLs, so saving all of the URLs in full would waste memory unnecessarily.

OK, now say the file has 20 million URLs. What would you do then?

On Wed, May 16, 2012 at 10:50 AM, Amit Mittal <[email protected]> wrote:
> Why won't hashing work for millions of URLs?
> If you hash each URL into a distinct 32-bit integer, you can map 2^32 URLs,
> which is around 4 billion. It should work.
>
> On Wed, May 16, 2012 at 10:42 AM, atul anand <[email protected]> wrote:
>> I was thinking about using a trie or a Patricia tree. Hashing is another
>> option, but it won't work if the URLs are in the millions.
>> Is there any better data structure?
>>
>> On Tue, May 15, 2012 at 11:37 PM, Varun <[email protected]> wrote:
>>> It should be a tree based on the domain in the URL and the directories
>>> mentioned in the URL.
>>>
>>> On Tuesday, 15 May 2012 21:20:55 UTC+5:30, atul007 wrote:
>>>> Given a file which contains millions of URLs, which data structure
>>>> would you use for storing these URLs? The data structure used should
>>>> store and fetch data in an efficient manner.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Algorithm Geeks" group.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msg/algogeeks/-/idbhSUZ6TNIJ.
> --
> Regards
> Amit Mittal

--
You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
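
A rough sketch of the trie idea raised in this thread, assuming Python and a split on "scheme://host" followed by each path segment, so the common "http://www.geeksforgeeks.org" prefix is stored only once. The class name UrlTrie and its methods are made up for illustration and come from nobody's message; query strings and fragments are ignored, and a trailing slash is treated as insignificant. Python dicts carry their own overhead, so this shows the shape of the structure rather than a measured memory saving.

```python
from urllib.parse import urlsplit


class UrlTrie:
    """Hypothetical trie keyed on URL components, not single characters."""

    def __init__(self):
        self.root = {}        # nested dicts: component -> child node
        self._end = object()  # marker key: "a stored URL ends at this node"

    @staticmethod
    def _components(url):
        # Assumption: only scheme, host and path matter for this sketch.
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        segments = [s for s in parts.path.split("/") if s]
        return [origin] + segments

    def insert(self, url):
        node = self.root
        for comp in self._components(url):
            # Reuse the existing child if this prefix was seen before.
            node = node.setdefault(comp, {})
        node[self._end] = True

    def contains(self, url):
        node = self.root
        for comp in self._components(url):
            node = node.get(comp)
            if node is None:
                return False
        return self._end in node


urls = [
    "http://www.geeksforgeeks.org",
    "http://www.geeksforgeeks.org/archives",
    "http://www.geeksforgeeks.org/archives/19248",
    "http://www.geeksforgeeks.org/archives/1111",
]

trie = UrlTrie()
for u in urls:
    trie.insert(u)

print(trie.contains("http://www.geeksforgeeks.org/archives/19248"))
print(trie.contains("http://www.geeksforgeeks.org/archives/9999"))
print(len(trie.root))  # number of distinct scheme://host prefixes stored
```

Lookup cost here is proportional to the number of components in the URL, not to the number of URLs stored. A Patricia tree would go one step further and merge chains of single-child nodes.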
