Well, it's a good question. Instead of Googling, I would like to give a naive approach for this, one that pays in time and space.
1st: Count the number of words in the single large file.
We can process it like this:
bool inWord = false;
while (in.get(ch)) {             // read the file character by character
    if (ch == ' ' || ch == '\n' || ch == '\t')
        inWord = false;          // a separator ends the current word
    else if (!inWord) {          // first character of a new word
        inWord = true;
        numWords++;
    }
}
For every word we then count how many times it occurs, using a temporary count array (see counting sort for this), and then sort the words by their frequency. We will get the top 10, 20, ... as many words as we want from the file.
It's a tough but naive approach.
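A minimal sketch of this first approach in C++ (the use of std::sort rather than a true counting sort, and the function name topWords, are my assumptions): read every word, sort the words so equal words are adjacent, count each run, then sort the (count, word) pairs by frequency.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Return the top-k words by frequency using the sort-based naive approach.
std::vector<std::pair<int, std::string>> topWords(std::istream& in, size_t k) {
    std::vector<std::string> words;
    std::string w;
    while (in >> w)                          // stream extraction splits on whitespace
        words.push_back(w);

    std::sort(words.begin(), words.end());   // equal words become adjacent

    std::vector<std::pair<int, std::string>> freq;
    for (size_t i = 0; i < words.size(); ) {
        size_t j = i;
        while (j < words.size() && words[j] == words[i]) ++j;
        freq.push_back({int(j - i), words[i]});  // run length = frequency
        i = j;
    }

    std::sort(freq.rbegin(), freq.rend());   // highest count first
    if (freq.size() > k) freq.resize(k);
    return freq;
}
```

A std::istringstream can stand in for the file stream when trying it out; with a real file, pass a std::ifstream instead.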
2nd: The best approach is to put all the words from the file into a hash table, where the word acts as the key and its count acts as the value. As each word is read from the file, we check whether it already exists in the table: if yes, we increment its counter; otherwise we put the new word into the hash table and initialize its count to 1.
The important part of this algorithm is how we know that a word is already stored in the hash table; it requires a lot of processing, because as we work through a big file there is also a lot of computation in first identifying each word and then putting it into the hash table.
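A sketch of this hash-table approach, assuming std::unordered_map as the hash table (its operator[] default-initializes a missing key's count to 0, so the exists-or-insert check described above happens implicitly):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// One pass over the input: the word is the key, its count is the value.
std::unordered_map<std::string, int> countWords(std::istream& in) {
    std::unordered_map<std::string, int> counts;
    std::string w;
    while (in >> w)
        ++counts[w];   // inserts w with count 0 if absent, then increments
    return counts;
}
```

Average lookup and insertion are O(1), so the whole pass is roughly linear in the file size; extracting the top-k afterwards can reuse a sort or a heap over the (word, count) pairs.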
Well, this kind of algorithm is used by web crawlers: when a crawler looks for new URLs on the web, it uses the same approach.
I think this approach will work efficiently for large data. Further, it depends on how the data is organized; if we are talking about a database, then again we have to think about all possible ways to solve it.
Correct me if the concepts seem to be wrong.
Thanks & Regards
Shashank Mani  "The best way to escape from a problem is to solve it."
--
You received this message because you are subscribed to the Google Groups
"Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/algogeeks?hl=en.