@Dave
I hear ya. I'm just saying that, in general, you'd want an
algorithm that covers all cases.
Of course, if RAM isn't a consideration in this case and heapsort is
what @Vinay wants, I guess we're coming up with one like that.
Again, in general, you don't want one version of the program for
King James and another for something bigger, right?
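
For what it's worth, the count-then-pick approach could be sketched
roughly like this in Python (just a sketch: `Counter` plays the hash
table, `heapq.nlargest` plays the size-k heap, and the regex word
splitting is my own crude assumption):

```python
import heapq
import re
from collections import Counter

def top_k_words(path, k=10):
    """Count word occurrences, then pick the k most frequent with a heap."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            # crude tokenization: lowercase runs of letters/apostrophes/hyphens
            counts.update(re.findall(r"[a-z'-]+", line.lower()))
    # nlargest keeps only k candidates at a time: O(n log k) for n distinct words
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
```

Since only the distinct words and their counts live in memory, this
stays small even for a big input file, per Dave's point.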

BTW, do you have any new ideas on the nth largest sum of two arrays? I
realized the solution I gave wasn't working for all cases. I think I'm
on the right track, but there's still something that needs fixing and
I'm scratching my head over it. :) Let me know, man!
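
For reference, the usual max-heap direction for that problem looks
something like this (a sketch of the standard technique, not the broken
code I posted earlier): sort both arrays descending, seed the heap with
the largest pair, and on each pop push the two neighbor pairs, with a
visited set to avoid duplicates.

```python
import heapq

def nth_largest_sum(a, b, n):
    """n-th largest value of a[i] + b[j], via a max-heap of candidate pairs."""
    a = sorted(a, reverse=True)
    b = sorted(b, reverse=True)
    # max-heap simulated with negated sums; (0, 0) is the largest pair
    heap = [(-(a[0] + b[0]), 0, 0)]
    seen = {(0, 0)}
    while True:
        s, i, j = heapq.heappop(heap)
        n -= 1
        if n == 0:
            return -s
        # push the two neighbors of (i, j) if not already queued
        if i + 1 < len(a) and (i + 1, j) not in seen:
            seen.add((i + 1, j))
            heapq.heappush(heap, (-(a[i + 1] + b[j]), i + 1, j))
        if j + 1 < len(b) and (i, j + 1) not in seen:
            seen.add((i, j + 1))
            heapq.heappush(heap, (-(a[i] + b[j + 1]), i, j + 1))
```

Each pop does O(log n) work and at most n pairs ever enter the heap, so
this is O(n log n) after the initial sorts.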

On Oct 22, 11:19 pm, Dave <[email protected]> wrote:
> @Ligerdave: Hey, the King James version of the Bible is only about
> 600,000 words. I use the Bible as an example only because it is a
> fairly large book. Maybe we are talking 10 megabytes to store the
> whole thing, seeing that there are some long words such as "Maher-
> shalal-hash-baz," a name that occurs in Isaiah 8:3. Ten megabytes
> hardly seems "large," when compared to the 4 or 8 gigabytes or more of
> RAM on many computers. Besides, you don't have to keep all of the text
> in memory, but only the distinct words and an integer count of the
> number of occurrences. For the King James bible, this is less than
> 5,000 words, so we're talking a pretty minimal amount of memory. A
> hash table might work fine for this, or build a balanced binary tree
> of the words. After you have scanned all of the input text and
> determined the number of occurrences of each word, it is fairly easy
> to scan the word counts and pick out the ten largest.
>
> Dave
>
> On Oct 22, 9:24 am, ligerdave <[email protected]> wrote:
> > for a large file, you probably would want to use external sort. kinda
> > like a map-reduce concept. it's actually how sort&uniq kinda stuff
> > work in unix/linux when you try to find some "TOP X"
>
> > again, we are talking about the memory might not hold the entire file
>
> > On Oct 21, 9:35 am, "Vinay..." <[email protected]> wrote:
>
> > > how do u find 10 most repeating words on a large file containing words
> > > in most efficient way...if it can also be done using heapsort plz post
> > > ur answers..
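
For completeness, the unix sort&uniq style "TOP X" ligerdave mentioned
upthread can be done as a one-liner (a rough sketch: the tr-based word
splitting is crude, and sort(1) spills to temp files on disk when the
input doesn't fit in memory, which is the external-sort part):

```shell
# split into one word per line, count duplicates, rank by count, take top 10
printf 'the cat sat on the mat the end\n' |
  tr -s '[:space:]' '\n' |
  sort | uniq -c | sort -rn | head -10
```

Swap the printf for `< bigfile.txt` to run it on a real file.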

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.
