another rejection heuristic mite be length or  percentage of common letters
or edit distance

On 3/28/07, Arun <[EMAIL PROTECTED]> wrote:
>
> to find the top k words (after computing d), u dont have to fully sort it.
> u can use the linear selection algorithm to find k
> and sort those d of those k words if u need the order by "order of
> closeness" ( O(n +klgk)) or
> u can use a B-tree and once the B-tree u just have to start from the
> leftmost leaf node(which has lowest d value) and traverse k nodes. dont
> quite reber the order for construction of B-tree. after that its O(k) to
> get the k words, as its been already ordered.
>
> if you know wat the function is doing maybe u can do somewat better.
>
> or u can use some heuristics like for example if a threshold (say d=0.25)
> means that anything greater that threshold is not a good candidate, u can
> just drop it in any furhter ranking considerations. or some machine
> learning?
>
> but i think pre-computing is the best solution
>
> On 3/28/07, Kevin <[EMAIL PROTECTED]> wrote:
> >
> >
> > This is from an interview question, see how do you guys think of it:
> >
> > Given a Enlgish word, how to find these words that pronounce similar
> > to it?
> >
> > You can have some reasonable assumptions, such as: you have an
> > dictionary which tells you how each word pronounce (such as its
> > phonetic symbols), there is an existing function that can give you the
> > similarity of two words' pronunciation (for example, double d =
> > checkSimilarity(WordA, WordB), where d is between 0.0 to 1.0). You are
> > allowed to have other assumptions, if you think it is reasonable and
> > will help.
> >
> > A straight way is to compare the given word with all the words in the
> > dictionary, and order the d values, then return the top ranking ones.
> > This will take N comparisons for each given word. Plus the time to
> > sort the result (N*logN).
> >
> > Since this supposed to be a data structure and algorithm question, any
> > better ideas?
> >
> > I am thinking we can precompute a NxN similarity matrix, but even with
> > that, it is still N*logN for each word since we need to sort/rank it.
> >
> > Another method, of course, is to use the above method to precompute
> > and pre-ranking for every word --- but this sounds not an interesting
> > idea as for an interview question. :-)
> >
> >
> > > >
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to