Re: [algogeeks] INTERVIEW QUESTION

Saurabh Kumar Tue, 30 Oct 2012 11:49:40 -0700

You are right.
k is the edit distance we are searching for, and it is a critical parameter. In
short, k represents how much error (in terms of edit distance) you are willing
to tolerate between the document word 'w' and your suggestion.
Since our data structure can answer queries such as "find all words within
edit distance k <= 5", I think we can do better: loading the tree and
searching it could be costly, so instead of repeatedly firing queries for
k = 1, 2, 3, ..., I think it's better to do it like this:


1. For a given document word 'w', start with k = 0 (exact matching, i.e.
checking whether w is present in the dictionary or not). If the returned
list.size() == 1, it's a valid word; otherwise, if the list is NULL, fire a
query for k = 2.
From the function, return a list of all dictionary words which are within
*<=2* edit distance of 'w', sorted by edit distance.
Sometimes the returned list could be large, so you need to narrow it down to
the best possible suggestions for 'w'.
For example, you might want to prefer words that are 1 edit away over those
2 edits away, and among those, edits where the substituted letter is close
to the misspelled one. E.g. for w = REDT, REST is more likely than RENT
because 'S' is closer to 'D' on the keyboard than 'N' is. Or you might
prefer words that sound the same according to some phoneme model, etc.

2. If the returned list for k = 2 was NULL, you can query for k = 5 and check
whether there are any words within edit distance *<=5*. Again, the returned
list could be NULL as well. You might want to cap k (say at 5). E.g. if the
document contains w = "ijljhflkjjiulgihh", it's highly unlikely that your
dictionary will contain any word close to it (unless your dictionary
contains crazy volcano names from Iceland), so for cases like these, after
k = 5 you can return "No Suggestion".
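The escalation in steps 1 and 2 can be sketched roughly as follows. This is a minimal Python sketch, assuming a plain word list and a brute-force `search` helper that stands in for the BK-tree query (`KDTreeSearch(s, k)` in the earlier mail); the radii 2 and 5 are the values suggested above, while the name `check_word` and its return convention are made up for illustration.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def search(dictionary: Sequence[str], s: str, k: int) -> List[Tuple[int, str]]:
    """Hypothetical stand-in for the tree query: (distance, word) pairs within k, nearest first."""
    hits = []
    for word in dictionary:
        d = levenshtein(s, word)
        if d <= k:
            hits.append((d, word))
    return sorted(hits)


def check_word(dictionary: Sequence[str], w: str,
               radii: Sequence[int] = (2, 5)) -> Tuple[str, List[str]]:
    """Returns ("ok", []), ("suggest", words sorted by distance) or ("none", [])."""
    if search(dictionary, w, 0):       # k = 0: exact match, so w is a valid word
        return "ok", []
    for k in radii:                    # at most len(radii) fuzzy queries per word
        hits = search(dictionary, w, k)
        if hits:
            return "suggest", [word for _, word in hits]
    return "none", []                  # give up: "No Suggestion"
```

For instance, with the word list `["rest", "rent", "red", "reads", "volcano"]`, `check_word(words, "redt")` returns `("suggest", ["red", "rent", "rest", "reads"])`, while the random string above falls through both radii and returns `("none", [])`.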

It's really a matter of experimentation; you could try other approaches as
well, but this way you limit the number of fuzzy (k > 0) queries per word to 2.
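Step 1 also proposes ranking a large candidate list: fewer edits first, and among equal edit distances, substitutions whose letters sit close together on the keyboard. A rough Python sketch of that idea, assuming a plain QWERTY grid with no row stagger and a Euclidean key distance (both simplifications chosen for illustration; `rank_suggestions` and the helpers are hypothetical names). The keyboard cost only compares same-length words position by position, so insertions and deletions get no keyboard bonus here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed QWERTY layout as a plain grid: (row, column) per key; real row stagger is ignored.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS: Dict[str, Tuple[int, int]] = {
    ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def key_distance(x: str, y: str) -> float:
    """Euclidean distance between two keys on the assumed grid (inf for unknown keys)."""
    if x not in KEY_POS or y not in KEY_POS:
        return math.inf
    (r1, c1), (r2, c2) = KEY_POS[x], KEY_POS[y]
    return math.hypot(r1 - r2, c1 - c2)


def substitution_cost(typo: str, cand: str) -> float:
    """Sum of key distances over position-by-position mismatches (same-length pairs only)."""
    if len(typo) != len(cand):
        return math.inf  # different lengths: rank after same-length candidates
    return sum(key_distance(a, b) for a, b in zip(typo, cand) if a != b)


def rank_suggestions(typo: str, candidates: Sequence[str]) -> List[str]:
    """Order candidates by (edit distance, keyboard cost, alphabetical)."""
    t = typo.lower()
    return sorted(candidates,
                  key=lambda c: (levenshtein(t, c.lower()),
                                 substitution_cost(t, c.lower()),
                                 c))
```

With the REDT example, `rank_suggestions("REDT", ["rust", "rent", "rest"])` puts REST first (D to S is one key apart on this grid), then RENT, then RUST, which is two edits away. A phoneme model could be added as a further element of the sort key.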


A correction: I realize I previously used the names 'KDTree' and 'BK-tree'
interchangeably. What I really meant was a 'BK-tree' (a k-d tree partitions
points in a k-dimensional vector space, whereas a BK-tree is a metric tree
that only needs a distance function), where a node has an arbitrary number
of children and each parent-child edge is labeled with the Levenshtein
distance between them. The basic idea here is to store your dictionary in a
data structure which facilitates searching for words based on their edit
distances.
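A BK-tree along these lines might look like the Python sketch below. It assumes unit-cost Levenshtein distance and an in-memory tree (serialization is left out); `BKTree`, `insert` and `search` are illustrative names, with `search(s, k)` playing the role of the `KDTreeSearch(s, k)` call in the quoted earlier mail. The pruning uses the triangle inequality: if the query is at distance d from a node, any match stored under the child edge labeled e must satisfy d - k <= e <= d + k, so all other subtrees can be skipped.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKNode:
    def __init__(self, word: str) -> None:
        self.word = word
        self.children: Dict[int, "BKNode"] = {}  # edge label (distance) -> child


class BKTree:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Optional[BKNode] = None
        for w in words:  # insertion order shapes the tree; the mail suggests random order
            self.insert(w)

    def insert(self, word: str) -> None:
        if self.root is None:
            self.root = BKNode(word)
            return
        node = self.root
        while True:
            d = levenshtein(word, node.word)
            if d == 0:
                return  # word already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(word)  # new edge labeled with distance d
                return
            node = child  # an edge with label d exists: descend and retry

    def search(self, s: str, k: int) -> List[Tuple[int, str]]:
        """All (distance, word) pairs with levenshtein(s, word) <= k, nearest first."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = levenshtein(s, node.word)
            if d <= k:
                results.append((d, node.word))
            for e, child in node.children.items():
                if d - k <= e <= d + k:  # triangle inequality: other subtrees cannot match
                    stack.append(child)
        return sorted(results)
```

For example, a tree built from `["book", "books", "cake", "boo", "boon", "cook", "cape", "cart"]` answers `search("bood", 1)` with `[(1, "boo"), (1, "book"), (1, "boon")]` without necessarily computing the distance to every word.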


On 27 October 2012 22:40, payal gupta <[email protected]> wrote:

> The question is stated exactly as given... I just copy-pasted it here.
> @saurabh thanks for the explanation of the cube problem; I guess that is an
> appropriate solution for the question.
> And for the other question, on detecting typos and making suggestions, I
> would like to know what 'k' in your explanation stands for. How are values
> assigned to it? For each wrong word not found in the dictionary, should we
> check whether a word with edit distance 1 exists in the dictionary,
> and so on until we get the correct word?
>
>
>
>
> On Sat, Oct 27, 2012 at 8:12 AM, Saurabh Kumar <[email protected]> wrote:
>
>> Could you please share the link? At first glance a Trie looks like a
>> bad choice for this task.
>>
>> I'd go with the Levenshtein distance and a kd-tree.
>> First, implement the Levenshtein distance algorithm to calculate the edit
>> distance between two strings.
>> Second, since Levenshtein distance is a metric (strings under it form a
>> metric space), we can use a metric tree like a BK-tree and populate it with
>> our dictionary.
>> Choose a random word from the dictionary as the root and subsequently insert
>> dictionary words (picking them randomly) into the tree.
>> A node has an arbitrary number of children. Each parent-child edge is
>> labeled with the corresponding Levenshtein distance between them.
>>
>> Building the tree is a one-time process. Once the tree is built, we can
>> devise a way to serialize and store it.
>>
>> Using this tree we can find all the words with edit distance less than or
>> equal to, say, k.
>> Let's define a method in the Tree class as: List KDTreeSearch(s, k);
>> which searches for all strings s' in the tree such that |s-s'| <= k, i.e.
>> all strings within an edit distance of k.
>> Searching:
>> Start with the root and calculate the edit distance of s from the root. If
>> it's, say, d, then by the triangle inequality we only need to descend into
>> children whose edge distance lies in [d-k, d+k] to find the words with
>> distance <= k.
>>
>> Looking for typos:
>> Scan the document and for each word 'w' make a call: list =
>> KDTreeSearch(w, 0);
>> if list.size() == 1  // we have the word in the dictionary
>> else list = KDTreeSearch(w, 2);  // search for all words within edit
>> distance 2 of w
>>
>> The returned 'list' can sometimes be large; we can then filter it by
>> narrowing down our definition of 'typo',
>> e.g. for the typo w = REDT [REST is more likely than RENT], or maybe some
>> phoneme model, etc. You should discuss this at length with the
>> interviewer.
>>
>> On 27 October 2012 07:03, Raghavan <[email protected]> wrote:
>>
>>> By any chance, did you read the new blog post by Gayle Laakmann?
>>>
>>> I guess to detect typos we can use some sort of Trie implementation.
>>>
>>>
>>> On Fri, Oct 26, 2012 at 7:50 PM, payal gupta <[email protected]> wrote:
>>>
>>>>
>>>>    Given a cube with side length n, write code to print all possible
>>>> paths from the center to the surface.
>>>>    Thanks in advance.
>>>>
>>>>
>>>>    Regards,
>>>>   PAYAL GUPTA,
>>>>   NIT-B.
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Algorithm Geeks" group.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msg/algogeeks/-/ZaItRf_9A_IJ.
>>>>
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/algogeeks?hl=en.
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Raghavan KL
>>>
>>>
>>>
>>
>>
>
>

