Re: [gcj] My analysis of Alien Rhyme

Kulendra Janaka Tue, 16 Apr 2019 08:23:19 -0700

On Tuesday, 16 April 2019 08:55:12 UTC+5:30, /dev/joe  wrote:
> My solution was simpler than this. I didn't worry about comparing pairs. I 
> didn't even really worry about the identities of the words; just made sure to 
> count things properly. But like you, I did a search starting with the longest 
> repeats. I think that part may be required to get a working large-case 
> solution.
> 
> 
> I just took, for each word, suppose it is length X, the last X letters, the 
> last X-1 letters, the last X-2 letters, ..., the last 1 letter and put each 
> into a dictionary (Python) for final substrings of its length. If the string 
> was new I set its value to 1; if old, I added 1.
> 
> 
> Then I looked at all the final substrings of the longest length. Of course 
> there won't be any pairs here because the words are all different. So I go to 
> the next shorter length, and so on. When I find a substring with more than 1 
> word, I've identified a pair I can use. So now I increment by 2 the number of 
> rhymed words, and decrement by 2 the counts for all final substrings of this 
> string (to indicate I have used up two of the words with those endings). And 
> I continue the search with the next final substring.
> 
> 
> There's up to one dictionary entry to process per letter in the words. Each 
> word is removed from the pool at most once, at which point I have another 
> operation that involves as many dictionary operations as letters in the 
> matching ending. The dictionary operations are O(log N), so this is O(sum of 
> all the word lengths * log N)
> 
> 
> 
> 
> On Mon, Apr 15, 2019 at 8:56 PM Ronald van Loon <[email protected]> wrote:
> Hi,
> 
> 
> 
> tried to participate in the first round, but it was at 3am local time. Had my 
> alarmclock at 2:45, duly turned it off, nodded of for five minutes, then 
> suddenly it was 4:30. Don't know how that happened. 
> 
> 
> 
> Anyway, here is my analysis of the problem:
> 
> 
> 
> This is a problem about suffixes. Unfortunately, these are at the end of the 
> words, so it makes sense to put these at the front.
> 
> 
> 
> So, while reading, reverse the words - we do not have to list the pairs, 
> after all, only count them. This is a O(N) operation.
> 
> 
> 
> After we read all the words, we sort them. (Note: you could actually do the 
> compares without sorting or even while sorting, but sorting them puts the 
> longer matches closer together and is usually optimized if you use the 
> standard library). This is an O(N log N) operation.
> 
> 
> 
> We now compare every word against each other, noting the common suffix. We 
> put the result of the comparison in a multimap, meaning that we sort on 
> <suffixlength, <word1-index, word2-index> >. This is O(N^2). The problem 
> states that a suffix is of at least length 1, so if two words do not have a 
> common suffix we do not need to put them in this map.
> 
> 
> 
> Once this is done, we no longer need the words themselves, as we can now use 
> transitive properties (if the suffix of a certain word W1, matches that of 
> another word with W2, and W1 == W3, then W2 == W3, provided they all have a 
> suffixlength in common that is the same for all three of them).
> 
> 
> 
> We now have a map with all pairs, grouped together by suffix length. We now 
> iterate over this map in the following fashion:
> 
> 
> 
> - get the largest suffixlength left in the map
> 
> - get all pairs with that suffixlength from the map
> 
> - process all pairs, but only process those pairs that have two words in them 
> that have not been picked in a previous pass as rhyming. (there could be a 
> match which was processed earlier with a larger suffixlength, causing a word 
> to be marked as rhyming, while it has a smaller suffixlength, being processed 
> now, with another word - we could weed it out, but then our datastructure 
> needs to be more complex, and this is a simple O(log N) verification)
> 
> 
> 
> By using the transitive properties, we now know that if a certain word in 
> this range has a common suffixlength to another, they rhyme. So we can 
> partition the pairs we have at this suffixlength into distinct sets. If a 
> word is in one set, it cannot be in another set. (Or we would get a 
> contradiction with the transitive properties). This is essentially a graph 
> algorithm to determine distinct subgraphs.
> 
> 
> 
> So, all words in a set rhyme. However, we have the restriction that it has to 
> be pairwise. If there is only one pair, we are done with the set, we mark 
> each word as handled (ie. part of a pair). If we have more than one pair, we 
> just pick the first one, mark those as handled. However, we cannot ignore the 
> other pairs - we already know they rhyme, and will also rhyme if we shorten 
> their common length by one. So, we reinsert them into the map at a shorter 
> length (of course, we don't have to that if the length we are processing is 
> equal to 1, as we have at least one letter to put an accent on).
> 
> 
> 
> When all sets have been processed in this fashion, we erase all the pairs we 
> just processed for the main multimap. This ensures that our loop ends - we 
> know that once a certain length L has been processed, we will process L - 1. 
> As we do not reinsert length L pairs, but only L-1 length pairs, the map will 
> diminish in size. We break at length 1, so there is a termination point.
> 
> 
> 
> This algorithm has a worst case of O(P). Worst case this would be O(N^2), but 
> depending on the sparsity of the original words, this would amortize to O(N).
> 
> 
> 
> Note that original analysis uses a trie and recursion. The above does not, as 
> it only needs the pairs, not the words themselves, once processed. In 
> essence, all we need to is compare integers, once we have processed all the 
> pairwise comparisons. This might be a bit faster.
> 
> 
> 
> I can post my actual C++ code for those interested.
> 
> 
> 
> Hope this was useful.
> 
> 
> 
> Regards,
> 
> 
> 
> Ronald 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> You received this message because you are subscribed to the Google Groups 
> "Google Code Jam" group.
> 
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> 
> To post to this group, send email to [email protected].
> 
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/google-code/ca46d64d-aaaf-4d47-be0d-8aa5b8948e09%40googlegroups.com.
> 
> For more options, visit https://groups.google.com/d/optout.


Do you guys have a good test set for this? I followed a similar method (suffix 
comparison and elimination of pairs as opposed to tree building). It works on 
the test data set I had but fails on submission.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"Google Code Jam" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-code/f35444cc-59f7-4a13-831a-5562537d1fa6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [gcj] My analysis of Alien Rhyme

Reply via email to