My solution was simpler than this. I didn't worry about comparing pairs. I didn't even really worry about the identities of the words; just made sure to count things properly. But like you, I did a search starting with the longest repeats. I think that part may be required to get a working large-case solution.
I just took, for each word, suppose it is length X, the last X letters, the last X-1 letters, the last X-2 letters, ..., the last 1 letter and put each into a dictionary (Python) for final substrings of its length. If the string was new I set its value to 1; if old, I added 1. Then I looked at all the final substrings of the longest length. Of course there won't be any pairs here because the words are all different. So I go to the next shorter length, and so on. When I find a substring with more than 1 word, I've identified a pair I can use. So now I increment by 2 the number of rhymed words, and decrement by 2 the counts for all final substrings of this string (to indicate I have used up two of the words with those endings). And I continue the search with the next final substring. There's up to one dictionary entry to process per letter in the words. Each word is removed from the pool at most once, at which point I have another operation that involves as many dictionary operations as letters in the matching ending. The dictionary operations are O(log N), so this is O(sum of all the word lengths * log N) On Mon, Apr 15, 2019 at 8:56 PM Ronald van Loon <[email protected]> wrote: > Hi, > > tried to participate in the first round, but it was at 3am local time. Had > my alarmclock at 2:45, duly turned it off, nodded of for five minutes, then > suddenly it was 4:30. Don't know how that happened. > > Anyway, here is my analysis of the problem: > > This is a problem about suffixes. Unfortunately, these are at the end of > the words, so it makes sense to put these at the front. > > So, while reading, reverse the words - we do not have to list the pairs, > after all, only count them. This is a O(N) operation. > > After we read all the words, we sort them. (Note: you could actually do > the compares without sorting or even while sorting, but sorting them puts > the longer matches closer together and is usually optimized if you use the > standard library). This is an O(N log N) operation. > > We now compare every word against each other, noting the common suffix. We > put the result of the comparison in a multimap, meaning that we sort on > <suffixlength, <word1-index, word2-index> >. This is O(N^2). The problem > states that a suffix is of at least length 1, so if two words do not have a > common suffix we do not need to put them in this map. > > Once this is done, we no longer need the words themselves, as we can now > use transitive properties (if the suffix of a certain word W1, matches that > of another word with W2, and W1 == W3, then W2 == W3, provided they all > have a suffixlength in common that is the same for all three of them). > > We now have a map with all pairs, grouped together by suffix length. We > now iterate over this map in the following fashion: > > - get the largest suffixlength left in the map > - get all pairs with that suffixlength from the map > - process all pairs, but only process those pairs that have two words in > them that have not been picked in a previous pass as rhyming. (there could > be a match which was processed earlier with a larger suffixlength, causing > a word to be marked as rhyming, while it has a smaller suffixlength, being > processed now, with another word - we could weed it out, but then our > datastructure needs to be more complex, and this is a simple O(log N) > verification) > > By using the transitive properties, we now know that if a certain word in > this range has a common suffixlength to another, they rhyme. So we can > partition the pairs we have at this suffixlength into distinct sets. If a > word is in one set, it cannot be in another set. (Or we would get a > contradiction with the transitive properties). This is essentially a graph > algorithm to determine distinct subgraphs. > > So, all words in a set rhyme. However, we have the restriction that it has > to be pairwise. If there is only one pair, we are done with the set, we > mark each word as handled (ie. part of a pair). If we have more than one > pair, we just pick the first one, mark those as handled. However, we cannot > ignore the other pairs - we already know they rhyme, and will also rhyme if > we shorten their common length by one. So, we reinsert them into the map at > a shorter length (of course, we don't have to that if the length we are > processing is equal to 1, as we have at least one letter to put an accent > on). > > When all sets have been processed in this fashion, we erase all the pairs > we just processed for the main multimap. This ensures that our loop ends - > we know that once a certain length L has been processed, we will process L > - 1. As we do not reinsert length L pairs, but only L-1 length pairs, the > map will diminish in size. We break at length 1, so there is a termination > point. > > This algorithm has a worst case of O(P). Worst case this would be O(N^2), > but depending on the sparsity of the original words, this would amortize to > O(N). > > Note that original analysis uses a trie and recursion. The above does not, > as it only needs the pairs, not the words themselves, once processed. In > essence, all we need to is compare integers, once we have processed all the > pairwise comparisons. This might be a bit faster. > > I can post my actual C++ code for those interested. > > Hope this was useful. > > Regards, > > Ronald > > > > > > > > -- > You received this message because you are subscribed to the Google Groups > "Google Code Jam" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/google-code/ca46d64d-aaaf-4d47-be0d-8aa5b8948e09%40googlegroups.com > . > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "Google Code Jam" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/google-code/CAMAzhzjG9TpXZaEAUA-gNsX-NfdsJEdHrOVxDuygg3N2pDDLrw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
