Interesting. This problem is like compression, but the first thing I thought of was pattern recognition. If your tokens are just characters in a string, you can let regular expressions do all the heavy lifting:
(defn rep-groups [s] (re-seq #"(.+)\1+|." s)) Then (rep-groups ".,ababcababc.,cc,.abab.") for example, gives (["." nil] ["," nil] ["ababcababc" "ababc"] ["." nil] ["," nil] ["cc" "c"] ["," nil] ["." nil] ["abab" "ab"] ["." nil]) Now we just need to convert each of those match vectors into the form you want, like this: (map (fn [[mtch subst]] (if subst [(/ (count mtch) (count subst)) subst] mtch)) *1) ;; Where *1 is the match results, above. Gives ("." "," [2 "ababc"] "." "," [2 "c"] "," "." [2 "ab"] ".") Cool! Pretty close to what you need. But you wanted to use arbitrary tokens, like :a, :b, etc. So let's just represent each token by a unique character. (There are LOTS of characters available.) Here's a function that creates two 1-to-1 maps -- one that converts tokens to characters, and the other its inverse, converting characters to their tokens: (defn token-to-char-map-and-inverse [token-collection] (let [tokens (seq (set token-collection)) chars (map char (range (count tokens)))] [(zipmap tokens chars) (zipmap chars tokens)])) We'll need a presentation function like the (fn ...) above. Since it will also convert the characters back to tokens, it will also depend upon the char-to-token map. So the following function takes such a map and returns such a function: (defn count-subst-fn [c-to-t] (fn [[mtch subst]] (if subst [(/ (count mtch) (count subst)) (map c-to-t subst)] (apply c-to-t mtch)))) Finally, put it all together: (defn parse-reps [token-seq] (let [[t-to-c c-to-t] (token-to-char-map-and-inverse token-seq) s (apply str (map t-to-c token-seq))] (map (count-subst-fn c-to-t) (rep-groups s)))) For example, (parse-reps [:_ :- :a :b :c :a :b :c :a :a :_]) gives (:_ :- [2 (:a :b :c)] [2 (:a)] :_) Notice that the input itself is used to define the set of tokens we're using. We could even us a string as input. (parse-reps ".,ababcababc.,cc,.abab.") gives (\. \, [2 (\a \b \a \b \c)] \. \, [2 (\c)] \, \. [2 (\a \b)] \.) These four (defn ...) provide a simple solution to your problem. Using regular expressions also yields a nice separation of concerns, I think, and it allows for experimentation with the pattern. As defined above, (parse-reps [:a :b :a :b :c :a :b :a :b :c]) gives ([2 (:a :b :a :b :c)]) But let's try redefining the regular expression (note the "?"): (defn rep-groups [s] (re-seq #"(.+?)\1+|." s)) Then the same (parse-reps [:a :b :a :b :c :a :b :a :b :c]) gives the less greedy result, ([2 (:a :b)] :c [2 (:a :b)] :c) -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en