Beginners Digest, Vol 91, Issue 25

beginners-request Thu, 21 Jan 2016 04:03:47 -0800

Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org


You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1.  program not running lazily (Mason Lai)


----------------------------------------------------------------------

Message: 1
Date: Wed, 20 Jan 2016 16:53:02 -0800
From: Mason Lai <masonm...@gmail.com>
To: beginners@haskell.org
Subject: [Haskell-beginners] program not running lazily
Message-ID:
        <cah1ivpf1x8qoak1ym+g4f5h3tvy3jkcoaaqesjn3bwy5pvt...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I'm teaching myself Haskell and attempting to use it for day-to-day tasks
at work. I signed up for this list a few weeks ago, and this is my first
post. Here's my problem:

I'm given a list of 45 Strings, each nine Chars long. The only characters
are 'A', 'C', 'G' and 'T'. (This is for a bioinformatics application.) I'll
refer to any nine-length String composed of these Chars as a "9-mer".

I need to generate a larger list of 9-mers such that no two 9-mers have a
Levenshtein distance of less than 3. (The list I start with satisfies this
requirement.)

I can generate as many 9-mers as I possibly can, but this process is very
slow. It's also not being computed lazily; calling head on the output list
forces the entire list of 9-mers to be computed. *I'd like to understand
why this list isn't being computed lazily, and how I can change my code to
make it so. *My knowledge of monads is pretty poor as well, so a digression
or a series of questions about why the line [9] >>= (`replicateM` "ACGT")
works would also be helpful, as would general tips about writing clean code.

This is an O(n^2) operation, so I'm not expecting it to be slow. However, I
figured that I'd just be able to take the first N elements from the output
list.

Here's what I have:

import Control.Monad

main :: IO ()
main = interact processData

processData :: String -> String
processData = unlines . (`merge` ([9] >>= (`replicateM` "ACGT"))) . lines

-- Merges two lists of things into a single list
merge :: Eq a => [[a]] -> [[a]] -> [[a]]
merge xs [] = xs
merge xs ys = merge ((head ys) `addInto` xs) $ tail ys

-- Adds a thing into a list if it is "different" enough
addInto :: Eq a => [a] -> [[a]] -> [[a]]
y `addInto` ys =
  if ((minimum $ map (dist y) ys) < 3)
    then ys
    else y:ys

-- Calculate the Levenshtein distance
-- Lloyd Allison algorithm. See Haskell wiki
-- code omitted
dist :: Eq a => [a] -> [a] -> Int

My workaround to getting a smaller subset of the whole list is to replace

[9] >>= (`replicateM` "ACGT")

with

take 10000 $ [9] >>= (`replicateM` "ACGT")

Thanks!
Mason Lai
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160120/d3f38aa0/attachment-0001.html>

------------------------------

Subject: Digest Footer

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners


------------------------------

End of Beginners Digest, Vol 91, Issue 25
*****************************************

Beginners Digest, Vol 91, Issue 25

Reply via email to