[Haskell-cafe] Lock-Free Data Structures using STMs in Haskell

2008-04-09 Thread Pete Kazmier
I recently read the STM paper on lock-free data structures [1] which I
found very informative in my quest to learn how to use STM.  However,
there are a few things I do not fully understand and was hoping
someone might be able to explain further.

In the STM version of the ArrayBlockingQueue, the following type is
defined:

  data ArrayBlockingQueueSTM e = ArrayBlockingQueueSTM {
shead :: TVar Int,
stail :: TVar Int,
sused :: TVar Int,
slen :: Int,
sa :: Array Int (TVar e)
  }

It's unclear to me why the Array's elements must be wrapped in TVars.
Why aren't the TVars on shead, stail, and sused sufficient?  Here is
the only function that reads from the queue:

  readHeadElementSTM :: ArrayBlockingQueueSTM e
- Bool - Bool - STM (Maybe e)
  readHeadElementSTM abq remove block
= do u - readTVar (sused abq)
 if u == 0
then if block
then retry
else return Nothing
else do h - readTVar (ihead abq)
 let tv = sa abq ! h
 -- Why are the array elements wrapped in TVars?
 e - readTVar tv
 if remove
then do
  let len = slen abq
  let newh = h `mod` len
  writeTVar (shead abq) $! newh
  writeTVar (sused abq) $! (u-1)
else return ()
 return (Just e)

It is not immediately obvious to me why the elements need to be
wrapped in TVars.  Could someone help elaborate?

The other question is in regards to section 2 where STM is
introduced.  The authors define the following:

  decT :: TVar Int - IO ()
  decT v = atomically (do x - readTVar v
  if x == 0
 then retry
 else return ()
  writeTVar v (x-1))

And then go on to show how easy it is to compose STM types with this
function:

  decPair v1 v1 :: TVar Int - TVar Int - IO ()
  decPair v1 v2 = atomically (decT v1 `orElse` decT v2)

Will this actually compile?  I was under the impression that 'orElse'
could only combine STM types, not IO () types.  

Thank you,
Pete

[1] Anthony Discolo, Tim Harris, Simon Marlow, Simon Peyton Jones, and
Satnam Singh. Lock-free data structures using STMs in Haskell. In
Eighth International Symposium on Functional and Logic Programming
(FLOPS.06), April 2006.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lock-Free Data Structures using STMs in Haskell

2008-04-09 Thread Pete Kazmier
Bryan O'Sullivan [EMAIL PROTECTED] writes:

 Pete Kazmier wrote:

   data ArrayBlockingQueueSTM e = ArrayBlockingQueueSTM {
 [...]
 sa :: Array Int (TVar e)
   }
 
 It's unclear to me why the Array's elements must be wrapped in TVars.

 To allow them to be modified.  You can't otherwise modify the elements
 of an array without going into the ST monad.

Thanks!  I forgot about the whole immutable thing :-)  Haven't used
arrays yet while learning Haskell!

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Yaham - Yet Another HAskell Mode for GNU Emacs

2007-06-17 Thread Pete Kazmier
Ian Zimmerman [EMAIL PROTECTED] writes:

 It has a different focus than the Moss haskell-mode that's normally
 used nowadays.  Yaham strives, first and foremost, to integrate well
 with the rest of Emacs and respect the Emacs ecosystem.

Could you elaborate on this?  Perhaps an itemized list of things that
are different?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Sneaking haskell in the workplace -- cleaning csv files

2007-06-17 Thread Pete Kazmier
Brandon S. Allbery KF8NH [EMAIL PROTECTED] writes:

 On Jun 15, 2007, at 18:37 , Jason Dagit wrote:

 I love to see people using Haskell, especially professionally, but I
 have to wonder if the real tool for this job is sed? :-)

 Actually, while sed could do that, it'd be a nightmare.  You really
 want a parser to deal with general CSV like this, and while you can
 write parsers in sed, you *really* don't want to.  :)

sed ':a /,$/!{N;s/\n//;ba}' somefile.csv

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Network.HTTP+ByteStrings Interface--Or: How to shepherd handles and go with the flow at the same time?

2007-05-25 Thread Pete Kazmier
As a newbie to Haskell, I found your thorough analysis very
interesting.  Thanks for the great read!  I have a few questions
regarding some of your comments, see below:

Jules Bean [EMAIL PROTECTED] writes:
 E,F. Progressive GET
 pSynGET :: URL - ((Bool,ByteString) - IO ()) - IO ()
 pAsynGET :: URL - ((Bool,ByteString) - IO ()) - IO (MVar ())
 
 (This is a particular simple case of Oleg's iteratees, I
 think) Download the data at whatever speed is convenient. As data
 arrives, feed it to the 'callback' provided. The ByteString is the
 new chunk of data, the 'Bool' is just supposed to indicate whether
 or not this is the final chunk.

 Incidentally there are more complex options than (Bool,Bytestring)
 - IO ().  A simple and obvious change is to add a return
 value. Another is a 'state monad by hand', as in (Bool,Bytestring)
 - s - s, and change the final return value of the type to IO s,
 which allows the callback to accumulate summary information and
 still be written as pure code. 

I want to be sure that I understand the implications of the callback
function returning an IO action as originally proposed versus it being
a pure function.  It would seem to me that if it were a pure callback
the usefulness would be limited as I would not be able to take the
data read from the network and immediately write it out to a file.  Is
this correct?

And if the above is correct, is there a way to define the callback
such that one does not have to hardcode the IO monad in the return
type so you can have the best of both worlds?

 Other options allow the 'callback' to request early termination,
 by layering in an 'Either' type in there. 

I believe the ability to request early termination is important, and
was one of the nice features of Oleg's left-fold enumerators.  It
would be a shame if the API did not offer this capability.

 Another more sophisticated option, I think, is the higher rank

 MonadTrans t = URL -
  ((forall m. Monad m) = (Bool,ByteString) - t m)
  - t IO ()

 ...which, unless I've made a mistake, allows you to write in 'any
 monad which can be expressed as a transformer', by transforming it
 over IO, but still contains the implicit promise that the
 'callback' does no IO. For example t = StateT reduces to the
 earlier s - s example, in effect, with a slightly different data
 layout.

I don't fully understand this, but would this prevent one from calling
IO actions as it was receiving the chunks in the callback (such as
writing it to a file immediately)?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Stack overflow with my Trie implementation

2007-04-26 Thread Pete Kazmier
I've modified my Norvig spelling corrector to use a trie instead of
Data.Map in the hopes of improving performance.  Plus, this is fun and
a great learning exercise for me.  Unfortunately, when I load my trie
with a large amount of data, I get a stack overflow.  It's unclear to
me why this is happening.  I specifically use foldl' to avoid this
situation when building my trie.  Could someone shed some light on the
situation for me?

Here is the code:

 module Main where 
 
 import Data.List (foldl')
 import Data.Maybe (maybe, fromMaybe)
 import Prelude hiding (lookup)
 import qualified Data.Map as M
 
 data Trie a = T (Maybe a) (M.Map Char (Trie a)) deriving (Show)
 
 main = do
 -- big.txt is a large file of words: http://www.norvig.com/big.txt
 c - readFile big.txt
 let freqTrie = foldl' incWordCount empty (words c)
 print $ lookup evening freqTrie
 where 
   incWordCount m w = insertWith (+) w 1 m
 
 empty :: Trie a
 empty = T Nothing M.empty
 
 lookup :: String - Trie a - Maybe a
 lookup ([])   (T Nothing  m) = Nothing
 lookup ([])   (T (Just v) m) = return v
 lookup (k:ks) (T _m) = case M.lookup k m of
  Nothing   - Nothing
  Just trie - lookup ks trie
 
 findWithDefault :: a - String - Trie a - a
 findWithDefault v k t = fromMaybe v (lookup k t)
 
 member :: String - Trie a - Bool
 member k t = maybe False (const True) (lookup k t)
 
 insertWith :: (a - a - a) - String - a - Trie a - Trie a
 insertWith fn ([])   v (T Nothing   m) = T (Just v) m
 insertWith fn ([])   v (T (Just v') m) = T (Just $ fn v v') m
 insertWith fn (k:ks) v (T mvm) = T mv (M.insertWith const k newtrie m)
 where
   oldtrie = M.findWithDefault empty k m
   newtrie = insertWith fn ks v oldtrie

 {-- I also tried to use this line instead of the one above to see
 if this had any impact.  Unfortunately, I obtained the same 
 results.
 
 insertWith fn ([])   v (T (Just v') m) = let x = fn v v' in seq x T (Just x) m
 --}



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier
Ketil Malde [EMAIL PROTECTED] writes:

 On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote:
 Pete Kazmier [EMAIL PROTECTED] writes:
 
  I'd love to see other Haskell implementations as well if anyone has a
  few moments to spare.  Admittedly, it took me several hours to get my
  version working, but I'm a Haskell newbie.  Unfortunately, I think it
  runs as slow as it took me to write it!

 Hm - nobody suggested using ByteStrings yet?  String is notoriously
 wasteful, and replacing it with ByteString normally gives a quite
 significant speedup.

I actually have a ByteString version but it runs much slower.  This
part of the code is where all of the time is spent in the ByteString
version:

  type WordFreq   = M.Map B.ByteString Int
 
  train:: [B.ByteString] - WordFreq
  train words = frequencyMap
  where
frequencyMap = foldr incWordCount M.empty words
incWordCount w m = M.insertWith (+) w 1 m
  
 Worse - and this is true for ByteStrings, too - String comparisons are
 O(n), which means lookups in Sets and Maps are expensive.  A trie (i.e,
 a search tree where each internal node corresponds to a word prefix, and
 has one branch per letter in the alphabet) will give you lookup that
 scales with word size (and possibly alphabet size).

Right.  My first version was just a direct translation of Norvig's
code with an emphasis on trying to keep the complexity and size of
code to a minimum.

 Instead of generating the (huge) list of misspelled words, you could
 calculate edit distance to each correctly spelled word?  With a bound on
 allowable mistakes, this is a linear time operation using the standard
 dynamic programming algorithm.

Could you provide additional information on this standard dynamic
programming algorithm?  I'm not familiar with dynamic programming.

Thanks!
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier
Bryan O'Sullivan [EMAIL PROTECTED] writes:

 After this switch, I found that spellchecking one word still took 4x
 as long in Haskell as Norvig's Python program.  Since I was checking
 only one word in each case, essentially all of the runtime was taken
 up by building the word frequency map.

 train = foldl' updateMap M.empty . map lower . mkWords
 where updateMap model word = M.insertWith' (+) word 1 model
   mkWords = filter (not . B.null) . X.splitWith isNotAlpha
   lower !s = X.map toLower s
   isNotAlpha !c = c  0x41 || (c  0x5a  c  0x61) || c  0x7a
   toLower !c | c = 0x41  c = 0x5a = c + 0x20
  | otherwise = c

After reading Bryan's post, I switched my right fold to a strict left
fold and almost tripled my original speed.  Could someone provide some
guidelines on when to use each?  I thought foldr should be used when
building some large data structure such as the map I build.

Bryan, out of curiosity, is a non bytestring version of your code
faster?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier
Derek Elkins [EMAIL PROTECTED] writes:

 After reading Bryan's post, I switched my right fold to a strict left
 fold and almost tripled my original speed.  Could someone provide some
 guidelines on when to use each?  I thought foldr should be used when
 building some large data structure such as the map I build.
 Bryan, out of curiosity, is a non bytestring version of your code
 faster?

 http://www.haskell.org/hawiki/StackOverflow

I read the article and understand it, but I am having a hard time
applying that specifically to my use of foldr.  Here is how I was
using foldr:

  type WordFreq   = M.Map B.ByteString Int
 
  train:: [B.ByteString] - WordFreq
  train words = frequencyMap
  where
frequencyMap = foldr incWordCount M.empty words
incWordCount w m = M.insertWith (+) w 1 m

So is 'incWordCount' strict in its second argument?  I'm still not
sure exactly what that means.  According to the wiki page, if it is
strict in the second argument, I should have used foldl' instead of
foldr.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier
Recently I read an interesting article by Peter Norvig[1] on how to
write a spelling corrector in 21-lines of Python.  I wanted to try and
implement it in Haskell.  My implementation is terribly slow and I was
hoping for some tips on how to improve it and make it idiomatic.

I'd love to see other Haskell implementations as well if anyone has a
few moments to spare.  Admittedly, it took me several hours to get my
version working, but I'm a Haskell newbie.  Unfortunately, I think it
runs as slow as it took me to write it!  There is definitely something
wrong with it, a memory leak, because I can't correct more than a few
words without a great deal of memory being consumed.

Thanks,
Pete

[1] http://norvig.com/spell-correct.html

module Main where

import Control.Arrow
import Data.Char (toLower, isPunctuation)
import Data.List (maximumBy)
import qualified Data.Set as S
import qualified Data.Map as M

type WordSet= S.Set String
type WordFreq   = M.Map String Int

main :: IO ()
main = do
-- 'holmes.txt' can be found here:
-- http://norvig.com/holmes.txt
-- We should train it with a larger corpus, but my program is
-- is already slow enough.
c - readFile holmes.txt
let correct = train . tokens $ c
interact $ (++\n) . show . (id  correct) . init

-- Returns a list of words lowercased and stripped of punctuation
-- at the end of the word.
tokens :: String - [String]
tokens = map (lower . nopunc) . words
where
  lower   = map toLower
  nopunc  = strip isPunctuation
  strip p = reverse . dropWhile p . reverse

-- Returns a closure (equivalent to the 'correct' function in the
-- Python implementation).
train:: [String] - (String - String)
train words word = maximumBy freq . S.toList . head $ filter (not . S.null)
[ known (S.singleton word),
  known (edits1 word),
  known (edits2 word),
  S.singleton word ]
where
  freq c c'= compare (findfreq c) (findfreq c')
  findfreq c   = M.findWithDefault 1 c frequencyMap
  known= S.filter (`M.member` frequencyMap) 
  frequencyMap = foldr incWordCount M.empty words
  incWordCount w m = M.insertWith (+) w 1 m

edits1 :: String - WordSet
edits1 word = S.fromList $ concat
   [[ t i ++ d (i+1)| i - range n], -- deletion
[ t i ++ [word!!(i+1)] ++ [word!!i] ++ d (i+2) | i - range (n-1) ], -- transposition
[ t i ++ [c] ++ d (i+1) | i - range n, c - alphabet ], -- alteration
[ t i ++ [c] ++ d i | i - range (n+1), c - alphabet ]] -- insertion
where
  n= length word
  t i  = take i word
  d i  = drop i word
  range x  = [ 0..(x-1) ]
  alphabet = ['a'..'z']

edits2 :: String - WordSet
edits2 = S.unions . S.toList . S.map edits1 . edits1
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier
Pete Kazmier [EMAIL PROTECTED] writes:

 I'd love to see other Haskell implementations as well if anyone has a
 few moments to spare.  Admittedly, it took me several hours to get my
 version working, but I'm a Haskell newbie.  Unfortunately, I think it
 runs as slow as it took me to write it!  There is definitely something
 wrong with it, a memory leak, because I can't correct more than a few
 words without a great deal of memory being consumed.

As monochrom pointed out on #haskell, I am using 'interact'
incorrectly.  For some reason I thought 'interact' applied its
argument to each line of the input.  I've replaced it as follows:

  interact $ unlines . map (show . (id  correct)) . lines

The program is still terribly slow due to my use of lists.  Is there a
better way to write 'edits1'?

Thanks,
Pete

 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Tutorial on Haskell

2007-04-16 Thread Pete Kazmier
Simon Peyton-Jones [EMAIL PROTECTED] writes:

 My guess is that they'll be Linux/Perl/Ruby types, and they'll be
 practitioners rather than pointy-headed academics.

 Suggest concrete examples of programs that are
 * small
 * useful
 * demonstrate Haskell's power
 * preferably something that might be a bit
 tricky in another language

 But there must be lots of others.  

As one of those python/ruby types trying to learn Haskell for the past
year, here are my suggestions for small examples:

- Tom Moertel's Haskell Port Scanner
  Why? Demonstrates concurrent haskell in a small amount of lines
  http://blog.moertel.com/articles/2004/03/13/concurrent-port-scanner-in-haskell

- A web-server example of some sort that interfaces with a database
  and uses some interesting HTML combinator library.

- Building a simple unit testing framework is always a good example
  (even though they already exist), and then introducing quickcheck
  perhaps.

- A program to concurrently verify the links on an HTML page
  recursively.  I'm sure there are lots of interesting idioms and
  techniques that could be used while keeping the code small and
  elegant.

- Perhaps a Haskell version of Norvig's 20-line Python Spell Checker:
  Why? Maybe a Haskell version could be shorter and more elegant?
  http://norvig.com/spell-correct.html

- Tom Moertel's Directory Tree Printing in Haskell:
  Why? Demonstrates all sorts of introductory techniques
  
http://blog.moertel.com/articles/2007/03/28/directory-tree-printing-in-haskell-part-three-lazy-i-o


Some thoughts on other topics suggested by others:

- Parsec is not that interesting for those coming from perl, ruby, or
  python as they rely on regular expressions for everything and just
  expect that they are part of the language.  The thought of writing
  one's own parser is not as cool as most Haskellers believe it is,
  regardless of how interesting the parsec library is.

- STM may be too complex of a subject for an intro to Haskell
  tutorial.  There are just too many concepts in there that may
  overwhelm some beginners.

- Don's post on shell scripting was very interesting, but I'm still
  having a hard time understanding some parts of it, the error
  handling part, and I've been playing with Haskell on and off for the
  past year (I'm also a slow learner and not an academic).

- Don's post on simple UNIX utilities was also quite nice. I believe
  he also wrote a simple IRC bot example that would prove
  interesting, can't seem to find the link at the moment though.

Just my thoughts as a newbie desiring a book on how to use Haskell in
a practical manner (such as Practical Common Lisp book).

-Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Tutorial on Haskell

2007-04-16 Thread Pete Kazmier
Evan Laforge [EMAIL PROTECTED] writes:

 It illustrates a few nice things about haskell: laziness for the
 recursive defs and easy backtracking, low syntax overhead and custom
 operators for DSLs, composability, etc.

Although that is true, I somehow feel that showing a perl, ruby, or
python programmer an alternate approach to regexps, a technique firmly
ingrained into the roots of these languages, will not garner much
interest in Haskell.  I know this is the case for me.  

In fact, it was always a large negative for me that Haskell/GHC never
had decent builtin support for regexps until recently (6.6).  From a
practical point of view, the tasks that I do frequently involve the
use of regexps (for better or worse).  Again, I'm not an academic,
just an everyday python programmer trying to assist me in my day job.

Upon thinking about this subject further, I think it would be very
important that Simon somehow incorporates at least one use of the new
regexp library.  The target audience would by more likely to consider
Haskell if it contains they're beloved tool of choice.  Later, they
can discover the elegance of parsec if needed.

-Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: flip fix and iterate

2007-03-21 Thread Pete Kazmier
Claus Reinke [EMAIL PROTECTED] writes:

 I won't try to understand fix just yet, but I'm still confused by
 the type of fix:
 fix :: (a - a) - a
 It appears to me that it takes a function as an argument, and that
 function takes a single argument.  So how are you passing fix an
 anonymous function taking 2 arguments?  Sorry if I have beaten this
 horse to death, but my pea-sized brain is working overtime here.

 fix takes a function as an argument, and that function takes a
 single argument.  that function also returns something of the same
 type as its single argument.  

[snip]

 and suddenly, fix does have two parameters, which flip can flip!-)

 no magic, just technology sufficiently advanced to be
 indistinguishable from it: a function of one parameter, which
 returns a function of one parameter, is a function of more than one
 parameter.

 at which point this particular fixed-point combinator puts its
 recursive unfoldings to rest for tonight.

Claus,

Thank you for the detailed explanation.  I think I understand now!  To
be sure, I'll reread your post several times over the next few days.

Thanks again, this was very helpful.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: flip fix and iterate

2007-03-20 Thread Pete Kazmier
Matthew Brecknell [EMAIL PROTECTED] writes:

 Pete Kazmier:
 I understand the intent of this code, but I am having a hard time
 understanding the implementation, specifically the combination of
 'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
 one can call 'flip' on a function that takes one argument.

 I threw that in there because I figured you were up for another
 challenge. :-)

Haskell has a way of making one feel dumb.  This is by far the most
challenging programming language I've ever used.

 It took me ages to get some clue about how to use fix, quite apart from
 combining it with flip. The concept of passing the output of a function
 as one of its parameters (tying the knot) can be difficult to accept,
 particularly if you haven't studied lambda calculus. 

This is a bit mind boggling.  Do you have any trivial examples that
may help my understanding?

 Note that I could have just written this:

 let iterate a = do
   ... iterate a' ...
 iterate accum

In the meantime, I'm more than happy to claim ignorance and stick with
the above version which is very accessible to us mere mortals.

 So with my use of flip fix, I'm really just calling fix on the
 anonymous function (\iterate accum - ...), and then the parameter
 (accum) is passed to the function returned by fix. So now you just
 need a couple of weeks (or months if you're as slow as me) to
 understand what fix is all about... :-)

I won't try to understand fix just yet, but I'm still confused by the
type of fix:

 fix :: (a - a) - a

It appears to me that it takes a function as an argument, and that
function takes a single argument.  So how are you passing fix an
anonymous function taking 2 arguments?  Sorry if I have beaten this
horse to death, but my pea-sized brain is working overtime here.

Thanks for all of the help.

Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: fix

2007-03-20 Thread Pete Kazmier
Matthew Brecknell [EMAIL PROTECTED] writes:

 As others have pointed out, fix is polymorphic, so a can stand for any
 type, including (b - c). Removing redundant parentheses, this means
 fix can directly specialise to:

 fix :: ((b - c) - b - c) - b - c

I understand now.  I think part of my problem was that I was trying to
grasp one too many new things all at once.  This makes perfect sense.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier
Matthew Brecknell [EMAIL PROTECTED] writes:

 Pete Kazmier:
 I attempted to read Oleg's fold-stream implementation [1] as this
 sounds quite appealing to me, but I was completely overwhelmed,
 especially with all of the various type signatures used.  It would be
 great if one of the regular Haskell bloggers (Tom Moertel are you
 reading this?) might write a blog entry or two interpreting his
 implementation for those of us starting out in Haskell perhaps by
 starting out with a non-polymorphic version so as to emphasize the
 approach.
 
 [1] http://okmij.org/ftp/Haskell/fold-stream.lhs

 The basic idea of the paper is the use of a left-fold operator as the
 primary interface for enumarating collections. The recursive version
 (less general than the non-recursive version) of a left-fold operator
 for enumerating the lines of a text file might look something like this:

 import Control.Monad.Fix
 import Control.Exception
 import Data.List
 import qualified Data.Set as S
 import qualified Data.Map as M
 import System.IO
 
 enumLines :: (a - String - Either a a) - a - FilePath - IO a
 enumLines iter accum filename = do
   h - openFile filename ReadMode
   flip fix accum $
 \iterate accum - do
   try_line - try (hGetLine h)
   case try_line of
 Left e - hClose h  return accum
 Right line - do
   case iter accum line of
 Left accum - hClose h  return accum
 Right accum - iterate accum

I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.

 To use this, you provide an iteratee, a function which takes an
 accumulator and a line from the file, and returns a new accumulator
 embedded in an Either. Using the Left branch causes immediate
 termination of the enumeration. For example, to search for the first
 occurrence of each of a set of email headers:

 getHeaders :: S.Set String - FilePath - IO (S.Set String, M.Map String 
 String)
 getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
   findHdrs accum@(wanted,found) line =
 if null line
   then Left accum
   else
 case headerLine line of
   Nothing - Right accum
   Just hdr -
 case findDelete hdr wanted of
   Nothing - Right accum
   Just wanted -
 let accum = (wanted, M.insert hdr line found) in
   if S.null wanted
 then Left accum
 else Right accum
 
 headerLine :: String - Maybe String
 headerLine (':':xs) = Just []
 headerLine (x:xs) = fmap (x:) (headerLine xs)
 headerLine [] = Nothing
 
 findDelete :: Ord a = a - S.Set a - Maybe (S.Set a)
 findDelete e s = if S.member e s
   then Just (S.delete e s)
   else Nothing

 It's a bit of a case-analysis nightmare, but when comparing this to
 previous approaches, note that file traversal and processing are cleanly
 separated, file handle closure is guaranteed to be timely, file
 traversal stops as soon as all the required headers have been found,
 memory usage is minimised.

Very nice.  I like the clean separation, but as you say, its one ugly
bit of code compared to my original code, although much more elegant
no doubt.

 I hope that helps.

Very much so.  Thank you for you help.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier
Pete Kazmier [EMAIL PROTECTED] writes:

 I attempted to read Oleg's fold-stream implementation [1] as this
 sounds quite appealing to me, but I was completely overwhelmed,
 especially with all of the various type signatures used.  It would be
 great if one of the regular Haskell bloggers (Tom Moertel are you
 reading this?) might write a blog entry or two interpreting his
 implementation for those of us starting out in Haskell perhaps by
 starting out with a non-polymorphic version so as to emphasize the
 approach.

 [1] http://okmij.org/ftp/Haskell/fold-stream.lhs

In the event any other Haskell newbie comes along someday and is just
as overwhelmed as I was, I've found this post by Oleg to be a much
easier to understand than the above paper because it is not as generic
and thus the type signatures are a bit easier on the eyes:

http://www.haskell.org/pipermail/haskell/2003-September/012741.html

With that said, I have a question regarding Hal's response to the
above email in which he states:

 Just thought I'd mention that this is, in fact, my preferred method of
 iterating over a file.  It alleviates the pain associated with lazy file
 IO, and simultaneously provides a useful abstraction.  I actually have
 3*2 functions that I use which look like:
 
  type Iteratee  iter seed = seed - iter - Either seed seed
  hFoldChars  :: FilePath - Iteratee  Char seed - seed - IO seed
  hFoldLines  :: FilePath - Iteratee  String   seed - seed - IO seed
  hFoldWords  :: FilePath - Iteratee  [String] seed - seed - IO seed
 
  type IterateeM iter seed = seed - iter - IO (Either seed seed)
  hFoldCharsM :: FilePath - IterateeM Char seed - seed - IO seed
  hFoldLinesM :: FilePath - IterateeM String   seed - seed - IO seed
  hFoldWordsM :: FilePath - IterateeM [String] seed - seed - IO seed
 
 Which perform as expected (hFoldWords(M) can be written in terms of
 hFoldLinesM, but I find I use it sufficiently frequently to warrent
 having it stand out).  Also, of course, the only ones actually
 implemented are the (M) variants; the non-M variants just throw a return
 into the Iteratee.

What does he mean by the very last sentence?  Oleg's version seems
more like the non-M versions.  What is his implication?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-17 Thread Pete Kazmier
Matthew Brecknell [EMAIL PROTECTED] writes:

 So here's a test. I don't have any big maildirs handy, so this is based
 on the simple exercise of printing the first line of each of a large
 number of files. First, the preamble.

 import Control.Exception (bracket)
 import System.Environment
 import System.IO

 main = do
   t:n:fs - getArgs
   ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs)
 
 [snip]

Thank you for summarizing the approaches presented by others.  As a
Haskell newbie, there seems to be quite a few esoteric concepts to
conquer.  Your concrete examples were helpful in my understanding of
the ramifications associated with the various approaches.

After reading the various threads you cited, I decided to avoid lazy
IO altogether.  By using 'readFile' without forcing the strict
evaluation of my parser, I inadvertently relinquished control of the
resource management--closing of the file handles was left to the GC.
And although I could have used 'seq' to address the issue, why bother
fixing a problem that could have been avoided altogther by using
strict IO.

With that said, I added the following function to my program and then
replaced the invocation of 'readFile' with it:

  readEmailHeaders :: FilePath - IO String
  readEmailHeaders file = 
  bracket (openFile file ReadMode) (hClose) (headers [])
  where
headers acc h = do
line - hGetLine h
case line of
  -- Stop reading file once we hit the empty separator
  -- line, no need to read the rest of the file (body).
   - return . concat . reverse $ acc
  _  - headers (\n:line:acc) h

I'm not sure if this is the best implementation, but the speed is
comparable to the lazy IO version without the annoying defect of
running out of file handles.  I also tried an implementation using
'hGetChar' but that was much slower.

I attempted to read Oleg's fold-stream implementation [1] as this
sounds quite appealing to me, but I was completely overwhelmed,
especially with all of the various type signatures used.  It would be
great if one of the regular Haskell bloggers (Tom Moertel are you
reading this?) might write a blog entry or two interpreting his
implementation for those of us starting out in Haskell perhaps by
starting out with a non-polymorphic version so as to emphasize the
approach.

Thanks,
Pete

[1] http://okmij.org/ftp/Haskell/fold-stream.lhs

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Lazy IO and closing of file handles

2007-03-14 Thread Pete Kazmier
When using readFile to process a large number of files, I am exceeding
the resource limits for the maximum number of open file descriptors on
my system.  How can I enhance my program to deal with this situation
without making significant changes?

My program takes one or more directories that contain email messages,
stored one per file, and prints a list of all the email threads.  Here
is a snippet of output:

  New addition to the Kazmier family
Casey Kazmier
  
  Memoization in Erlang?
Thomas Johnsson
Ulf Wiger \(AL/EAB\)

As a newcomer to Haskell, I am intrigued by lazy evaluation and how it
can influence one's designs.  With that said, I wrote the program as a
sequence of list manipulations which seemed quite natural to do in
Haskell starting with reading the contents of the each file.  Here is
the algorithm at the high level:

  1. Read contents of all files returning a list of Strings
  2. Map over the list and parse each String as an Email
  3. Sort the list of Emails
  4. Group the list of Emails by Subject
  5. Map over the grouped list to create a list of Threads
  6. Finally, print the list of Threads

It is my understanding that, as a result of lazy IO, the entire file
does not need to be read into memory because parseEmail only inspects
the topmost portion of the email (its headers), which is a key part of
my design as some of the files can be quite large. Unfortunately, as
soon as I run this program on a directory with more than 1024 files,
GHC craps out on me due to resource limits.  It seems that the handles
opened by readFile remain open.  Would this be common across all
Haskell implementations?

How do I go about fixing this without making a significant number of
changes to my program?  Did I make a mistake in steps 1 and 2 above?
Should I have read and parsed a single file at a time, and then move
on to the next?  

I'd appreciate any other comments on the program as well.  I feel this
is the best example of Haskell code that I have written.  Compared to
the first version of this program I wrote a few months ago, this is a
hundred times better.

Here is the program:

 module Main where
 
 import Control.Monad (filterM, liftM)
 import Data.List
 import Data.Maybe
 import System.Directory
 import System.Environment
 
 type From= String
 type Subject = String
 data Email   = Email {from :: From, subject :: Subject} deriving Show
 data Thread  = Thread [Email]
 
 instance Show Thread where
 show (Thread emails@(e:es)) = title ++ senders
 where
   title   = newline . bolder . subject $ e
   sender  = newline . indent . from
   senders = concatMap sender emails
   newline = (++ \n)
   indent  = (   ++)
   bolder  = (\27[0;32;40m ++) . (++ \27[0m)
 
 main =
 getArgs  =
 mapM fileContentsOfDirectory =
 mapM_ print . threadEmails . map parseEmail . concat 
 
 fileContentsOfDirectory :: FilePath - IO [String]
 fileContentsOfDirectory dir =
 setCurrentDirectory  dir 
 getDirectoryContents dir =
 filterM doesFileExist=  -- ignore directories
 mapM readFile
 
 parseEmail :: String - Email
 parseEmail text = 
 Email (getHeader From) (getHeader Subject)
 where
   getHeader = fromMaybe N/A . flip lookup headers
   headers   = concatMap mkassoc . takeWhile (/=) $ lines text
   mkassoc s = case findIndex (==':') s of
 Just n  - [(take n s, drop (n+2) s)]
 Nothing - []
 
 threadEmails :: [Email] - [Thread]
 threadEmails =
 map Thread . groupBy (fuzzy (==)) . sortBy (fuzzy compare)
 where
   fuzzy fn e e' = stripReFwd (subject e) `fn` stripReFwd (subject e')
   stripReFwd= stripSpaces . reverse . stripToColon . reverse
   stripSpaces   = dropWhile (==' ')
   stripToColon  = takeWhile (/=':')

  


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Pete Kazmier
[EMAIL PROTECTED] (Donald Bruce Stewart) writes:

 pete-expires-20070513:
 When using readFile to process a large number of files, I am exceeding
 the resource limits for the maximum number of open file descriptors on
 my system.  How can I enhance my program to deal with this situation
 without making significant changes?

 Read in data strictly, and there are two obvious ways to do that:

 -- Via strings:

 readFileStrict f = do
 s - readFile f
 length s `seq` return s

 -- Via ByteStrings
 readFileStrict  = Data.ByteString.readFile
 readFileStrictString  = liftM Data.ByteString.unpack 
 Data.ByteString.readFile

 If you're reading more than say, 100k of data, I'd use strict
 ByteStrings without hesitation. More than 10M, and I'd use lazy
 bytestrings.

Correct me if I'm wrong, but isn't this exactly what I wanted to
avoid?  Reading the entire file into memory?  In my previous email, I
was trying to state that I wanted to lazily read the file because some
of the files are quite large and there is no reason to read beyond the
small set of headers.  If I read the entire file into memory, this
design goal is no longer met.

Nevertheless, I was benchmarking with ByteStrings (both lazy and
strict), and in both cases, the ByteString versions of readFile yield
the same error regarding max open files.  Incidentally, the lazy
bytestring version of my program was by far the fastest and used the
least amount of memory, but it still crapped out regarding max open
files. 

So I'm back to square one.  Any other ideas?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] How would you replace a field in a CSV file?

2006-10-01 Thread Pete Kazmier
The other day at work an opportunity arose where I was hoping to sneak
some Haskell into the pipeline of tools used to process call detail
records (CDRs).  In the telecommunications industry, CDRs are used for
billing.  Each CDR is a single line record of 30 comma-separated
values.  Each line is approximately 240 characters in length.  The
task at hand is to replace field number 10 if a new value can be found
in a hashmap which is keyed using the contents of the field.

My colleague was going to write a C program (that's all he knows), but
I whipped up a trivial python program instead.  I was curious if a
haskell version could be faster and more elegant , but I have not been
able to beat my python version in either case.  So, I'm curious as to
how you would go about this task in Haskell.  The input files are
generally 300-400MB, and the hashmap will contain perhaps 20-30 items.

For those that know python, here is a very simple implementation that
happens to be very fast compared to my Haskell version and very short:

for line in sys.stdin:
fields = line.split(',')
fields[9] = tgmap.get(fields[9], fields[9])
print ,.join(fields),

For each line in standard input:

  - Splits the string on the comma: field0,field1,...,field29 = 
[field0, field1, ..., field29] to obtain a list of strings.

  - Gets the value associated with the key of field9 from tgmap, if it
does not exist, it returns a default value which is the original
value.  I.e., if it's not in the map, then don't replace the
field.

  - Joins the list of fields with a comma to yield a string again
which is printed out to standard output.  The join method on the
string is a bit odd: ,.join([1,2,3]) = 1,2,3

Here is my first Haskell attempt:

import Data.ByteString.Lazy.Char8 as B hiding (map,foldr)
import Data.List (map)
import Data.Map as M hiding (map)

-- This is just a placeholder until I actually populate the map
tgmap = M.singleton (B.pack Pete) (B.pack Kazmier)

main = B.interact $ B.unlines . map doline . B.lines
where doline= B.join comma . mapIndex fixup . B.split ','
  fixup i s = if i==9 then M.findWithDefault s s tgmap else s
  comma = B.pack ,

-- f is supplied the index of the current element being processed
mapIndex f xs = m f 0 xs
where m f i [] = []
  m f i (x:xs') = f i x : m f (i+1) xs'

After talking with dons on #haskell, he cleaned my version up and
produced this version which gets rid of 'if' statement and makes
mapIndex stricter:

import Data.ByteString.Lazy.Char8 as B hiding (map,foldr)
import Data.List (map)
import Data.Map as M hiding (map)

-- This will be populated from a file
dict = M.singleton (B.pack Pete) (B.pack Kazmier)

main = B.interact $ B.unlines . map doline . B.lines
where doline= B.join comma . mapIndex fixup . B.split ','
  comma = B.singleton ','
  fixup 3 s = M.findWithDefault s s dict
  fixup n s = s

-- f is supplied the index of the current element being processed
mapIndex :: (Int - ByteString - ByteString) - [ByteString] -
[ByteString]
mapIndex f xs = m xs 0
where m []  _ = []
  m (x:xs') i = f i x : (m xs' $! i+1)

That helped things a bit, but I must confess I don't understand how
the strictness improved things as I had assumed things were going to
be evaluated in a reasonable amount of time due to the printing of
output.  I thought IO was interlaced with the execution and thus I
wasn't going to have to concern myself over laziness.  In addition,
the function is able to generate new elements of the list on demand so
I thought it was a good citizen in the lazy world.  Could anyone help
explain?

And then he came up with another version to avoid the 'unlines', but
that did not that really speed things up significantly.  So, with all
that said, is there a better approach to this problem?  Perhaps a more
elegant Haskell solution?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: How would you replace a field in a CSV file?

2006-10-01 Thread Pete Kazmier
[EMAIL PROTECTED] writes:

 For such a small self-contained task, I don't think Haskell
 is any better than Python.

I figured as much, but I thought with the new FPS lazy bytestrings it
might have a chance in terms of raw speed.  On the other side of the
coin, in terms of elegance, I thought I'd ask as haskellers always
amaze me with their one-liners :-)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Transformational Patterns?

2006-07-18 Thread Pete Kazmier
After reading the paper Pattern Guards and Transformational Patterns
by Martin Erwig and Simon Peyton Jones, I'm left wondering about the
status of transformational patterns?  Can we expect to see these at
some point in GHC?  Or have they gone by the wayside in favor of some
other alternative?  

When I had finished reading the paper, I was disappointed to find out
that they were not implemented as I was convinced of the merits based
on the arguments presented in the paper.

Thanks,
Pete

[1] http://research.microsoft.com/~simonpj/Papers/pat.htm

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


[Haskell-cafe] Re: Editors for Haskell

2006-06-07 Thread Pete Kazmier
Simon Peyton-Jones [EMAIL PROTECTED] writes:

 You probably know this, but your kind of application is a big reason
 that we now make GHC available as a library.  (Just say 'import GHC'.)  

 You shouldn't need to parse Haskell yourself: just call GHC's parser.
 You get back a syntax tree with very precise location information that
 can guide your editor (e.g. if you want to select a sub-exprssion).
 Similarly, you can call the type checker.

Are there any small examples of using GHC's parser?  I'm a complete
newbie so perhaps I'm not checking all of the relevant locations for
docs, but I can't seem to find this parser that is being referred to.
I checked out the source tree to GHC as well, but I have no idea where
to look in there (not to mention it's a bit intimidating).  Pointers
would be appreciated!

As part of my learning experience, I think I want to see if I can
write a haskell pastebin that does proper syntax highlighting.
Someone in #haskell suggested that I use just a lexer because using a
parser is overkill.  However, I can't make this assessment until I see
how to use the parser and the information it can supply.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Editors for Haskell

2006-06-07 Thread Pete Kazmier
Pete Kazmier [EMAIL PROTECTED] writes:

 As part of my learning experience, I think I want to see if I can
 write a haskell pastebin that does proper syntax highlighting.
 Someone in #haskell suggested that I use just a lexer because using a
 parser is overkill.  However, I can't make this assessment until I see
 how to use the parser and the information it can supply.

Thanks for the responses and pointers to the other projects.  I'll
investigate those after the day-job (the one that pays the bills).

As for using the lexer vs the parser, I was hoping to do things such
as folding and/or nifty mouse-overs of logical blocks of code, which
is why I was interested in the parser.  I'm not sure if I could do the
same with only a lexer.

I'm basically just looking for something concrete to tinker with as I
learn Haskell and it seems that Haskell is missing a snazzy pastebin.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe