Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread Nicolas Pouillard
On Sat, 21 Aug 2010 13:36:08 -0700, John Millikin jmilli...@gmail.com wrote:
 On Sat, Aug 21, 2010 at 12:44, Magnus Therning mag...@therning.org wrote:
  As an aside, has anyone written the code necessary to convert a parser, such
  as e.g.  attoparsec, into an enumerator-iteratee[1]?
 
 This sort of conversion is trivial. For an example, I've uploaded the
 attoparsec-enumerator package at 
 http://hackage.haskell.org/package/attoparsec-enumerator  --
 iterParser is about 20 lines, excluding the module header and imports.

 A.Done extra a -> E.yield a (E.Chunks [extra])

Maybe it would be better to check if extra is empty to produce
an empty list of chunks?
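
A minimal sketch of that check, for illustration only. The Stream type
below is a stand-in redefined locally so the example compiles on its own
(it is not imported from Data.Enumerator), and `leftover` is a
hypothetical helper name.

```haskell
import qualified Data.ByteString as B

-- Stand-in for the library's stream type, redefined here so the
-- sketch compiles on its own.
data Stream a = Chunks [a] | EOF
    deriving (Eq, Show)

-- Hypothetical helper: wrap a parser's unconsumed input, emitting no
-- chunk at all when nothing is left over.
leftover :: B.ByteString -> Stream B.ByteString
leftover extra
    | B.null extra = Chunks []
    | otherwise    = Chunks [extra]
```

With an equivalent helper over the real stream type, the alternative
would presumably read `A.Done extra a -> E.yield a (leftover extra)`.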

-- 
Nicolas Pouillard
http://nicolaspouillard.fr
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread John Millikin
After fielding some more questions regarding error handling, it turns
out that my earlier mail was in error (hah) -- error handling is much
more complicated than I thought.

When I gave each iteratee its own error type, I was expecting that
each pipeline would have only one or two sources of errors -- for
example, a parser, or a file reader. However, in reality, it's likely
that every single element in a pipeline can produce an error. For
example, in a JSON/XML/etc reformatter (enumFile, parseEvents,
formatEvents, iterFile), errors could be SomeException, ParseError, or
FormatError.

Furthermore, while it's easy to change an iteratee's error type with
just (e1 -> e2), changing an enumerator or enumeratee *also* requires
(e2 -> e1). In other words, to avoid loss of error information, the
two types have to be basically the same thing anyway.
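
To illustrate why both directions are needed, here is a sketch using
simplified, pure stand-ins for the library types (no monad layer, and
Maybe in place of the real stream type). The names `mapStepError` and
`mapEnumError` are hypothetical and not part of the enumerator API.

```haskell
-- Simplified, pure stand-ins for the library types (no monad layer).
data Step e a b
    = Continue (Maybe a -> Step e a b)  -- Nothing signals end of input
    | Yield b
    | Error e

type Enumerator e a b = Step e a b -> Step e a b

-- An iteratee only produces errors, so one function is enough.
mapStepError :: (e1 -> e2) -> Step e1 a b -> Step e2 a b
mapStepError f (Continue k) = Continue (mapStepError f . k)
mapStepError _ (Yield b)    = Yield b
mapStepError f (Error e)    = Error (f e)

-- An enumerator both consumes and produces steps, so the error type
-- has to be converted in both directions.
mapEnumError :: (e1 -> e2) -> (e2 -> e1) -> Enumerator e1 a b -> Enumerator e2 a b
mapEnumError to from enum = mapStepError to . enum . mapStepError from
```

Unless the two conversion functions are inverses of each other, one of
the two trips loses information, which is the problem described above.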

I would like to avoid hard-coding the error type to SomeException,
because it forces libraries to use unsafe/unportable language features
(dynamic typing and casting). However, given the apparent practical
requirement that all iteratees have the same error type, it seems like
there's no other choice.

So, my questions:

1. Has anybody here successfully created / used / heard of an iteratee
implementation with independent error types?
2. Do alternative Haskell implementations (JHC, UHC, Hugs, etc)
support DeriveDataTypeable? If not, is there any more portable way to
define exceptions?
3. Has anybody actually written any libraries which use the existing
enumerator error handling API? I don't mind rewriting my own
uploads, since this whole mess is my own damn fault, but I don't want
to inconvenience anybody else.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread Michael Snoyman
It's not released yet, but persistent 0.2 is going to be using enumerator. I
personally don't mind SomeException as a hard-coded error type, but go ahead
and do whatever you think is best for the API.

Michael

On Tue, Aug 24, 2010 at 5:47 AM, John Millikin jmilli...@gmail.com wrote:

 After fielding some more questions regarding error handling, it turns
 out that my earlier mail was in error (hah) -- error handling is much
 more complicated than I thought.

 When I gave each iteratee its own error type, I was expecting that
 each pipeline would have only one or two sources of errors -- for
 example, a parser, or a file reader. However, in reality, it's likely
 that every single element in a pipeline can produce an error. For
 example, in a JSON/XML/etc reformatter (enumFile, parseEvents,
 formatEvents, iterFile), errors could be SomeException, ParseError, or
 FormatError.

 Furthermore, while it's easy to change an iteratee's error type with
 just (e1 -> e2), changing an enumerator or enumeratee *also* requires
 (e2 -> e1). In other words, to avoid loss of error information, the
 two types have to be basically the same thing anyway.

 I would like to avoid hard-coding the error type to SomeException,
 because it forces libraries to use unsafe/unportable language features
 (dynamic typing and casting). However, given the apparent practical
 requirement that all iteratees have the same error type, it seems like
 there's no other choice.

 So, my questions:

 1. Has anybody here successfully created / used / heard of an iteratee
 implementation with independent error types?
 2. Do alternative Haskell implementations (JHC, UHC, Hugs, etc)
 support DeriveDataTypeable? If not, is there any more portable way to
 define exceptions?
 3. Has anybody actually written any libraries which use the existing
 enumerator error handling API? I don't mind rewriting my own
 uploads, since this whole mess is my own damn fault, but I don't want
 to inconvenience anybody else.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread Magnus Therning
On 24/08/10 03:47, John Millikin wrote:
[...]
 I would like to avoid hard-coding the error type to SomeException, because
 it forces libraries to use unsafe/unportable language features (dynamic
 typing and casting). However, given the apparent practical requirement that
 all iteratees have the same error type, it seems like there's no other
 choice.

I haven't worked enough with iteratees to have an informed opinion on this,
but I wonder what the pros and cons are of having an error state in the
iteratees at all.  In other words, why would this

  data Step a m b
      = Continue (Stream a -> Iteratee a m b)
      | Yield b (Stream a)
      | Error E.SomeException

be preferred over this

  data Step a m b
      = Continue (Stream a -> Iteratee a m b)
      | Yield b (Stream a)

(Maybe with the restriction that m is a MonadError.)
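
A rough sketch of how the second definition could be used, assuming
errors are raised in the underlying monad. Everything here is
illustrative: Either String stands in for a MonadError instance, the
types are redefined locally, and `sumPositive` and `feed` are
hypothetical names.

```haskell
data Stream a = Chunks [a] | EOF

-- The second definition from the message: no Error constructor.
data Step a m b
    = Continue (Stream a -> Iteratee a m b)
    | Yield b (Stream a)

newtype Iteratee a m b = Iteratee { runIteratee :: m (Step a m b) }

-- Sums its input, failing through the underlying monad on a negative
-- number. Either String stands in for a MonadError instance.
sumPositive :: Iteratee Int (Either String) Int
sumPositive = go 0
  where
    go acc = Iteratee (Right (Continue (step acc)))
    step acc EOF = Iteratee (Right (Yield acc EOF))
    step acc (Chunks xs)
        | any (< 0) xs = Iteratee (Left "negative input")
        | otherwise    = go (acc + sum xs)

-- Feed a list of inputs and extract the result, if one was yielded.
feed :: Monad m => [Stream a] -> Iteratee a m b -> m (Maybe b)
feed inputs it = do
    s <- runIteratee it
    case (s, inputs) of
        (Yield b _, _)       -> return (Just b)
        (Continue k, i : is) -> feed is (k i)
        (Continue _, [])     -> return Nothing
```

In this arrangement a failure short-circuits through m, so whether an
enumerator can observe or recover from it depends entirely on what m
offers.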

/M

-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus@therning.org   Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe





Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-22 Thread John Millikin
On Sat, Aug 21, 2010 at 23:14, Paulo Tanimoto ptanim...@gmail.com wrote:
 One question: enumFile has type

    enumFile :: FilePath -> Enumerator SomeException ByteString IO b

 and iterParser has type

    iterParser :: Monad m => Parser a -> Iteratee ParseError ByteString m a

 How do we use both together?  Something in these lines won't type-check

    E.run (E.enumFile file E.$$ (E.iterParser p))

 because the error types are different.

Forgot to mention that -- use the mapError function from
enumerator-0.2.1 thusly:

http://ianen.org/haskell/enumerator/api-docs/Data-Enumerator.html#v%3AmapError


parser :: Parser Foo

toExc :: Show a => a -> E.SomeException
toExc = E.SomeException . E.ErrorCall . show

main :: IO ()
main = do
    run (enumFile "parsetest.txt" $$ mapError toExc $$ iterParser parser) >>= print


You don't have to map to SomeException -- any type will do. For
example, in a complex pipeline with real error handling at the other
end, you might want a custom error type so you'll know at what stage
the error occurred.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-22 Thread Richard O'Keefe

On Aug 21, 2010, at 4:12 AM, John Millikin wrote:
 This thought occurred to me, but really, how often are you going to
 have a 10 GiB **text** file with no newlines?

When you have a file developed on a system that follows a
different new-line convention.  I haven't seen a file that
big, but I'm sadly used to seeing humanly large files
display as single lines.

Of course if getLine/hGetLine accept *any* of CR, LF, CR+LF
as end-of-line (as opposed to using the platform native
convention), there's no problem.  That's a darned good idea
anyway.
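
As a sketch of what "accept any of CR, LF, CR+LF" could look like on
an in-memory String (this is not how getLine/hGetLine are implemented,
and `linesAny` is a hypothetical name):

```haskell
-- Split text into lines, accepting LF, CR, or CR+LF as a terminator.
-- linesAny is a hypothetical name; it is not part of any library.
linesAny :: String -> [String]
linesAny "" = []
linesAny s  = line : rest
  where
    (line, remainder) = break (`elem` "\r\n") s
    rest = case remainder of
        '\r' : '\n' : more -> linesAny more
        _ : more           -> linesAny more
        []                 -> []
```

A streaming version would need extra care when a chunk ends in CR,
since the matching LF may only arrive with the next chunk.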



Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Magnus Therning
On 20/08/10 23:12, John Millikin wrote:
 On Fri, Aug 20, 2010 at 14:58, Magnus Therning mag...@therning.org wrote:
 Indeed.

 In many protocols it would force the attacker to send well-formed requests
 though.  I think this is true for many text-based protocols like
 HTTP.

 The looping can be handled effectively through hWaitForInput.

 There are also other reasons for doing non-blocking IO, not least that it
 makes developing and manual testing a lot nicer.

 I think I'm failing to understand something.

 Using a non-blocking read doesn't change how the iteratees react to
 well- or mal-formed requests. All it does is change the failure
 condition from blocked indefinitely to looping indefinitely.

It changes the timing.  The iteratee will receive the data sooner (when it's
available rather than when the buffer is full).  This means it can fail
*sooner*, in wall-clock time.

 Replacing the hGet with a combination of hWaitForInput /
 hGetNonBlocking would cause a third failure condition, looping
 indefinitely with periodic blocks. This doesn't seem to be an
 improvement over simply blocking.

It is an improvement when data is trickling in.  In other cases it's no
improvement (besides that it'd be possible to have time-outs on a lower
level).

 Do you have any example code which works well using a non-blocking
 enumerator, but fails with a blocking one?

It's not about failing vs non-failing, it's about time of failure.  An
example would be failing after reading a few bytes (the verb of an HTTP
request) vs failing after either reading 4k (which is the buffer size in
iteratee, IIRC) or when the client hangs up.

/M

-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus@therning.org   Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe





Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Felipe Lessa
On Sat, Aug 21, 2010 at 5:40 AM, Magnus Therning mag...@therning.org wrote:
 It changes the timing.  The iteratee will receive the data sooner (when it's
 available rather than when the buffer is full).  This means it can fail
 *sooner*, in wall-clock time.

I still fail to see how this works.  So I went to see the sources.

In [1] we can see how hGet and hGetNonBlocking are defined.  The only
difference is that the former uses hGetBuf, and the latter uses
hGetBufNonBlocking.

[1] http://hackage.haskell.org/packages/archive/bytestring/0.9.1.7/doc/html/src/Data-ByteString.html#line-1908

hGetBuf's main loop is bufRead [2], while hGetBufNonBlocking's main
loop is bufReadNonBlocking [3].  Both are very similar.  The main
differences are RawIO.read vs RawIO.readNonBlocking [4], and
Buffered.fillReadBuffer vs Buffered.fillReadBuffer0 [5].  Reading
RawIO's documentation [4], we see that RawIO.read blocks only if there
is no data available.  So it doesn't wait for the buffer to be fully
filled, it just returns the available data.  Unfortunately,
BufferedIO's documentation [5] doesn't specify if
Buffered.fillReadBuffer should return the available data without
blocking.  However, it does specify that it should block
if there are no bytes available.

[2] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Handle-Text.html#line-820
[3] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Handle-Text.html#bufReadNonBlocking
[4] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Device.html#RawIO
[5] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-BufferedIO.html#BufferedIO

So, assuming that the semantics of BufferedIO are the same as RawIO's,
*both* are non-blocking whenever data is already available.  None of
them wait until the buffer is full.  The difference lies in whether
they block if there is no data available.  However, when there isn't
data the enumerator *always* wants to block.  So using non-blocking IO
doesn't give anything, only complicates the code.

Am I misreading the docs/source somewhere?  =)

Cheers!

-- 
Felipe.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
I think the docs are wrong, or perhaps we're misunderstanding them.
Magnus is correct.

Attached is a test program which listens on two ports, 42000 (blocking
IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
it data. The behavior is as Magnus describes: bytes from
hGetNonBlocking are available immediately, while hGet waits for a full
buffer (or EOF) before returning.

This behavior obviously makes hGet unsuitable for enumHandle; my
apologies for not understanding the problem sooner.

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, unless)
import Control.Monad.Fix (fix)
import qualified Data.ByteString as B
import Network
import System.IO

main :: IO ()
main = do
	blockingSock <- listenOn (PortNumber 42000)
	nonblockingSock <- listenOn (PortNumber 42001)
	
	forkIO $ acceptLoop B.hGet blockingSock "Blocking"
	forkIO $ acceptLoop nonblockingGet nonblockingSock "Non-blocking"
	forever $ threadDelay 100

-- Wait until input is available, then return whatever has arrived.
nonblockingGet :: Handle -> Int -> IO B.ByteString
nonblockingGet h n = do
	hasInput <- catch (hWaitForInput h (-1)) (\_ -> return False)
	if hasInput
		then B.hGetNonBlocking h n
		else return B.empty

-- Accept clients one at a time and print their bytes until EOF.
acceptLoop :: (Handle -> Int -> IO B.ByteString) -> Socket -> String -> IO ()
acceptLoop get sock label = fix $ \loop -> do
	(h, _, _) <- accept sock
	putStrLn $ label ++ " client connected"
	bytesLoop (get h)
	putStrLn $ label ++ " EOF"
	loop

-- Read up to 20 bytes at a time; an empty result ends the loop.
bytesLoop :: (Int -> IO B.ByteString) -> IO ()
bytesLoop get = fix $ \loop -> do
	bytes <- get 20
	unless (B.null bytes) $ do
		putStrLn $ "bytes = " ++ show bytes
		loop


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Gregory Collins
John Millikin jmilli...@gmail.com writes:

 I think the docs are wrong, or perhaps we're misunderstanding them.
 Magnus is correct.

 Attached is a test program which listens on two ports, 42000 (blocking
 IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
 it data. The behavior is as Magnus describes: bytes from
 hGetNonBlocking are available immediately, while hGet waits for a full
 buffer (or EOF) before returning.

hSetBuffering handle NoBuffering?

The implementation as it is is fine IMO.

G
-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Judah Jacobson
On Sat, Aug 21, 2010 at 10:58 AM, John Millikin jmilli...@gmail.com wrote:
 I think the docs are wrong, or perhaps we're misunderstanding them.
 Magnus is correct.

 Attached is a test program which listens on two ports, 42000 (blocking
 IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
 it data. The behavior is as Magnus describes: bytes from
 hGetNonBlocking are available immediately, while hGet waits for a full
 buffer (or EOF) before returning.

 This behavior obviously makes hGet unsuitable for enumHandle; my
 apologies for not understanding the problem sooner.

You should note that in ghc>=6.12, hWaitForInput tries to decode the
next character of input based on the Handle's encoding.  As a
result, it will block if the next multibyte sequence is incomplete,
and it will throw an error if a multibyte sequence gets split between
two chunks.

I worked around this problem in Haskeline by temporarily setting stdin
to BinaryMode; you may want to do something similar.

Also, this issue caused a bug in bytestring with ghc-6.12:
http://hackage.haskell.org/trac/ghc/ticket/3808
which will be resolved by the new function 'hGetBufSome' (in ghc-6.14)
that blocks only when there's no data to read:
http://hackage.haskell.org/trac/ghc/ticket/4046
That function might be useful for your package, though not portable to
other implementations or older GHC versions.
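
For reference, later bytestring releases expose this behaviour as
Data.ByteString.hGetSome. A minimal sketch of a read loop built on it
follows; `readChunks` is a hypothetical name, and a real enumerator
would hand each chunk to the iteratee instead of collecting a list.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Read a handle to EOF in chunks of at most n bytes. B.hGetSome
-- blocks only while no data is available, then returns whatever has
-- arrived, so a chunk may be shorter than n.
readChunks :: Handle -> Int -> IO [B.ByteString]
readChunks h n = do
    bytes <- B.hGetSome h n
    if B.null bytes
        then return []
        else do
            rest <- readChunks h n
            return (bytes : rest)
```

An empty result is treated as end of input here, which matches how
hGetSome reports EOF.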

Best,
-Judah


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
On Sat, Aug 21, 2010 at 11:35, Gregory Collins g...@gregorycollins.net wrote:
 John Millikin jmilli...@gmail.com writes:

 I think the docs are wrong, or perhaps we're misunderstanding them.
 Magnus is correct.

 Attached is a test program which listens on two ports, 42000 (blocking
 IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
 it data. The behavior is as Magnus describes: bytes from
 hGetNonBlocking are available immediately, while hGet waits for a full
 buffer (or EOF) before returning.

 hSetBuffering handle NoBuffering?

 The implementation as it is is fine IMO.

Disabling buffering doesn't change the behavior -- hGet h 20 still
doesn't return until the handle has at least 20 bytes of input
available.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
On Sat, Aug 21, 2010 at 11:58, Judah Jacobson judah.jacob...@gmail.com wrote:
 You should note that in ghc>=6.12, hWaitForInput tries to decode the
 next character of input based on the Handle's encoding.  As a
 result, it will block if the next multibyte sequence is incomplete,
 and it will throw an error if a multibyte sequence gets split between
 two chunks.

 I worked around this problem in Haskeline by temporarily setting stdin
 to BinaryMode; you may want to do something similar.

 Also, this issue caused a bug in bytestring with ghc-6.12:
 http://hackage.haskell.org/trac/ghc/ticket/3808
 which will be resolved by the new function 'hGetBufSome' (in ghc-6.14)
 that blocks only when there's no data to read:
 http://hackage.haskell.org/trac/ghc/ticket/4046
 That function might be useful for your package, though not portable to
 other implementations or older GHC versions.

You should not be reading bytestrings from text-mode handles.

The more I think about it, the more having a single Handle type for
both text and binary data causes problems. There should be some
separation so users don't accidentally use a text handle with binary
functions, and vice-versa:

openFile :: FilePath -> IOMode -> IO TextHandle
openBinaryFile :: FilePath -> IOMode -> IO BinaryHandle
hGetBuf :: BinaryHandle -> Ptr a -> Int -> IO Int
Data.ByteString.hGet :: BinaryHandle -> Int -> IO ByteString
-- etc

then the enumerators would simply require the correct handle type:

Data.Enumerator.IO.enumHandle :: BinaryHandle -> Enumerator
    SomeException ByteString IO b
Data.Enumerator.Text.enumHandle :: TextHandle -> Enumerator
    SomeException Text IO b

I suppose the enumerators could verify the handle mode and throw an
exception if it's incorrect -- at least that way, it will fail
consistently rather than only in rare occasions.
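
Both ideas can be approximated today on top of the existing Handle
type. The following is only a sketch: the newtypes and every function
name in it are hypothetical, and the runtime check relies on
hGetEncoding returning Nothing for a binary-mode handle.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Hypothetical wrappers: the mode is fixed when the handle is opened
-- and recorded in the type, so it cannot be mixed up later.
newtype TextHandle = TextHandle Handle
newtype BinaryHandle = BinaryHandle Handle

openTextHandle :: FilePath -> IOMode -> IO TextHandle
openTextHandle path mode = fmap TextHandle (openFile path mode)

openBinaryHandle :: FilePath -> IOMode -> IO BinaryHandle
openBinaryHandle path mode = fmap BinaryHandle (openBinaryFile path mode)

-- Byte-oriented reads accept only binary handles.
hGetBytes :: BinaryHandle -> Int -> IO B.ByteString
hGetBytes (BinaryHandle h) n = B.hGet h n

-- Line-oriented reads accept only text handles.
hGetTextLine :: TextHandle -> IO String
hGetTextLine (TextHandle h) = hGetLine h

-- Runtime alternative for a plain Handle: binary-mode handles report
-- no text encoding.
isBinaryHandle :: Handle -> IO Bool
isBinaryHandle h = fmap (maybe True (const False)) (hGetEncoding h)
```

The newtype approach moves the mistake to compile time, while
`isBinaryHandle` corresponds to the "verify and throw" fallback.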


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Magnus Therning
On 21/08/10 18:58, John Millikin wrote:
 I think the docs are wrong, or perhaps we're misunderstanding them.
 Magnus is correct.

 Attached is a test program which listens on two ports, 42000 (blocking
 IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
 it data. The behavior is as Magnus describes: bytes from
 hGetNonBlocking are available immediately, while hGet waits for a full
 buffer (or EOF) before returning.

 This behavior obviously makes hGet unsuitable for enumHandle; my
 apologies for not understanding the problem sooner.

Thanks, but I suspect that it was my bad description of the issue that made
understanding the issue more problematic.

Anyway it's good we now understand each other, and even better that we agree
:-)

As an aside, has anyone written the code necessary to convert a parser, such
as e.g.  attoparsec, into an enumerator-iteratee[1]?

/M

[1] Similar to how attoparsec-iteratee does it for iteratee-iteratee.
-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus@therning.org   Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe





Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
On Sat, Aug 21, 2010 at 12:44, Magnus Therning mag...@therning.org wrote:
 As an aside, has anyone written the code necessary to convert a parser, such
 as e.g.  attoparsec, into an enumerator-iteratee[1]?

This sort of conversion is trivial. For an example, I've uploaded the
attoparsec-enumerator package at 
http://hackage.haskell.org/package/attoparsec-enumerator  --
iterParser is about 20 lines, excluding the module header and imports.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Paulo Tanimoto
On Sat, Aug 21, 2010 at 3:36 PM, John Millikin jmilli...@gmail.com wrote:

 This sort of conversion is trivial. For an example, I've uploaded the
 attoparsec-enumerator package at 
 http://hackage.haskell.org/package/attoparsec-enumerator  --
 iterParser is about 20 lines, excluding the module header and imports.

Cool, but is there a reason it won't work with version 0.2 you just released?

  build-depends:
    [...]
    , enumerator >= 0.1 && < 0.2

I noticed that when installing it.

Paulo


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
On Sat, Aug 21, 2010 at 14:17, Paulo Tanimoto ptanim...@gmail.com wrote:
 Cool, but is there a reason it won't work with version 0.2 you just released?

  build-depends:
    [...]
     , enumerator >= 0.1 && < 0.2

 I noticed that when installing it.

Hah ... forgot to save the vim buffer. Corrected version uploaded.
Sorry about that.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
On Sat, Aug 21, 2010 at 14:41, Michael Snoyman mich...@snoyman.com wrote:
 Hey John,
 As I mentioned, I'm considering having persistent depend upon enumerator. Do
 you think it's too early in enumerator's life to do so and I should wait
 till the API stabilizes a bit more? Also, two other packages I would think
 to bring into the enumerator family would be:
 * yaml
 * wai-extra, providing an enumerator layer for more easily dealing with the
 Source and Enumerator datatypes in wai. I might just release a
 wai-enumerator package instead.
 Thanks again for your work on this,
 Michael

I think the API is pretty stable. Most of the significant research
into iteratee-based APIs has already been performed by users of the
iteratee library, and by Oleg. There might be a few
backwards-compatible changes (new modules, new exports, etc). I'm not
planning to make any large changes, such as Mr. Lato's transition to
CPS-based iteratees.

As long as you import the enumerator modules with "qualified" (to
avoid Prelude name clashes), it should be safe to start porting
libraries.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Paulo Tanimoto
John,

On Sat, Aug 21, 2010 at 5:06 PM, John Millikin jmilli...@gmail.com wrote:

 I think the API is pretty stable. Most of the significant research
 into iteratee-based APIs has already been performed by users of the
 iteratee library, and by Oleg. There might be a few
 backwards-compatible changes (new modules, new exports, etc). I'm not
 planning to make any large changes, such as Mr. Lato's transition to
 CPS-based iteratees.


Apologies if I'm asking you to repeat yourself, but I couldn't find
the explanation.  What was the reason why you went with IterateeM
instead of IterateeMCPS?  Simplicity?

Thanks,

Paulo


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin
On Sat, Aug 21, 2010 at 15:35, Paulo Tanimoto ptanim...@gmail.com wrote:
 Apologies if I'm asking you to repeat yourself, but I couldn't find
 the explanation.  What was the reason why you went with IterateeM
 instead of IterateeMCPS?  Simplicity?

Iteratees are difficult enough to understand already -- requiring
prospective users to learn and understand CPS would just be another
roadblock. The CPS implementation is also slower -- I performed some
basic benchmarking of IterateeM.hs and IterateeMCPS.hs, and CPS is
only faster without optimizations. At -O, they are equal, and at -O2,
IterateeM is faster.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Simon Marlow

On 19/08/2010 18:21, John Millikin wrote:

On Wed, Aug 18, 2010 at 23:33, Jason Dagit da...@codersbase.com wrote:

The main reason I would use iteratees is for performance reasons.  To help
me, as a potential consumer of your library, could you please provide
benchmarks for comparing the performance of enumerator with say, a)
iteratee, b) lazy/strict bytestring, and c) Prelude functions?
I'm interested in both max memory consumption and run-times.  Using
criterion and/or progression to get the run-times would be icing on an
already delicious cake!


Oleg has some benchmarks of his implementation at
http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt, which
clock iteratees at about twice as fast as lazy IO. He also compares
them to a native wc, but his comparison is flawed, because he's
comparing a String iteratee vs byte-based wc.


Handle IO is also doing Unicode encoding/decoding, which iteratees 
bypass.  Have you thought about how to incorporate encoding/decoding?


Cheers,
Simon


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin
On Fri, Aug 20, 2010 at 04:01, Simon Marlow marlo...@gmail.com wrote:
 Handle IO is also doing Unicode encoding/decoding, which iteratees bypass.
  Have you thought about how to incorporate encoding/decoding?

Yes; there will be a module Data.Enumerator.Text which contains
locale-based IO, enumeratee-based encoding/decoding, and so forth.
Since iteratee doesn't have any text-based IO, I figured it wasn't
necessary for a first release; getting feedback on the basic soundness
of the package was more important.

Currently, I'm planning on the following type signatures for D.E.Text.
'enumHandle' will use Text's hGetLine, since there doesn't seem to be
any text-based equivalent to ByteString's 'hGet'.



enumHandle :: Handle -> Enumerator SomeException Text IO b

enumFile :: FilePath -> Enumerator SomeException Text IO b

data Codec = Codec
    { codecName :: Text
    , codecEncode :: Text -> Either SomeException ByteString
    , codecDecode :: ByteString -> Either SomeException (Text, ByteString)
    }

encode :: Codec -> Enumeratee SomeException Text ByteString m b

decode :: Codec -> Enumeratee SomeException ByteString Text m b

utf8 :: Codec

utf16le :: Codec

utf16be :: Codec

utf32le :: Codec

utf32be :: Codec

ascii :: Codec

iso8859_1 :: Codec
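
To make the shape of such a Codec concrete, here is a simplified
sketch of an ASCII codec. It is not the planned implementation: String
replaces Text, errors are plain messages rather than SomeException,
and the second component of the decode result is assumed to be the
undecoded leftover input.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, ord)

-- Simplified stand-in for the proposed Codec record: String replaces
-- Text and errors are plain messages instead of SomeException.
data Codec = Codec
    { codecName   :: String
    , codecEncode :: String -> Either String B.ByteString
    , codecDecode :: B.ByteString -> Either String (String, B.ByteString)
    }

-- ASCII never splits a character across bytes, so decoding leaves no
-- leftover input; a multi-byte codec would return an incomplete
-- trailing sequence in the second component.
ascii :: Codec
ascii = Codec
    { codecName = "ASCII"
    , codecEncode = \s ->
        case filter ((> 127) . ord) s of
            []    -> Right (B.pack (map (fromIntegral . ord) s))
            c : _ -> Left ("cannot encode " ++ show c)
    , codecDecode = \bytes ->
        case B.find (> 127) bytes of
            Nothing -> Right (map (chr . fromIntegral) (B.unpack bytes), B.empty)
            Just w  -> Left ("invalid byte " ++ show w)
    }
```

For example, `codecEncode ascii "hi" >>= codecDecode ascii` round-trips
the input, while decoding a byte above 127 yields a Left.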



Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Felipe Lessa
On Fri, Aug 20, 2010 at 12:51 PM, John Millikin jmilli...@gmail.com wrote:
 Currently, I'm planning on the following type signatures for D.E.Text.
 'enumHandle' will use Text's hGetLine, since there doesn't seem to be
 any text-based equivalent to ByteString's 'hGet'.

CC'ing text's maintainer.  Using 'hGetLine' will cause baaad surprises
when you process a 10 GiB file with no '\n' in sight.

Cheers! =)

-- 
Felipe.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin
On Fri, Aug 20, 2010 at 08:59, Felipe Lessa felipe.le...@gmail.com wrote:
 On Fri, Aug 20, 2010 at 12:51 PM, John Millikin jmilli...@gmail.com wrote:
 Currently, I'm planning on the following type signatures for D.E.Text.
 'enumHandle' will use Text's hGetLine, since there doesn't seem to be
 any text-based equivalent to ByteString's 'hGet'.

 CC'ing text's maintainer.  Using 'hGetLine' will cause baaad surprises
 when you process a 10 GiB file with no '\n' in sight.

This thought occurred to me, but really, how often are you going to
have a 10 GiB **text** file with no newlines? Remember, this is for
text (log files, INI-style configs, plain .txt), not binary (HTML,
XML, JSON). Off the top of my head, I can't think of any case where
you'd expect to see 10 GiB in a single line.

In the worst case, you can just use decode to process bytes coming
from the ByteString-based enumHandle, which should give nicely chunked
text.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Felipe Lessa
On Fri, Aug 20, 2010 at 1:12 PM, John Millikin jmilli...@gmail.com wrote:
 This thought occurred to me, but really, how often are you going to
 have a 10 GiB **text** file with no newlines? Remember, this is for
 text (log files, INI-style configs, plain .txt), not binary (HTML,
 XML, JSON). Off the top of my head, I can't think of any case where
 you'd expect to see 10 GiB in a single line.

 In the worst case, you can just use decode to process bytes coming
 from the ByteString-based enumHandle, which should give nicely chunked
 text.

I was thinking about an attacker, not a use case.  Think of a web
server accepting queries using iteratees internally.  This may open
door to at least DoS attacks.

And then, we use iteratees because we don't like the unpredictability
of lazy IO.  Why should iteratees be unpredictable when dealing with
Text?  Besides the memory consumption problem, there may be
performance problems if the lines are too short.

Cheers! =)

-- 
Felipe.


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin
On Fri, Aug 20, 2010 at 09:30, Felipe Lessa felipe.le...@gmail.com wrote:
 I was thinking about an attacker, not a use case.  Think of a web
 server accepting queries using iteratees internally.  This may open
 door to at least DoS attacks.

Web servers parse/generate HTTP, which is byte-based. They should be
using the bytes-based handle enumerator.

 And then, we use iteratees because we don't like the unpredictability
 of lazy IO.  Why should iteratees be unpredictable when dealing with
 Text?  Besides the memory consumption problem, there may be
 performance problems if the lines are too short.

If you don't want unpredictable performance, use bytes-based IO and
decode it with decode utf8 or something similar.

Text-based IO merely exists to solve the most common case, which is a
small file in local encoding with relatively short (< 200 char) lines.
If you need to handle more complicated cases, such as:

* Files in fixed or self-described encodings (JSON, XML)
* Files with unknown encodings (HTML, RSS)
* Files with content in multiple encodings (EMail)
* Files containing potentially malicious input (such as public server log files)

Then you need to read them as bytes and decide yourself which decoding
is necessary.
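
As a rough illustration of deciding on the decoding yourself, here is a
sketch assuming the text and bytestring packages. The policy (strict
UTF-8 first, Latin-1 as a fallback) and the names Guess and decodeGuess
are assumptions made for the example; real formats such as XML or HTML
carry their own rules for finding the encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Which decoding was applied. Hypothetical type for illustration.
data Guess = Utf8 | Latin1
  deriving (Eq, Show)

-- | Try strict UTF-8 first; if the bytes are not valid UTF-8, fall back
-- to Latin-1, which accepts any byte sequence.
decodeGuess :: B.ByteString -> (Guess, T.Text)
decodeGuess bytes =
  case TE.decodeUtf8' bytes of
    Right text -> (Utf8, text)
    Left _     -> (Latin1, TE.decodeLatin1 bytes)
```

The caller sees which branch was taken, so it can log or reject input
that needed the fallback.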


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Magnus Therning
On 20/08/10 17:30, Felipe Lessa wrote:
 On Fri, Aug 20, 2010 at 1:12 PM, John Millikin jmilli...@gmail.com wrote:
 This thought occurred to me, but really, how often are you going to
 have a 10 GiB **text** file with no newlines? Remember, this is for
 text (log files, INI-style configs, plain .txt), not binary (HTML,
 XML, JSON). Off the top of my head, I can't think of any case where
 you'd expect to see 10 GiB in a single line.

 In the worst case, you can just use decode to process bytes coming
 from the ByteString-based enumHandle, which should give nicely chunked
 text.

 I was thinking about an attacker, not a use case.  Think of a web
 server accepting queries using iteratees internally.  This may open
 door to at least DoS attacks.

You don't need to send that much data, the current implementation of
Enumerator uses hGet, which blocks, so just send the server a few bytes and
it'll be sitting there waiting for input until it times out (if ever).
Open a few hundred of those connections and you're likely to cause the
server to run out of FDs.  Of course this is already coded up in tools like
slowloris[1] :-)

/M

[1] http://ha.ckers.org/slowloris/
-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus@therning.org   Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe





Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin
On Fri, Aug 20, 2010 at 12:52, Magnus Therning mag...@therning.org wrote:
 You don't need to send that much data, the current implementation of
 Enumerator uses hGet, which blocks, so just send the server a few bytes and
 it'll be sitting there waiting for input until it times out (if ever).
 Open a few hundred of those connections and you're likely to cause the
 server to run out of FDs.  Of course this is already coded up in tools like
 slowloris[1] :-)

Correct me if I'm wrong, but I'm pretty sure changing the
implementation to something non-blocking like hGetNonBlocking will not
fix this. Hooking up an iteratee to an enumerator which doesn't block
will cause it to loop forever, which is arguably worse than simply
blocking.

The best way I can think of to defeat a handle-exhaustion attack is to
enforce a timeout on HTTP header parsing, using something like
System.Timeout. This protects against slowloris, since requiring the
entire header to be parsed within some fixed small period of time
prevents the socket from being held open via slowly-trickled headers.
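
A minimal sketch of the idea, using only base. The names parseWithin and
slowClient are hypothetical; in a real server the action passed in would
be whatever runs the header-parsing iteratee against the socket, and the
deadline is an arbitrary choice.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- | Run a header-parsing action, giving up after the given number of
-- microseconds. Hypothetical helper for illustration.
parseWithin :: Int -> IO a -> IO (Either String a)
parseWithin micros parse = do
  result <- timeout micros parse
  return (maybe (Left "timed out while reading headers") Right result)

-- | Stand-in for a client that trickles its headers: it waits for the
-- given number of microseconds before producing a single header line.
slowClient :: Int -> IO String
slowClient delayMicros = do
  threadDelay delayMicros
  return "Host: example.org"
```

For example, parseWithin 50000 (slowClient 2000000) gives up after 50 ms
with a Left, at which point the server can close the socket and free the
descriptor.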


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Magnus Therning
On 20/08/10 22:32, John Millikin wrote:
 On Fri, Aug 20, 2010 at 12:52, Magnus Therning mag...@therning.org wrote:
 You don't need to send that much data, the current implementation of
 Enumerator uses hGet, which blocks, so just send the server a few bytes and
 it'll be sitting there waiting for input until it times out (if ever).
 Open a few hundred of those connections and you're likely to cause the
 server to run out of FDs.  Of course this is already coded up in tools like
 slowloris[1] :-)

 Correct me if I'm wrong, but I'm pretty sure changing the implementation to
 something non-blocking like hGetNonBlocking will not fix this. Hooking up an
 iteratee to an enumerator which doesn't block will cause it to loop forever,
 which is arguably worse than simply blocking.

 The best way I can think of to defeat a handle-exhaustion attack is to
 enforce a timeout on HTTP header parsing, using something like
 System.Timeout. This protects against slowloris, since requiring the
 entire header to be parsed within some fixed small period of time
 prevents the socket from being held open via slowly-trickled headers.

Indeed.

In many protocols it would force the attacker to send well-formed requests
though.  I think this is true for many text-based protocols like
HTTP.

The looping can be handled effectively through hWaitForInput.
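
Roughly what that could look like, as a sketch using base and bytestring
rather than the enumerator package itself. ReadResult and readAvailable
are hypothetical names; the wait is in milliseconds, as hWaitForInput
expects, and a negative value would wait indefinitely.

```haskell
import Control.Exception (tryJust)
import Control.Monad (guard)
import qualified Data.ByteString as B
import System.IO
import System.IO.Error (isEOFError)

-- | Outcome of one read attempt. Hypothetical type for illustration.
data ReadResult
  = NoDataYet          -- ^ nothing arrived within the wait period
  | EndOfInput         -- ^ the handle reached end of file
  | Got B.ByteString   -- ^ whatever bytes were available, up to the limit
  deriving (Eq, Show)

-- | Wait up to waitMillis for input, then read at most maxBytes without
-- blocking. hWaitForInput signals end of file with an EOF error, which
-- is turned into EndOfInput here.
readAvailable :: Handle -> Int -> Int -> IO ReadResult
readAvailable h waitMillis maxBytes = do
  ready <- tryJust (guard . isEOFError) (hWaitForInput h waitMillis)
  case ready of
    Left ()     -> return EndOfInput
    Right False -> return NoDataYet
    Right True  -> Got <$> B.hGetNonBlocking h maxBytes
```

The caller decides what to do on NoDataYet (retry, count attempts, or
drop the connection), so the loop sleeps in hWaitForInput instead of
spinning.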

There are also other reasons for doing non-blocking IO, not least that it
makes developing and manual testing a lot nicer.

/M

-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus@therning.org   Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe





Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin
On Fri, Aug 20, 2010 at 14:58, Magnus Therning mag...@therning.org wrote:
 Indeed.

 In many protocols it would force the attacker to send well-formed requests
 though.  I think this is true for many text-based protocols like
 HTTP.

 The looping can be handled effectively through hWaitForInput.

 There are also other reasons for doing non-blocking IO, not least that it
 makes developing and manual testing a lot nicer.

I think I'm failing to understand something.

Using a non-blocking read doesn't change how the iteratees react to
well- or mal-formed requests. All it does is change the failure
condition from "blocked indefinitely" to "looping indefinitely".

Replacing the hGet with a combination of hWaitForInput /
hGetNonBlocking would cause a third failure condition, "looping
indefinitely with periodic blocks". This doesn't seem to be an
improvement over simply blocking.

Do you have any example code which works well using a non-blocking
enumerator, but fails with a blocking one?