Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-07 Thread Simon Meier
2011/6/6 Bryan O'Sullivan b...@serpentine.com:
 On Sun, Jun 5, 2011 at 11:00 AM, Yitzchak Gale g...@sefer.org wrote:

 If behind the scenes the concat is copying directly from slices of the
 original
 input, then no, in principle we're not saving much then.
 I thought there were *two* copies going on.

 If you're using the specialised functions like attoparsec's takeWhile, then
 all they do is return a view into the underlying array. No copying occurs
 until the concat itself. Now that I think of it: in principle, you could
 write a specialised concat that would check the pointer/offset/length
 combinations of its arguments and, if they all abutted perfectly, would just
 return a new view into that same array, sans copying. (You'd have to hide it
 behind unsafePerformIO, of course.)

Why would you need 'unsafePerformIO'. You can scrutinise the 'PS'
constructors of the slice without dropping down to IO. The required
'Eq' instance on 'ForeignPtr' is also present.

Using a Builder for concatentation makes sense, if you want to exploit
that copying a slice of the input array is cheaper right after it has
been inspected (its fully cached) than later (as it is done when
collecting slices in a list). However, you can only have one Builder
at a time and some low-level meddling is probably required to
interleave the feeding of the Parser with input arrays with the
feeding of the Builder with free buffers. Nevertheless, for something
like parsing Chunked HTTP content it would make a lot of sense. I'm
inclined look into that once I finished porting the blaze Builder to
the 'bytestring' library.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-07 Thread Yitzchak Gale
Bryan O'Sullivan wrote:
 Now that I think of it: in principle, you could
 write a specialised concat that would check the pointer/offset/length
 combinations of its arguments and, if they all abutted perfectly, would just
 return a new view into that same array, sans copying.

Gregory Collins wrote:
 The blaze-builder might work for this also, this is exactly the
 problem it's designed for.

Simon Meier wrote:
 Using a Builder for concatentation makes sense...
 However, ...some low-level meddling is probably required...
 I'm inclined look into that...

These are great ideas for further exploiting the
bytestring/text specialization of this parser library for
super speed. Thanks!

Regards,
Yitz

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-07 Thread Bryan O'Sullivan
On Tue, Jun 7, 2011 at 1:40 AM, Simon Meier iridc...@gmail.com wrote:

 Why would you need 'unsafePerformIO'. You can scrutinise the 'PS'
 constructors of the slice without dropping down to IO.


True. Oops :-)


 Using a Builder for concatentation makes sense, if you want to exploit
 that copying a slice of the input array is cheaper right after it has
 been inspected (its fully cached) than later (as it is done when
 collecting slices in a list).


When I've measured this in the past, I've found that it's often faster to
accumulate a list and then run concat at the end than to use blaze-builder
directly. That was certainly the case wit GHC 6.12; I haven't remeasured
with 7.0. That's why you'll see that some places in the aeson JSON library
use blaze-builder, while others manipulate bytestrings directly.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-07 Thread Simon Meier
2011/6/7 Bryan O'Sullivan b...@serpentine.com:
 On Tue, Jun 7, 2011 at 1:40 AM, Simon Meier iridc...@gmail.com wrote:

 Why would you need 'unsafePerformIO'. You can scrutinise the 'PS'
 constructors of the slice without dropping down to IO.

 True. Oops :-)


 Using a Builder for concatentation makes sense, if you want to exploit
 that copying a slice of the input array is cheaper right after it has
 been inspected (its fully cached) than later (as it is done when
 collecting slices in a list).

 When I've measured this in the past, I've found that it's often faster to
 accumulate a list and then run concat at the end than to use blaze-builder
 directly. That was certainly the case wit GHC 6.12; I haven't remeasured
 with 7.0. That's why you'll see that some places in the aeson JSON library
 use blaze-builder, while others manipulate bytestrings directly.

When creating a Builder that you run afterwards, then you essentially
create a list of bytestring concatenations as a closure. It makes
sense that you don't win with such an approach, as the concatenation
still only happens after then end of this list is reached. What you'd
need is to nest attoparsec's Parser monad with the 'Put' monad
provided in the blaze-builder internals [1]. However, I'm not yet sure
how to achieve this in a modular fashion. Perhaps, using iteratee's
the right way might provide a good answer, but that's just guesswork.

The performance characteristics of blaze-builder are such that for
short output sequences working with bytestrings directly is sometimes
favorable, as the setup overhead before the Builder can be executed
needs to be amortized. However, Builders work with a single pass
through the input data, while many bytestring operations require two
passes. For example, 'Data.ByteString.pack' first determines the
length of the input and only afterwards writes the data to memory.
Using blaze-builder's 'fromWord8s' requires only a single traversal
and is faster for lists of length 64 on my 32-bit Core 2 Duo machine.
I expect a similar effect for concatenating lists of strict
bytestrings.

[1] 
http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/Blaze-ByteString-Builder-Internal.html#g:3

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-06 Thread Bryan O'Sullivan
On Sun, Jun 5, 2011 at 11:00 AM, Yitzchak Gale g...@sefer.org wrote:

 If behind the scenes the concat is copying directly from slices of the
 original
 input, then no, in principle we're not saving much then.
 I thought there were *two* copies going on.


If you're using the specialised functions like attoparsec's takeWhile, then
all they do is return a view into the underlying array. No copying occurs
until the concat itself. Now that I think of it: in principle, you could
write a specialised concat that would check the pointer/offset/length
combinations of its arguments and, if they all abutted perfectly, would just
return a new view into that same array, sans copying. (You'd have to hide it
behind unsafePerformIO, of course.)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-05 Thread Yitzchak Gale
I wrote:
 I was thinking of even lower level: allocating a moderate chunk of
 memory and writing the results directly into it consecutively as a
 special case.

Bryan O'Sullivan wrote:
 Surely that would save only one copy compared to creating a list of results
 and then concatenating them, no? I'd be a little surprised if it proved
 worthwhile.

If behind the scenes the concat is copying directly from slices of the original
input, then no, in principle we're not saving much then.
I thought there were *two* copies going on.

It might be possible to keep the byte count only in the special
case of a concatenating combinator, but that would require
some work to implement.

Thanks as usual for the fantastic work,
Yitz

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-03 Thread Yitzchak Gale
Bryan O'Sullivan wrote:
 I'd like a no-copy combinator for the same reasons, but I think it's
 impossible to do without some low-level support.

I wrote:
 ...does the internal representation easily admit such a combinator?

 Not very easily. Internally, attoparsec maintains just three pieces of data
 for its state... If
 there was a bytes consumed counter, it would be possible to write a
 try-like combinator

I was thinking of even lower level: allocating a moderate chunk of
memory and writing the results directly into it consecutively as a
special case.

I think Data.ByteString.Internal.create might do the trick.
In fact, some of the existing basic attoparsec combinators,
like takeWhile, could use that kind of treatment. The question
is whether you want to dip that low into the ByteString
implementation.

Part of the problem is that there doesn't seem to be any way
to allocate contiguous memory in GHC and then release only
part of it. So even ByteString itself is doing extra copying.
That is another reason why I think there may be some serious
performance gains to be had by exposing those internals in
attoparsec.

[Duncan: Did you notice that the Haddocks for
Data.ByteString.Internals and a few others haven't
been building lately?]

Thanks,
Yitz

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-03 Thread Yitzchak Gale
Mario Blažević wrote:
  I don't know if this helps, but the incremental-parser library has
 exactly the combinator you're looking for.

Wow, that is a beautiful implementation of a general parser
library. So much simpler than Parsec. Thanks for pointing it out.

Why are you hiding those nice Monoid classes in the parser
package? Shouldn't it be a separate package?

Edward Kmett has also been adding some nice Monoid
abstractions lately. I haven't been following all of it. I wonder
how yours and his relate.

Thanks,
Yitz

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-03 Thread Mario Blažević

On 11-06-03 06:00 AM, Yitzchak Gale wrote:

Mario Blažević wrote:

  I don't know if this helps, but the incremental-parser library has
exactly the combinator you're looking for.


Wow, that is a beautiful implementation of a general parser
library. So much simpler than Parsec. Thanks for pointing it out.


Thanks. I guess I should get to work fixing its deficiencies then.



Why are you hiding those nice Monoid classes in the parser
package? Shouldn't it be a separate package?


	I considered it, and I'll do it if there's interest, but for the first 
release I decided to keep them close to where they're needed. There was 
less work that way. I'd hate to release a standalone package with only 
half the instance implementations.




Edward Kmett has also been adding some nice Monoid
abstractions lately. I haven't been following all of it. I wonder
how yours and his relate.


	If you mean semigroups, they are related but only through Monoid. 
Semigroup is a superclass of Monoid, whereas CancellativeMonoid and 
FunctorialMonoid are its subclasses. CancellativeMonoid is lying between 
a Monoid and Group in power. It would make more sense to merge my 
classes into some group package, though I don't see any obvious 
candidate on Hackage right now.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-03 Thread Gregory Collins
On Fri, Jun 3, 2011 at 11:52 AM, Yitzchak Gale g...@sefer.org wrote:
 Bryan O'Sullivan wrote:
 I'd like a no-copy combinator for the same reasons, but I think it's
 impossible to do without some low-level support.

 I wrote:
 ...does the internal representation easily admit such a combinator?

 Not very easily. Internally, attoparsec maintains just three pieces of data
 for its state... If
 there was a bytes consumed counter, it would be possible to write a
 try-like combinator

 I was thinking of even lower level: allocating a moderate chunk of
 memory and writing the results directly into it consecutively as a
 special case.

The blaze-builder might work for this also, this is exactly the
problem it's designed for.

G
-- 
Gregory Collins g...@gregorycollins.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-03 Thread Bryan O'Sullivan
On Fri, Jun 3, 2011 at 2:52 AM, Yitzchak Gale g...@sefer.org wrote:

 I was thinking of even lower level: allocating a moderate chunk of
 memory and writing the results directly into it consecutively as a
 special case.


Surely that would save only one copy compared to creating a list of results
and then concatenating them, no? I'd be a little surprised if it proved
worthwhile.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Attoparsec concatenating combinator

2011-06-02 Thread Yitzchak Gale
I often find while using attoparsec and attoparsec-text that I need to
match a number of text parsers consecutively and concatenate the
result. By text parser I mean Parser ByteString for attoparsec and
Parser Text for attoparsec-text.

It seems the best I can do is to collect them all in a list and then
apply concat. But that still copies the text several times.

Is there a combinator that does this without all that copying?
If not, does the internal representation easily admit such
a combinator?

Thanks,
Yitz

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-02 Thread Bryan O'Sullivan
On Thu, Jun 2, 2011 at 7:02 AM, Yitzchak Gale g...@sefer.org wrote:

 It seems the best I can do is to collect them all in a list and then
 apply concat. But that still copies the text several times.


Right. I'd like a no-copy combinator for the same reasons, but I think it's
impossible to do without some low-level support.


 If not, does the internal representation easily admit such a combinator?


Not very easily. Internally, attoparsec maintains just three pieces of data
for its state:

   - The current input
   - Any input we received via continuations, in case we need to backtrack
   - A flag that denotes whether we were fed EOF by a continuation

There are no line numbers, no bytes consumed counters, nothing else. If
there was a bytes consumed counter, it would be possible to write a
try-like combinator that would hold onto the current input, run a parser,
tack on any input received via continuations to the original input, and then
use the counter to slice off a portion of that bytestring without copying. I
can't think of another way to do it. Adding that counter would be a moderate
amount of work, and would presumably have a negative effect on performance.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Attoparsec concatenating combinator

2011-06-02 Thread Mario Blažević
On Thu, Jun 2, 2011 at 10:02 AM, Yitzchak Gale g...@sefer.org wrote:
 I often find while using attoparsec and attoparsec-text that I need to
 match a number of text parsers consecutively and concatenate the
 result. By text parser I mean Parser ByteString for attoparsec and
 Parser Text for attoparsec-text.

   I don't know if this helps, but the incremental-parser library has
exactly the combinator you're looking for.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe