Beginners Digest, Vol 94, Issue 1

beginners-request Sun, 03 Apr 2016 06:56:10 -0700

Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org


You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1.  How to write faster ByteString/Conduit code (John Ky)
   2. Re:  How to write faster ByteString/Conduit code (John Ky)


----------------------------------------------------------------------

Message: 1
Date: Sun, 03 Apr 2016 13:11:24 +0000
From: John Ky <newho...@gmail.com>
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <Beginners@haskell.org>
Subject: [Haskell-beginners] How to write faster ByteString/Conduit
        code
Message-ID:
        <CAMB4o-C8bNy9b=itsfqopwsaxg47o6hvxl2m0yp-k+eag_r...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello Haskellers,

I?ve been trying to squeeze as much performance out of my code as possible
and I?ve come to a point where can?t figure out what more I can do.

Here is some example code:

blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
blankEscapedChars = blankEscapedChars' ""

blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit
BS.ByteString m BS.ByteString
blankEscapedChars' rs = do
  mbs <- await
  case mbs of
    Just bs -> do
      let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
      let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
      yield ds
      blankEscapedChars' (BS.drop (BS.length ds) cs)
    Nothing -> when (BS.length rs > 0) (yield rs)
  where
    unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool,
ByteString))
    unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
      Just (_, cs) | wasEscaped       -> Just (wUnderscore, (False, cs))
      Just (c, cs) | c /= wBackslash  -> Just (c, (False, cs))
      Just (c, cs)                    -> Just (c, (True, cs))
      Nothing                         -> Nothing

The above function blankEscapedChars will go find all \ characters and
convert the following character to a _. For a 1 MB in memory JSON ByteString,
it benches at about 6.6 ms

In all my code the basic strategy is the same. await for the next byte
string, then use and unfoldrN to produce a new ByteString for yielding.

Anyone know of a way to go faster?

Cheers,

-John
?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160403/80fd9f9f/attachment-0001.html>

------------------------------

Message: 2
Date: Sun, 03 Apr 2016 13:55:41 +0000
From: John Ky <newho...@gmail.com>
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <Beginners@haskell.org>
Subject: Re: [Haskell-beginners] How to write faster
        ByteString/Conduit code
Message-ID:
        <camb4o-bbpdcpyi7jyx5au3-gxw6p49gebpmst1apmzqfqxt...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Haskellers,

I just rewrote the code to a state-machine in the hope that I can
eventually collapse several stages in a pipeline into one, but this simple
state-machine version turns out to be about 3 times slower even though it
does the same thing:

newtype Blank = Blank
  { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
  }

escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapeChar bs = case BS.uncons bs of
  Just (c, cs)  -> Just (c, (cs, Blank (if c /= wBackslash then
escapeChar else escapedChar)))
  Nothing       -> Nothing

escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapedChar bs = case BS.uncons bs of
  Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
  Nothing      -> Nothing

fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
fastBlank = fastBlank' escapeChar

fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8,
(BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
fastBlank' blank = do
  mbs <- await
  case mbs of
    Just bs -> do
      let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs)
(\(bs, Blank f) -> f bs) (bs, Blank blank)
      yield cs
      fastBlank' newBlank
    Nothing -> return ()

I worry that if I go this approach, just the cost of the state-machine
might mean I only break-even.

Is there any reason why this version should be slower?

Cheers,

-John
?

On Sun, 3 Apr 2016 at 23:11 John Ky <newho...@gmail.com> wrote:

> Hello Haskellers,
>
> I?ve been trying to squeeze as much performance out of my code as possible
> and I?ve come to a point where can?t figure out what more I can do.
>
> Here is some example code:
>
> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
> blankEscapedChars = blankEscapedChars' ""
>
> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString 
> m BS.ByteString
> blankEscapedChars' rs = do
>   mbs <- await
>   case mbs of
>     Just bs -> do
>       let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
>       let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
>       yield ds
>       blankEscapedChars' (BS.drop (BS.length ds) cs)
>     Nothing -> when (BS.length rs > 0) (yield rs)
>   where
>     unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, 
> ByteString))
>     unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
>       Just (_, cs) | wasEscaped       -> Just (wUnderscore, (False, cs))
>       Just (c, cs) | c /= wBackslash  -> Just (c, (False, cs))
>       Just (c, cs)                    -> Just (c, (True, cs))
>       Nothing                         -> Nothing
>
> The above function blankEscapedChars will go find all \ characters and
> convert the following character to a _. For a 1 MB in memory JSON
> ByteString, it benches at about 6.6 ms
>
> In all my code the basic strategy is the same. await for the next byte
> string, then use and unfoldrN to produce a new ByteString for yielding.
>
> Anyone know of a way to go faster?
>
> Cheers,
>
> -John
> ?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160403/ed64048c/attachment.html>

------------------------------

Subject: Digest Footer

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners


------------------------------

End of Beginners Digest, Vol 94, Issue 1
****************************************

Beginners Digest, Vol 94, Issue 1

Reply via email to