[haskell-pipes] Re: Messages delimited by multiple newlines?

Michael Thompson Thu, 18 Aug 2016 17:13:36 -0700

I think as a lens this should reinsert the missing "\n\n",

     endline' :: Monad m => Lens' (Producer Text m a) (Producer Text m (Producer Text m a))
     endline' k p0 = fmap (>>= (yield "\n\n" >>)) (k (go p0 ""))   -- instead of just `join`


     >>> :set -XOverloadedStrings
     >>> Text.toLazyM $ over endline id $ yield "hello\nworld\n\ngoodbye\nworld"
     "hello\nworldgoodbye\nworld"
     >>> Text.toLazyM $ over endline' id $ yield "hello\nworld\n\ngoodbye\nworld"
     "hello\nworld\n\ngoodbye\nworld"

The latter is more the desired behavior.

Note that, as it stands, this silently accumulates everything 
before the double newline. One might try to avoid this, but if 
these are not foreign files it might not be worth worrying about.
If you don't want to accumulate, what is really missing is a pair 
of functions like `Data.Text.breakOn` and `Data.Text.splitOn` 
that could break a text stream on a given text shape, here "\n\n". 
I remember trying to implement these, but it is surprisingly 
difficult to do in a non-plodding way. `text` uses an extremely 
complicated, but fast, method that collects a list of all indices 
at which the match text begins.
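
A minimal sketch of the non-accumulating route, specialised to the 
two-character delimiter "\n\n" rather than a general `breakOn`. The 
names `breakOnBlank` and `demo` are hypothetical (not part of 
pipes-text); the code assumes only `next` and `yield` from `pipes` 
plus `Data.Text`. It holds back at most one trailing newline, so a 
delimiter that straddles a chunk boundary is still seen:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (when)
import Data.Functor.Identity (runIdentity)
import Data.Text (Text)
import qualified Data.Text as T
import Pipes
import qualified Pipes.Prelude as P

-- | Stream the text before the first "\n\n" chunk by chunk, then
-- return the remainder of the stream. The delimiter itself is dropped.
breakOnBlank :: Monad m => Producer Text m r -> Producer Text m (Producer Text m r)
breakOnBlank = go False
  where
    -- pendingNL: the previous chunk ended in a '\n' that has been held
    -- back because it might be the first half of the delimiter.
    go pendingNL p = do
      x <- lift (next p)
      case x of
        Left r -> do
          when pendingNL (yield "\n")      -- it was an ordinary newline
          return (return r)
        Right (chunk, p')
          | T.null chunk -> go pendingNL p'
          | pendingNL, T.head chunk == '\n' ->
              -- delimiter split across two chunks
              return (yield (T.tail chunk) >> p')
          | otherwise -> do
              when pendingNL (yield "\n")
              let (before, after) = T.breakOn "\n\n" chunk
              if not (T.null after)
                then do
                  yield before
                  return (yield (T.drop 2 after) >> p')
                else if T.last chunk == '\n'
                  then yield (T.init chunk) >> go True p'
                  else yield chunk >> go False p'

-- | Pure helper for trying it out: feed chunks, get (before, after).
demo :: [Text] -> (Text, Text)
demo chunks = runIdentity $ do
  (front, rest) <- P.toListM' (breakOnBlank (each chunks))
  back <- P.toListM rest
  return (T.concat front, T.concat back)
```

For instance, `demo ["hello\nworld\n", "\ngoodbye\n", "world"]` 
evaluates to `("hello\nworld", "goodbye\nworld")`. A general 
delimiter would need a longer held-back suffix, which is where the 
difficulty mentioned above comes in.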

One thing I wondered: are you going to repeat this across the 
length of the file? If so, and accumulating lines isn't an issue, 
then one might approach the problem by first accumulating lines:

     >>> -- I was using Pipes.Group = PG; Pipes.Text = Text
     >>> :t PG.folds mappend mempty id . view Text.lines
     PG.folds mappend mempty id . view Text.lines
      :: Monad m => Producer Text m r -> Producer Text m r

Now we have a producer of separate accumulated lines and can break on an 
empty line.

    >>> let accumLines = PG.folds mappend mempty id . view Text.lines
    >>> let txt = yield "hello\nworld\n\ngoodbye"
    >>> runEffect $ accumLines txt >-> P.print
    "hello"
    "world"
    ""
    "goodbye"

Now we are missing something like a

    split :: a -> Producer a m r -> FreeT (Producer a m) m r

which should be in Pipes.Group, I think. If all of the above is not 
completely wrong-headed, we could try to write one. It should be 
pretty simple given `Pipes.Parse.span`. But note that with 
`Pipes.Parse.span` we are already close to the effect you wanted:

    >>> rest <- runEffect $ accumLines txt ^. PP.span (/= mempty) >-> P.print
    "hello"
    "world"
    >>> runEffect $ rest >-> P.print
    ""
    "goodbye"

    >>> runEffect $ rest >-> P.drop 1 >-> P.print
    "goodbye"

You can collect the lines of the first record with `P.toListM'`:

     >>> (rec1,rest) <- P.toListM' $ accumLines txt ^. PP.span (/= mempty)
     >>> rec1
     ["hello","world"]
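
For what it is worth, the `split` wished for above can also be 
sketched directly with `next`, without going through 
`Pipes.Parse.span`. This is a hypothetical helper, not something 
Pipes.Group exports; it assumes the `pipes`, `pipes-group` and `free` 
packages, adds an `Eq a` constraint to the signature given earlier, 
and `records` is only an illustrative wrapper. The separator is 
dropped, and a trailing separator does not produce a final empty 
group:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Free (FreeF (..), FreeT (..))
import Data.Functor.Identity (runIdentity)
import Data.Text (Text)
import Pipes
import qualified Pipes.Group as PG
import qualified Pipes.Prelude as P

-- | Cut a stream into groups at every element equal to the separator.
split :: (Monad m, Eq a) => a -> Producer a m r -> FreeT (Producer a m) m r
split sep = go
  where
    -- Peek at the stream: stop if it is exhausted, else start a group.
    go p = FreeT $ do
      x <- next p
      return $ case x of
        Left r        -> Pure r
        Right (a, p') -> Free (fmap go (upTo (yield a >> p')))

    -- Re-yield elements until the separator, then hand back the rest.
    upTo p = do
      x <- lift (next p)
      case x of
        Left r -> return (return r)
        Right (a, p')
          | a == sep  -> return p'
          | otherwise -> yield a >> upTo p'

-- | Collect each blank-line-delimited record of accumulated lines.
records :: [Text] -> [[Text]]
records ls =
  runIdentity . P.toListM $
    PG.folds (\acc l -> acc . (l :)) id ($ []) (split "" (each ls))
```

With the accumulated lines from the session above, 
`records ["hello","world","","goodbye"]` gives 
`[["hello","world"],["goodbye"]]`. Each group is streamed, so nothing 
is held in memory unless the consumer (here `PG.folds`) chooses to 
collect it.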


Like I said, this may all be wrong-headed and uncomprehending; I'm 
partly just testing ideas to see what you are intending.
