Yes, the `pipes-group` library is what you need. I recommend reading
this if you haven't done so already:
http://www.haskellforall.com/2013/09/perfect-streaming-using-pipes-bytestring.html
... so the type of `element` would become:
element :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
The result is a "stream of streams" that preserves the lazy streaming
behavior of each sub-stream so that you don't have to wait to collect
all children before processing them.
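As a rough illustration of that idea, here is a minimal sketch of an `element` with the type above, plus a loop that writes a fixed number of groups per file. It assumes, as in the sample XML, that every top-level element ends with a line starting with `</`. A self-closing top-level element would be merged into the following group under this rule. The names `isClosingTag`, `oneElement`, `drainGroups`, and `writeChunks`, and the `out-N.osm` file names, are made up for this example and are not part of `pipes-group`. The sketch uses only `next` from `pipes` and the `FreeT` constructors from the `free` package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad.Trans.Free (FreeF (..), FreeT (..))
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Pipes
import System.IO (IOMode (WriteMode), withFile)

-- Assumption: a top-level element always ends with a line like "</foo>".
isClosingTag :: Text -> Bool
isClosingTag line = "</" `T.isPrefixOf` T.stripStart line

-- Split a stream of lines into one sub-stream per element.
-- Lines are forwarded as they arrive; nothing is collected into a list.
element :: Monad m => Producer Text m r -> FreeT (Producer Text m) m r
element p = FreeT $ do
    x <- next p
    return $ case x of
        Left r        -> Pure r
        Right (a, p') -> Free (fmap element (oneElement (yield a >> p')))

-- Yield lines up to and including the first closing tag,
-- then return the rest of the input untouched.
oneElement :: Monad m => Producer Text m r -> Producer Text m (Producer Text m r)
oneElement p = do
    x <- lift (next p)
    case x of
        Left r -> return (return r)
        Right (a, p')
            | isClosingTag a -> yield a >> return p'
            | otherwise      -> yield a >> oneElement p'

-- Feed at most n groups to the sink. Returns the final result if the
-- input ran out, or the remaining groups otherwise.
drainGroups
    :: Monad m
    => Int
    -> (a -> m ())
    -> FreeT (Producer a m) m r
    -> m (Either r (FreeT (Producer a m) m r))
drainGroups n sink groups
    | n <= 0    = return (Right groups)
    | otherwise = do
        step <- runFreeT groups
        case step of
            Pure r -> return (Left r)
            Free p -> do
                rest <- runEffect (for p (lift . sink))
                drainGroups (n - 1) sink rest

-- Write perFile groups to each output file (hypothetical file names).
-- A file is only opened once at least one more group is known to exist.
writeChunks :: Int -> FreeT (Producer Text IO) IO r -> IO r
writeChunks perFile = go (0 :: Int)
  where
    go i groups = do
        step <- runFreeT groups
        case step of
            Pure r -> return r
            Free _ -> do
                let path = "out-" ++ show i ++ ".osm"
                result <- withFile path WriteMode $ \h ->
                    drainGroups (max 1 perFile) (TIO.hPutStrLn h) (FreeT (return step))
                case result of
                    Left r     -> return r
                    Right rest -> go (i + 1) rest

main :: IO ()
main = do
    let foo = ["<foo>", "<bar/>", "<bar/>", "</foo>"]
    -- Five elements, two per file: out-0.osm, out-1.osm, out-2.osm.
    writeChunks 2 (element (each (concat (replicate 5 foo))))
```

In the real program the `each ...` producer would be replaced by the line producer (`xml "somefile"` in the question) and `2` by `1000`. Running that under `SafeT` would need the file handling adapted accordingly.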
You should also check out the `streaming` library, which is more
specialized to this particular use case:
https://hackage.haskell.org/package/streaming
On 01/06/2017 05:00 PM, Colin Woodbury wrote:
Hi all, I've hit a problem that feels like it has a straightforward
answer.
I have a large XML file that I'd like to split up into subfiles of
roughly equal size. My first pass looks like:
|
import qualified Data.Text as T
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.Prelude.Text as PT
import Pipes.Safe
import Text.Printf.TH

-- | Streams lines from the source file.
-- This drops the first three lines, which are not Elements.
xml :: MonadSafe m => FilePath -> Producer T.Text m ()
xml fp = PT.readFileLn fp >-> P.drop 3

-- | Stream an entire OSM Element with all its children (tags, etc).
-- Note: OSM XML is rather flat - children never have children.
element :: Monad m => Pipe T.Text [T.Text] m ()
element = undefined  -- `await` lines until you find a closing tag, then pack as a List and `yield`?

-- | Writes one legal @<osm>...</osm>@ block.
osm :: Int -> Consumer [T.Text] (SafeT IO) ()
osm !fpn = do
  let fp = [s|catalog/out-%d.osm|] fpn
  P.take 1000 >-> P.concat >-> PT.writeFileLn fp
  osm $ fpn + 1

splitAll :: Effect (SafeT IO) ()
splitAll = xml "somefile" >-> element >-> osm 0
|
My intent is to stream groups of 1000 XML elements (and their
children) to separate files. Luckily the XML in question is only ever
one layer deep, like:
|
<foo>
<bar/>
<bar/>
<bar/>
</foo>
|
So this `foo` group here would count as 1 written element, not 5.
What stands out right away is the type signature of `element`. The
`[T.Text]` feels very unidiomatic, but I couldn't think of another way
to group all the parent and child nodes together in such a way that
`osm` would know it had processed 1000 of such groups.
I read the `pipes-group` tutorial, but it wasn't immediately clear to
me if that's what I needed. I /do/ know that at maximum any given
`<foo>` parent can only have a few hundred `<bar>` children, but that
still breaks the output streaming as I wait for the List to populate.
Question: How can I structure things such that `osm` knows when to
start writing to a new file?
Thanks
PS. `osm` defined as it currently is probably also ends up in an
infinite loop.
--
You received this message because you are subscribed to the Google
Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to haskell-pipes+unsubscr...@googlegroups.com
<mailto:haskell-pipes+unsubscr...@googlegroups.com>.
To post to this group, send email to haskell-pipes@googlegroups.com
<mailto:haskell-pipes@googlegroups.com>.