Hi Dave,
Interesting work! I used xsl:result-document to do something similar, and it went wrong for the same reasons. I commented on the blog with a different solution that (essentially) helps both approaches.

I am also investigating the possibility of splitting in the collector, which seems to be the most sensible place to do so. But my test case fails. Perhaps 550k records is a bit too much to start with.. ;)

Kind regards,
Geert

*From:* general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] *On behalf of* Dave Cassel
*Sent:* Tuesday, 24 January 2012 13:35
*To:* General MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Using Information Studio to split uploaded files..

Late to the party, but I'll put it out there in case it's still helpful:
http://blog.davidcassel.net/2011/06/splitting-data-with-info-studio/

On Jan 17, 2012, at 2:31 PM, Michael Sokolov wrote:

Here's my two cents; I hope it helps with the development of IS.

We have typically been doing this kind of splitting external to MarkLogic, in XSLT or in Java (with a SAX parser) for very large documents we need to stream. Generally speaking, an XPath expression can describe the boundaries where we want to split the original document; often an element name or a name/attribute-value combination is enough.

One difficulty in the streaming case has been the need to maintain outer context when splitting inner elements. For example, consider a book document where you want to split on book parts; a book-part can be the whole book, a part, chapter, section, etc. In a hierarchical structure you mostly just want the part you're looking at, but you also need to preserve some outer metadata and/or structure. For example, you might like to include the book title in every part of the book so that you can display it later. Other typical requirements are to generate a TOC and to maintain next/previous links between parts.
One approach that has helped us is to generate an intermediate document including the part wrapped in its ancestors' descendant content *until the next part boundary*. Actually, in the streaming case you can only include ancestor descendant content that precedes the current chunk, but since metadata typically precedes content, that seems to work out OK.

I would encourage you to consider providing, or at least enabling, some solution to this ancestor-metadata problem as a requirement in any document-splitting pipeline.

-Mike

On 1/13/2012 11:26 AM, Justin Makeig wrote:

Geert,

Information Studio is currently designed for single document in, single document out transformations. Your best bet for splitting a document today is to do this as part of the collection step.

Can you tell me a little more about the data you’d like to split? Is it aggregated XML that you’re splitting on an XPath-like match expression? Text separated by line breaks? Something else? I’m interested in figuring out if and how we might make splitting easier and better integrated into the product.

Justin

Justin Makeig
Senior Product Manager
MarkLogic Corporation
justin.mak...@marklogic.com
Phone: +1 650 655 2387
www.marklogic.com

On Jan 13, 2012, at 6:35 AM, Geert Josten wrote:

Hi,

Is Information Studio intended to allow splitting of uploaded files? If so, what is the best way of handling that?

I was experimenting with a custom XSLT and a simple xsl:result-document, but that is giving funny results: mostly http://marklogic.com/states/appservices/distribute-error messages in the error log. I am not sure what they mean exactly, but I can imagine it is because CPF handling is 'violated' or something..

Any suggestions?

Kind regards,
Geert

drs. G.P.H. (Geert) Josten
Senior Developer
Dayon B.V.
Delftechpark 37b
2628 XJ Delft
T +31 (0)88 26 82 570
geert.jos...@dayon.nl
www.dayon.nl

The information sent in or with this e-mail message originates from Dayon BV and is intended solely for the addressee. If you have received this message unintentionally, we ask you to delete it. No rights can be derived from this message.

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

*David Cassel*
dave.cas...@marklogic.com
Sr. Federal Consultant
MarkLogic Corporation <http://marklogic.com>
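For readers who want to try the ancestor-context idea Mike describes earlier in this thread, here is a minimal sketch. It is not Information Studio or MarkLogic code, and it is not Mike's implementation: it uses Python's standard xml.etree.ElementTree as a stand-in for the XSLT or Java/SAX tooling he mentions. The function name `split_with_context`, the `chapter` split element, and the sample document are all hypothetical. Each emitted chunk is wrapped in shallow copies of its ancestors, and each ancestor carries only the child elements that closed before the chunk, which matches the streaming limitation that only preceding content is available.

```python
import copy
import io
import xml.etree.ElementTree as ET


def split_with_context(source, split_tag):
    """Yield one wrapper element per `split_tag` element found in `source`.

    Each wrapper reproduces the chunk's ancestors (tag and attributes only)
    plus the ancestor child elements that closed *before* the chunk.
    """
    # (open element, its completed non-split children), outermost first
    open_elems = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append((elem, []))
            continue
        open_elems.pop()  # elem just closed
        if not open_elems:
            continue  # the root itself closed; nothing to wrap it in
        parent, seen = open_elems[-1]
        if elem.tag != split_tag:
            seen.append(elem)  # remember as preceding context (e.g. a title)
            continue
        wrapper = cursor = None
        for ancestor, preceding in open_elems:
            shell = ET.Element(ancestor.tag, dict(ancestor.attrib))
            shell.extend([copy.deepcopy(child) for child in preceding])
            if cursor is None:
                wrapper = shell
            else:
                cursor.append(shell)
            cursor = shell
        cursor.append(copy.deepcopy(elem))
        parent.remove(elem)  # drop the emitted chunk so memory stays bounded
        yield wrapper


# Hypothetical sample: a book whose title should travel with every chapter.
SAMPLE = (
    b"<book><title>Streams</title>"
    b"<chapter n='1'><p>One</p></chapter>"
    b"<chapter n='2'><p>Two</p></chapter></book>"
)

parts = list(split_with_context(io.BytesIO(SAMPLE), "chapter"))
for part in parts:
    print(ET.tostring(part, encoding="unicode"))
```

The sketch tracks preceding content from the parser events rather than from the tree, because iterparse may have built later siblings before their events are delivered. It ignores text that sits directly inside an ancestor, and a nested split element is emitted on its own and removed from its enclosing part. Generating a TOC or next/previous links, which Mike also mentions, is not covered.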