Thanks Shannon, send it along and I'll take a look. Thank you!

--Colleen

________________________________________
From: [email protected] [[email protected]] On Behalf Of Shannon [[email protected]]
Sent: Monday, October 25, 2010 7:15 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Information Studio Transform

Colleen, thank you for this informative response. I'm afraid the Fab database (great name) is empty. I would like to send you the stylesheet and a truncated TEI corpus "offline". Splitting is not the only preprocessing step, unfortunately, but I'd be happy to check out Justin's custom Collector for up-front splitting, if you think that's the best approach.

Thanks very much!

On Oct 24, 2010, at 1:55 PM, Colleen Whitney wrote:

> Hi Shannon,
>
> Here's a little background to help track down what might be happening.
>
> Information Studio uses the Content Processing Framework (CPF) and a
> dedicated "Fab" database under the covers to do transformations. If you have
> transformations configured, the documents that your collector ingests go into
> the Fab database in a particular directory corresponding with the CPF
> processing domain for your Flow. Information Studio creates a pipeline for
> you and attaches it to the domain, including a final step that inserts the
> document into your destination database. (If you don't have transformations
> configured, the documents go straight into your destination database without
> going through Fab.)
>
> In the pipeline, Information Studio uses a generic XSLT CPF handler that was
> added as part of XSLT support, and frankly I'm not sure how it handles cases
> where there are multiple output documents. Take a look at your Fab database.
> Are the documents you expected sitting in there? What do the URIs look
> like? One possibility is that the documents are in Fab, have been assigned
> URIs outside of the processing domain, so they're not being picked up by CPF
> and moved through the pipeline and into your target database. If the
> documents are not in Fab, then we need to back up and understand what that
> generic handler is doing with the output documents (so yes, a stylesheet and
> simple input document would be really helpful).
>
> That said, for this release Flows are optimized for single-document-in,
> single-document-out, linear pipelines. Document splitting is a common
> scenario, and there are a couple of other approaches you could try. One that
> I like is to write a custom Collector that does the document splitting up
> front, before the documents are inserted. There's a nice example posted to
> github by Justin Makeig along with his intro video (see the video and the
> link to the github project at
> http://developer.marklogic.com/blog/information-studio-intro-video).
>
> If splitting is your only processing step, this has an advantage in that the
> documents don't need to go through Fab, which is much faster. And if what
> you're doing is generic, then it's particularly nice because you can reuse
> that Collector in many flows, without having to add the transformation step
> each time.
>
> --Colleen
>
> Colleen Whitney
> MarkLogic Corporation
>
> Phone +1 650 655 2366
> email [email protected]
> web www.marklogic.com
>
> This e-mail and any accompanying attachments are confidential. The
> information is intended solely for the use of the individual to whom it is
> addressed. Any review, disclosure, copying, distribution, or use of this
> e-mail communication by others is strictly prohibited. If you are not the
> intended recipient, please notify us immediately by returning this message to
> the sender and delete all copies. Thank you for your cooperation.
>
> ________________________________________
> From: [email protected]
> [[email protected]] On Behalf Of Shannon
> [[email protected]]
> Sent: Friday, October 22, 2010 8:44 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Information Studio Transform
>
> Hi MarkLogic,
>
> For a Transformation Step in an InfoStudio Flow, I've inserted a stylesheet
> as an XSLT Transform, which basically splits a TEI corpus into individual XML
> documents, and constructs some metadata in a certain namespace for indexing
> purposes (I can post the thing if that would help)--it works in Oxygen with
> Saxon-PE 9.2.0.6 as the specified transformer; however, it effectively does
> nothing as part of the flow editor--only 1 document is reported as loaded,
> and the split TEI docs don't appear in the database.
>
> Thanks in advance for your help, I'm really looking forward to finding out
> what I'm doing wrong--moving ahead, thanks to 4.2, content loading with a
> persistent flow including a preprocessing transform step will really help cut
> down on the number of steps in my workflow, naturally.
>
> This is with MarkLogic Server, Standard Edition, Personal License, OS X
> Dev-only, as we haven't yet upgraded our Red Hat Enterprise servers to 4.2.
>
> Thanks,
> Shannon
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
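
As a rough illustration of the up-front splitting idea discussed in this thread, here is a minimal Python sketch that breaks a corpus into one document per TEI element before anything is loaded. It is not the Information Studio Collector API and not Justin's example. The function name split_tei_corpus, the "/tei/" URI prefix, and the assumption that the corpus is a TEI P5 teiCorpus in the http://www.tei-c.org/ns/1.0 namespace are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the corpus uses the TEI P5 namespace; adjust for other TEI versions.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def split_tei_corpus(corpus_xml, uri_prefix="/tei/"):
    """Return a list of (uri, xml_string) pairs, one per direct <TEI> child.

    The URI scheme is hypothetical: it uses the element's xml:id when
    present and falls back to a positional name otherwise.
    """
    # Serialize TEI elements with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", TEI_NS)
    root = ET.fromstring(corpus_xml)

    docs = []
    for index, tei in enumerate(root.findall(f"{{{TEI_NS}}}TEI"), start=1):
        xml_id = tei.get(f"{{{XML_NS}}}id")
        name = xml_id if xml_id else f"doc-{index}"
        docs.append((f"{uri_prefix}{name}.xml", ET.tostring(tei, encoding="unicode")))
    return docs


if __name__ == "__main__":
    sample = (
        '<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader/>"
        '<TEI xml:id="letter-1"><text/></TEI>'
        "<TEI><text/></TEI>"
        "</teiCorpus>"
    )
    for uri, _ in split_tei_corpus(sample):
        print(uri)
```

Each returned pair could then be inserted as its own document, which matches the single-document-in, single-document-out shape that Flows are described as being optimized for. Any further preprocessing, such as the metadata construction Shannon mentions, would still need its own step.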
