Re: [MarkLogic Dev General] Information Studio Transform

Colleen Whitney Mon, 25 Oct 2010 19:29:02 -0700

Thanks, Shannon. Send it along and I'll take a look. Thank you!

--Colleen


________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Shannon 
[[email protected]]
Sent: Monday, October 25, 2010 7:15 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Information Studio Transform

Colleen, thank you for this informative response. I'm afraid the Fab database 
(great name) is empty. I would like to send you the stylesheet and a truncated 
TEI corpus "offline". Splitting is not the only preprocessing step, 
unfortunately, but I'd be happy to check out Justin's custom Collector for 
up-front splitting, if you think that's the best approach. Thanks very much!

On Oct 24, 2010, at 1:55 PM, Colleen Whitney wrote:

> Hi Shannon,
>
> Here's a little background to help track down what might be happening.
>
> Information Studio uses the Content Processing Framework (CPF) and a 
> dedicated "Fab" database under the covers to do transformations.  If you have 
> transformations configured, the documents that your collector ingests go into 
> the Fab database in a particular directory corresponding to the CPF 
> processing domain for your Flow. Information Studio creates a pipeline for 
> you and attaches it to the domain, including a final step that inserts the 
> document into your destination database.  (If you don't have transformations 
> configured, the documents go straight into your destination database without 
> going through Fab.)
>
> In the pipeline, Information Studio uses a generic XSLT CPF handler that was 
> added as part of XSLT support, and frankly I'm not sure how it handles cases 
> where there are multiple output documents.  Take a look at your Fab database.
> Are the documents you expected sitting in there?  What do the URIs look
> like?  One possibility is that the documents are in Fab but have been
> assigned URIs outside the processing domain, so they're not being picked up
> by CPF and moved through the pipeline into your target database.  If the
> documents are not in Fab, then we need to back up and understand what that
> generic handler is doing with the output documents (so yes, a stylesheet and
> a simple input document would be really helpful).
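
The URI-scope point above can be illustrated with a minimal Python sketch. It is not MarkLogic's API; it assumes a CPF domain scoped to a directory with a depth of "0" (only documents directly in that directory) or "infinity" (that directory and everything beneath it). The helper `in_domain_scope` and the directory `/my-flow/` are hypothetical; the real domain directory is whatever Information Studio assigned to the Flow.

```python
def in_domain_scope(uri: str, directory: str, depth: str = "infinity") -> bool:
    """Return True if `uri` falls inside a directory-scoped domain.

    Hypothetical helper mirroring directory-scope semantics:
    depth "0" matches only documents directly in `directory`;
    depth "infinity" matches documents at any level beneath it.
    """
    if not directory.endswith("/"):
        directory += "/"
    if not uri.startswith(directory):
        # A URI outside the directory is never seen by the domain's pipeline.
        return False
    remainder = uri[len(directory):]
    if depth == "infinity":
        return remainder != ""
    if depth == "0":
        # No further "/" means the document sits directly in the directory.
        return remainder != "" and "/" not in remainder
    raise ValueError(f"unsupported depth: {depth!r}")


# Hypothetical domain directory; the real one is assigned by Information Studio.
DOMAIN_DIR = "/my-flow/"

print(in_domain_scope("/my-flow/corpus.xml", DOMAIN_DIR))                     # True
print(in_domain_scope("/my-flow/split/doc-1.xml", DOMAIN_DIR, "0"))           # False
print(in_domain_scope("/my-flow/split/doc-1.xml", DOMAIN_DIR, "infinity"))    # True
print(in_domain_scope("/doc-1.xml", DOMAIN_DIR))                              # False
```
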
>
> That said, for this release Flows are optimized for single-document-in, 
> single-document-out, linear pipelines.  Document splitting is a common 
> scenario, and there are a couple of other approaches you could try.  One that 
> I like is to write a custom Collector that does the document splitting up 
> front, before the documents are inserted.  There's a nice example posted to 
> GitHub by Justin Makeig along with his intro video (see the video and the 
> link to the github project at 
> http://developer.marklogic.com/blog/information-studio-intro-video).
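
As a rough, hypothetical illustration of splitting up front, here is a standard-library Python sketch. It is not Justin's Collector and does not use the Information Studio Collector API; it only assumes a TEI P5 `teiCorpus` root whose `TEI` children should each become a separate document, and it gives each piece its own URI under a hypothetical directory (`/my-flow/`).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def split_tei_corpus(corpus_xml: str, directory: str = "/my-flow/"):
    """Yield (uri, xml_string) pairs, one per TEI child of a teiCorpus root.

    `directory` is a hypothetical target directory; each output URI is
    placed directly inside it.
    """
    # Serialize TEI as the default namespace (note: registry is module-global).
    ET.register_namespace("", TEI_NS)
    root = ET.fromstring(corpus_xml)
    if root.tag != f"{{{TEI_NS}}}teiCorpus":
        raise ValueError("expected a TEI teiCorpus root element")
    for i, tei in enumerate(root.findall(f"{{{TEI_NS}}}TEI"), start=1):
        # Prefer the document's own xml:id for the URI, else fall back to a counter.
        doc_id = tei.get(XML_ID) or f"doc-{i}"
        yield f"{directory}{doc_id}.xml", ET.tostring(tei, encoding="unicode")


corpus = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <TEI xml:id="a"><text/></TEI>
  <TEI><text/></TEI>
</teiCorpus>"""

for uri, doc in split_tei_corpus(corpus):
    print(uri)
# /my-flow/a.xml
# /my-flow/doc-2.xml
```

Falling back to a counter keeps URIs unique when a `TEI` element has no `xml:id`; duplicate `xml:id` values in the corpus would still collide.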
>
> If splitting is your only processing step, this approach has the advantage
> that the documents don't need to go through Fab, which is much faster.  And if what 
> you're doing is generic, then it's particularly nice because you can reuse 
> that Collector in many flows, without having to add the transformation step 
> each time.
>
> --Colleen
>
> Colleen Whitney
> MarkLogic Corporation
>
> Phone +1 650 655 2366
> email  [email protected]
> web    www.marklogic.com
>
> ________________________________________
> From: [email protected] 
> [[email protected]] On Behalf Of Shannon 
> [[email protected]]
> Sent: Friday, October 22, 2010 8:44 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Information Studio Transform
>
> Hi MarkLogic,
>
> For a Transformation Step in an InfoStudio Flow, I've inserted a stylesheet
> as an XSLT Transform. It splits a TEI corpus into individual XML documents
> and constructs some metadata in a certain namespace for indexing purposes
> (I can post it if that would help). It works in Oxygen with Saxon-PE 9.2.0.6
> as the specified transformer; however, it effectively does nothing when run
> in the flow editor: only 1 document is reported as loaded, and the split TEI
> documents don't appear in the database.
>
> Thanks in advance for your help; I'm really looking forward to finding out
> what I'm doing wrong. Going forward, thanks to 4.2, loading content through a
> persistent flow that includes a preprocessing transform step will really help
> cut down on the number of steps in my workflow.
>
> This is with MarkLogic Server, Standard Edition, Personal License, OS X 
> Dev-only, as we haven't yet upgraded our Red Hat Enterprise servers to 4.2.
>
> Thanks,
> Shannon
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general

