Re: [MarkLogic Dev General] Information Studio Transform

Colleen Whitney Sun, 24 Oct 2010 10:57:04 -0700

Hi Shannon,

Here's a little background to help track down what might be happening.


Information Studio uses the Content Processing Framework (CPF) and a dedicated 
"Fab" database under the covers to do transformations.  If you have 
transformations configured, the documents that your collector ingests go into 
the Fab database in a particular directory corresponding with the CPF 
processing domain for your Flow. Information Studio creates a pipeline for you 
and attaches it to the domain, including a final step that inserts the document 
into your destination database.  (If you don't have transformations configured, 
the documents go straight into your destination database without going through 
Fab.)

In the pipeline, Information Studio uses a generic XSLT CPF handler that was 
added as part of XSLT support, and frankly I'm not sure how it handles cases 
where there are multiple output documents.  Take a look at your Fab database.  
Are the documents you expected sitting in there?  What do the URIs look like?  
One possibility is that the documents are in Fab but have been assigned URIs 
outside of the processing domain, so they're not being picked up by CPF and 
moved through the pipeline and into your target database.  If the documents are 
not in Fab, then we need to back up and understand what that generic handler is 
doing with the output documents (so yes, a stylesheet and simple input document 
would be really helpful).
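
To make the URI check concrete, here is a small Python sketch (not MarkLogic code). It assumes a directory-scoped domain with unlimited depth, and the directory name and URIs are made up for illustration. It shows only the prefix test you'd be doing by eye when you look at the URIs in Fab, not how CPF evaluates domain scope internally.

```python
def in_domain(uri: str, domain_dir: str) -> bool:
    """Return True if uri falls under domain_dir.

    Assumption: the domain is scoped to a directory with unlimited depth,
    so membership reduces to a URI prefix test.
    """
    # Normalize so "/fab/flow-1" and "/fab/flow-1/" behave the same, and so
    # "/fab/flow-10/..." is not mistaken for a child of "/fab/flow-1".
    root = domain_dir if domain_dir.endswith("/") else domain_dir + "/"
    return uri.startswith(root)


# Hypothetical values, for illustration only.
DOMAIN_DIR = "/fab/flow-1/"
uris = [
    "/fab/flow-1/corpus.xml",  # the ingested document, inside the domain
    "/tei/doc-001.xml",        # a split output written outside the domain
]

# URIs that the pipeline would never see under the assumption above.
outside = [u for u in uris if not in_domain(u, DOMAIN_DIR)]
print(outside)
```

If the split documents show up in a list like `outside`, that would be consistent with the first possibility above.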

That said, for this release Flows are optimized for single-document-in, 
single-document-out, linear pipelines.  Document splitting is a common 
scenario, and there are a couple of other approaches you could try.  One that I 
like is to write a custom Collector that does the document splitting up front, 
before the documents are inserted.  There's a nice example posted to GitHub by 
Justin Makeig along with his intro video (see the video and the link to the 
GitHub project at 
http://developer.marklogic.com/blog/information-studio-intro-video). 
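
As a rough illustration of what "splitting up front" means, here is a Python sketch. It is not the Collector API and not Justin's example; it only shows the idea of turning one corpus into several (URI, document) pairs before anything is inserted. It assumes a TEI P5 corpus whose root element has TEI child elements in the http://www.tei-c.org/ns/1.0 namespace, and the function name and URI scheme are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: the corpus uses the TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def split_corpus(corpus_xml: str, uri_prefix: str) -> List[Tuple[str, str]]:
    """Split a corpus into one (uri, xml) pair per direct TEI child element.

    The URI scheme (prefix + running number) is hypothetical.
    """
    # Serialize TEI elements with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", TEI_NS)
    root = ET.fromstring(corpus_xml)
    docs = []
    for n, tei in enumerate(root.findall(f"{{{TEI_NS}}}TEI"), start=1):
        uri = f"{uri_prefix}doc-{n:03d}.xml"
        docs.append((uri, ET.tostring(tei, encoding="unicode")))
    return docs


# Tiny made-up corpus with two texts.
sample = (
    f'<teiCorpus xmlns="{TEI_NS}">'
    "<teiHeader/>"
    "<TEI><text><body><p>one</p></body></text></TEI>"
    "<TEI><text><body><p>two</p></body></text></TEI>"
    "</teiCorpus>"
)

docs = split_corpus(sample, "/tei/")
for uri, _ in docs:
    print(uri)
```

Each pair would then be inserted as its own document, so every document that enters a Flow is already single-document-in, single-document-out.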

If splitting is your only processing step, this has an advantage in that the 
documents don't need to go through Fab, which is much faster.  And if what 
you're doing is generic, then it's particularly nice because you can reuse that 
Collector in many flows, without having to add the transformation step each 
time.

--Colleen

Colleen Whitney
MarkLogic Corporation

Phone +1 650 655 2366
email  [email protected]
web    www.marklogic.com

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Shannon 
[[email protected]]
Sent: Friday, October 22, 2010 8:44 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Information Studio Transform

Hi MarkLogic,

For a Transformation Step in an InfoStudio Flow, I've inserted a stylesheet as 
an XSLT Transform. It basically splits a TEI corpus into individual XML 
documents and constructs some metadata in a certain namespace for indexing 
purposes (I can post the thing if that would help). It works in Oxygen with 
Saxon-PE 9.2.0.6 as the specified transformer; however, it effectively does 
nothing in the flow editor: only 1 document is reported as loaded, and 
the split TEI docs don't appear in the database.

Thanks in advance for your help, I'm really looking forward to finding out what 
I'm doing wrong--moving ahead, thanks to 4.2, content loading with a persistent 
flow including a preprocessing transform step will really help cut down on the 
number of steps in my workflow, naturally.

This is with MarkLogic Server, Standard Edition, Personal License, OS X 
Dev-only, as we haven't yet upgraded our Red Hat Enterprise servers to 4.2.

Thanks,
Shannon
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general