Re: [MarkLogic Dev General] Processing Large Documents?

Michael Blakeley Mon, 20 Feb 2012 09:15:07 -0800

Ignore forests and stands for now. Those are physical storage artifacts, 
completely orthogonal to collections.

One difference from eXist-db you may notice is that a document can be in many 
collections at the same time. As I understand it, eXist-db collections act 
somewhat like filesystem directories. MarkLogic treats them more like tags, and 
uses document URIs like '/a/b/c.xml' to provide hierarchy above the document level.
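
A toy Python model may make the tags-versus-directories distinction concrete. 
This is a hypothetical in-memory sketch (the ToyStore class and its methods are 
made up for illustration), not MarkLogic's API. In MarkLogic itself you would 
express these lookups as cts:collection-query and cts:directory-query.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical toy model, not MarkLogic's API.

    Collections behave like tags (a document may carry many of them),
    while hierarchy comes only from the document URI.
    """

    def __init__(self):
        self.docs = {}                       # uri -> content
        self.collections = defaultdict(set)  # collection name -> set of uris

    def insert(self, uri, content, collections=()):
        self.docs[uri] = content
        # One document can be tagged with any number of collections.
        for name in collections:
            self.collections[name].add(uri)

    def in_collection(self, name):
        return sorted(self.collections.get(name, set()))

    def in_directory(self, prefix):
        # Hierarchy is just a URI prefix match, independent of collections.
        # Like an infinite-depth directory lookup: everything under the prefix.
        if not prefix.endswith("/"):
            prefix += "/"
        return sorted(uri for uri in self.docs if uri.startswith(prefix))


store = ToyStore()
store.insert("/a/b/c.xml", "<c/>", collections=["mydb", "customers"])
store.insert("/a/d.xml", "<d/>", collections=["mydb"])

print(store.in_collection("customers"))  # ['/a/b/c.xml']
print(store.in_collection("mydb"))       # ['/a/b/c.xml', '/a/d.xml']
print(store.in_directory("/a/b"))        # ['/a/b/c.xml']
```

Note that '/a/b/c.xml' shows up in two collections and under a URI "directory" 
at the same time; none of these memberships excludes the others.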

Have you considered stepping back a bit and doing more denormalization work in 
SQL? Could you generate a relational view that represents your denormalized 
document structure, at least approximately? If so, you may find that XML easier 
to import into MarkLogic, either with InfoStudio or with 
http://marklogic.github.com/recordloader/

MarkLogic does have support for includes, documented at 
http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/dev_guide/mod-docs.xml
However, I think you would be better served by exporting a denormalized view and 
working from that. SQL has fairly good syntax for denormalization, while the 
XQuery data model works best with XML that is already denormalized.
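
Here is a rough sketch of that SQL-side denormalization, written in Python with 
the standard-library sqlite3 and ElementTree modules. The usr/orders schema, the 
element names, and the /usr/{id}.xml URI pattern are all hypothetical. The point 
is only that the join happens in SQL, and each output record is a self-contained 
XML document that could then be loaded with one of the tools above.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical schema: users, and orders linked to them by usr_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usr (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     usr_id INTEGER REFERENCES usr(id),
                     total REAL);
INSERT INTO usr VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.0), (12, 2, 3.25);
""")

# Let SQL do the join; ORDER BY keeps each user's rows contiguous.
rows = conn.execute("""
SELECT u.id, u.name, o.id, o.total
FROM usr AS u LEFT JOIN orders AS o ON o.usr_id = u.id
ORDER BY u.id, o.id
""").fetchall()


def denormalize(rows):
    """Yield (uri, xml_string): one self-contained document per user."""
    for (uid, name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        usr = ET.Element("usr", id=str(uid))
        ET.SubElement(usr, "name").text = name
        orders = ET.SubElement(usr, "orders")
        for _, _, oid, total in group:
            # A LEFT JOIN yields NULL order columns for users with no orders.
            if oid is not None:
                ET.SubElement(orders, "order", id=str(oid), total=str(total))
        yield f"/usr/{uid}.xml", ET.tostring(usr, encoding="unicode")


docs = dict(denormalize(rows))
print(docs["/usr/1.xml"])
```

Each value in docs is one document with the user's orders nested inside it, 
rather than referenced by a foreign key that XQuery would have to join at 
query time.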

You might also find 
http://resources.marklogic.com/library/media/inside-marklogic helpful. Google 
has indexed a copy of what looks like the same paper at 
http://www.odbms.org/download/inside-marklogic-server.pdf too.

-- Mike

On 20 Feb 2012, at 07:53, Todd Gochenour wrote:

> Day three. President's day. I will first chunk the data for each row as 
> this will improve concurrency. I gather I will need to generate random 
> document names for each chunk and put these documents in a collection using 
> the name of the database as the folder name. I see the terms Forest and 
> Stands. I assume this is new terminology for collections. With eXistDB, 
> my queries work across collection/subcollection boundaries transparently. 
> I'm assuming this is true with stands in a forest, so my document will be 
> found no matter which stand it resides in.
>  
> Second, I will denormalize the data so as to convert primary/foreign key 
> relationships into structure. The database has a naming convention which I 
> can exploit to automate this task (i.e. path /xyz/usr_id maps to /usr/id). 
> For relationships between primary documents, I will research MarkLogic's 
> linking. I never managed to get <xs:include/> to work like I wanted in 
> eXistDB and so I had to perform these joins programmatically in XQuery. 
> Perhaps MarkLogic has a clever way to do this automatically. We will see.
>  
> Thanks Geert and Damon for keeping me honest.
>  
> Yours,
> Todd
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
