David,

Fragmentation is fully supported, and if you like using it you can continue to
do so. However, I think you'll find that you have more options, the server is
easier to use, it is harder to make a false step, and you have more in common
with other developers if you load your nodes as individual documents instead of
using fragmentation. You may not have run into any limitations thus far, but in
my experience you eventually will.
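
As a rough illustration of what "load your nodes as individual documents" can
look like, here is a minimal Python sketch that splits a list-style XML document
into one small document per child element before loading. It is only a
pre-processing sketch and does not use any MarkLogic API; the element names
(items, item), the function name, and the URI scheme are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_into_documents(big_xml, uri_prefix="/items/"):
    """Return a dict mapping a document URI to the XML of one child element.

    Assumption: the big document is a flat list, so each direct child of the
    root element becomes its own document. The URI scheme is illustrative.
    """
    root = ET.fromstring(big_xml)
    docs = {}
    for index, child in enumerate(root, start=1):
        uri = f"{uri_prefix}{index}.xml"
        # Serialize the child on its own, without the enclosing list element.
        docs[uri] = ET.tostring(child, encoding="unicode")
    return docs


# Hypothetical sample data: a "big document" that is just a list of items.
sample = "<items><item id='a'>first</item><item id='b'>second</item></items>"
documents = split_into_documents(sample)
for uri in documents:
    print(uri)
```

Each resulting string could then be loaded under its URI with whatever loader
you already use.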

You also mentioned score, and as I stated in my earlier message, you won't be
able to use score with the approach you have taken thus far.

I would also make an effort to use the searchAPI - it includes our best 
practices for searching XML, and is higher-performance and more scalable than 
pretty much any other approach one could develop.

Kelly

Message: 2
Date: Sun, 22 Nov 2009 08:56:11 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] RE: RE: Creating Collections
To: "General Mark Logic Developer Discussion"
        <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a714055...@postoffice>
Content-Type: text/plain;       charset="us-ascii"

Good suggestion about separate documents. In fact, these particular
documents are just lists of identical smaller things, just as you surmised.

You say:

"Now, there is a concept in MarkLogic called fragmentation which allows you to 
store very large documents, and to perform minimal disk IO when retrieving or 
updating the individual fragments. This is a very useful feature. However, for 
search applications, the best practice is to load the individual nodes as 
documents. If there is metadata that applies to all your individual nodes, then 
we can talk about how you might deal with that."


Is this really fundamentally true? I hear conflicting statements.
How have you determined this "best practice"?

I've been using Fragment Parents so that this "big document" is fragmented into 
individual fragments, without having to create separate "documents".
I have no need to apply meta-data to these mini-docs at all. 

Is it really fundamentally true that, given the same data set, splitting it
into documents instead of fragments improves performance?
The performance I'm getting is phenomenal, and many places in the ML
documentation imply that fragmenting documents is a great way of doing things.

Besides meta-data associated with each mini-doc, do I really gain an advantage
by splitting the big doc into smaller docs?
That seems contrary to what I'm reading in the ML documentation. One of the
huge advantages I see with simply storing this mega-document (in MarkLogic, as
opposed to my old 'file based' way of thinking about XML) is that it seems to
work perfectly as-is, and splitting it up seems to me an unnecessary complexity
unless there are hard gains from doing so.

I can certainly do some tests, but I'd love someone who knows the authoritative
answer, or who has hard anecdotal evidence, to comment.


-David
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general