David,

Fragmentation is fully supported, and if you like using it you can continue to do so. However, if you don't use fragmentation and instead load your nodes as individual documents, I think you'll find that you have more options, the server is easier to use, it is harder to make a false step, and you have more in common with other developers. You may not have run into any limitations thus far, but in my experience you eventually will.
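As a rough illustration of "load your nodes as individual documents", the sketch below splits one large list-style XML document into one small document per child element, each with its own URI. It is only a preparation step written in Python under stated assumptions: the function name `split_into_documents`, the `/items/` URI prefix, and the sample data are all hypothetical, and the actual load into MarkLogic (for example with `xdmp:document-insert`) is not shown.

```python
import xml.etree.ElementTree as ET


def split_into_documents(big_xml: str, uri_prefix: str = "/items/") -> dict[str, str]:
    """Map a hypothetical URI to the serialized XML of each child of the root."""
    root = ET.fromstring(big_xml)
    docs = {}
    for index, child in enumerate(root, start=1):
        # Drop whitespace that followed the element in the parent document.
        child.tail = None
        uri = f"{uri_prefix}{index}.xml"
        docs[uri] = ET.tostring(child, encoding="unicode")
    return docs


# Hypothetical sample: one "big document" that is just a list of small items.
sample = "<items><item id='a'>first</item><item id='b'>second</item></items>"
documents = split_into_documents(sample)

for uri, body in documents.items():
    print(uri, body)
```

Each entry in `documents` can then be loaded as a standalone document, so every item is addressed by its own URI instead of living as a fragment of one parent document.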
You also mentioned score. As I stated in my earlier message, you won't be able to use score with the approach you have taken thus far. I would also make an effort to use the Search API: it includes our best practices for searching XML, and it is higher-performance and more scalable than pretty much any other approach one could develop.

Kelly

Message: 2
Date: Sun, 22 Nov 2009 08:56:11 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] RE: RE: Creating Collections
To: "General Mark Logic Developer Discussion" <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a714055...@postoffice>
Content-Type: text/plain; charset="us-ascii"

Good suggestion about separate documents. In fact, these particular documents are just lists of identical smaller things, just as you surmise.

You say:

"Now, there is a concept in MarkLogic called fragmentation which allows you to store very large documents, and to perform minimal disk IO when retrieving or updating the individual fragments. This is a very useful feature. However, for search applications, the best practice is to load the individual nodes as documents. If there is metadata that applies to all your individual nodes, then we can talk about how you might deal with that."

Is this really fundamentally true? I hear conflicting statements. How have you determined this "best practice"?

I've been using Fragment Parents so that this "big document" is split into individual fragments, without having to create separate "documents". I have no need to apply metadata to these mini-docs at all. Is it really fundamentally true that, given the same data set, splitting it into documents instead of fragments improves performance? The performance I'm getting is phenomenal, and I have read implied in many places in the ML documentation that fragmenting documents is a great way of doing things.
Besides metadata associated with each mini-doc, do I really gain an advantage by splitting the big doc into smaller docs? That seems contrary to what I'm reading in the ML documentation. One of the huge advantages I see in simply storing this mega-document (in MarkLogic, as opposed to my old 'file based' way of thinking about XML) is that it seems to work perfectly as-is, and splitting it up seems an unnecessary complexity unless there are hard gains from it. I can certainly run some tests, but I'd love for someone who knows the authoritative answer, or who has even hard anecdotal evidence, to comment.

-David

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
