RE: [MarkLogic Dev General] RE: RE: Creating Collections

Lee, David Sun, 22 Nov 2009 12:16:59 -0800

Thanks for your suggestions, I will definitely try to do this.
It also makes distributing XML "documents" to people via email and
such easier if I can send them a couple of small files rather than a
500MB file and say "Look for element xxxx".


Question though.  You say I can't use score with the fragmentation
model?  I'm confused.
The documentation for cts:score seems to imply otherwise.
---- quote ----

cts:score(
[$node as node()]
)  as  xs:integer

 Summary:

Returns the score of a node, or of the context node if no node is
provided.

 Parameters:

$node (optional): A node. Typically this is an item in the result
sequence of a cts:search operation.

 Usage Notes:

Score is computed according to the scoring method specified in the
cts:search expression, if any.

If you run cts:score on a constructed node, it always returns 0; it is
primarily intended to run on nodes that are the retrieved from the
database (an item from a cts:search result or an item from the result of
an XPath expression that searches through the database).


 Example:
(: run this on the Shakespeare content set :)
for $hit in cts:search(//SPEECH,
    cts:word-query("with flowers"))[1 to 10]
return element hit {
  attribute score { cts:score($hit) },
  $hit
}

----------------

This to me seems to imply that cts:score() works on nodes as well as
documents.
I have found, using cts:search(), that the scoring and sorting of the
return values definitely works even when all nodes are found within the
same document.
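
A minimal sketch of the kind of query described above, assuming a large
fragmented document at the hypothetical URI /big.xml whose repeated
child elements are named item (both names are illustrative and do not
come from this thread). It follows the documentation example quoted
earlier: cts:search is applied to nodes inside a single document and
cts:score is reported for each hit.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI and element name; substitute your own. :)
for $hit in cts:search(fn:doc("/big.xml")//item,
    cts:word-query("with flowers"))[1 to 10]
return element hit {
  (: score of each matching node, all taken from the same document :)
  attribute score { cts:score($hit) },
  $hit
}
```

Comparing the reported scores across hits is one way to check how
scoring behaves under a given fragmentation setup.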





-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Kelly
Stirman
Sent: Sunday, November 22, 2009 2:57 PM
To: [email protected]
Subject: [MarkLogic Dev General] RE: RE: Creating Collections

David,

Fragmentation is fully supported, and if you like using it you can
continue to do so. However, I think you'll find you have more options,
the server will be easier to use, it will be more difficult to make a
false step, and you'll have more in common with other developers if you
don't use fragmentation and instead load your nodes as individual
documents.
You may not have run into any limitations thus far, but in my experience
you will eventually.

You also mentioned score, and as I stated in my earlier message, you
won't be able to use score with the approach you have thus far.

I would also make an effort to use the searchAPI - it includes our best
practices for searching XML, and is higher-performance and more scalable
than pretty much any other approach one could develop.

Kelly

Message: 2
Date: Sun, 22 Nov 2009 08:56:11 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] RE: RE: Creating Collections
To: "General Mark Logic Developer Discussion"
        <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a714055...@postoffice>
Content-Type: text/plain;       charset="us-ascii"

Good suggestion about separate documents.    In fact these particular
documents are just lists of identical smaller things, just like you
surmise.

You say  

"Now, there is a concept in MarkLogic called fragmentation which allows
you to store very large documents, and to perform minimal disk IO when
retrieving or updating the individual fragments. This is a very useful
feature. However, for search applications, the best practice is to load
the individual nodes as documents. If there is metadata that applies to
all your individual nodes, then we can talk about how you might deal
with that."    


Is this really fundamentally true?  I hear conflicting statements.
How have you determined this "best practice"?

I've been using Fragment Parents so that this "big document" is
fragmented into individual fragments, without having to create separate
"documents".
I have no need to apply meta-data to these mini-docs at all. 

Is it really fundamentally true that, given the same data set,
splitting it into documents instead of fragments improves
performance?
The performance I'm getting is phenomenal,  and I have read implied in
many places in the ML documentation that fragmenting documents is a
great way of doing things.

Besides meta-data associated with each mini-doc, do I really truly gain
an advantage by splitting the big doc into littler docs?
That seems contrary to what I'm reading in the ML documentation.  One
of the huge advantages I see with simply storing this mega-document (in
MarkLogic, as opposed to my old way of thinking, 'file based' XML) is
that it seems to work perfectly as-is, and it seems to me an unnecessary
complexity to split it up unless there are hard gains to it.

I can certainly do some tests, but I'd love someone who knows the
authoritative answer, or even has hard anecdotal evidence, to comment.


-David
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
