I was looking at the Registered Queries and they confuse me a bit. What is the lifetime of a Registered Query? Can I put the IDs in the DB itself and save them for a long time, or are they lost when the session or server ends?
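For context, here is a minimal sketch of how a registered query is created and used, with one possible way to cope with a stored ID that the server no longer recognizes. The directory, element name and value are placeholders, and local:search-registered is a hypothetical helper. It assumes (as I understand the documentation, but please verify) that a lost registration surfaces as an XDMP-UNREGISTERED error and that registering the same query again is harmless.

```xquery
xquery version "1.0-ml";

(: Hypothetical query: the directory, element name and value are placeholders. :)
declare variable $query as cts:query :=
  cts:and-query((
    cts:directory-query("/patient/", "infinity"),
    cts:element-value-query(xs:QName("status"), "active")
  ));

(: Run a search through a previously stored registration ID. If the server no
   longer knows the ID (assumed to raise XDMP-UNREGISTERED), register the query
   again and retry once. :)
declare function local:search-registered(
  $id as xs:unsignedLong,
  $query as cts:query
) as node()*
{
  try {
    (: Registered queries have to be run with the "unfiltered" option. :)
    cts:search(fn:doc(), cts:registered-query($id, "unfiltered"))
  } catch ($e) {
    if ($e/*:code = "XDMP-UNREGISTERED")
    then cts:search(fn:doc(), cts:registered-query(cts:register($query), "unfiltered"))
    else xdmp:rethrow()
  }
};

(: cts:register returns the ID (an xs:unsignedLong) to pass to cts:registered-query. :)
let $id := cts:register($query)
return local:search-registered($id, $query)
```

If the ID were saved in the database, the original cts:query would need to be kept alongside it so that it can be registered again when needed.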
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kelly Stirman
Sent: Sunday, December 20, 2009 6:18 PM
To: [email protected]
Subject: [MarkLogic Dev General] RE: Fragmentation planning

Some thoughts that may be useful in designing how you organize documents in your database:

-Generally speaking, the more restrictive your query, the more efficiently it can be resolved by the server. As a general rule, the time it takes to resolve your query is usually proportional to the number of results rather than the complexity of the query. So, a complex query with many constraints that is highly restrictive and returns few results can be resolved more quickly than a very simple query with few constraints that returns many results. Partitions can be used to make your queries more restrictive, and while they may make your queries more complex, in many cases they can improve the performance of your application.

-There exist both physical and logical partitions in MarkLogic.

-Forests are physical partitions. Because queries are evaluated in forests in parallel, it is normally best to use the default configuration of the server, which spreads documents across forests. This allows MarkLogic to "divide and conquer" the work associated with a query. Of course, you need sufficient hardware to accommodate the parallel work, which is why we typically recommend one forest for every pair of CPU cores. (There are some applications for which designing your own policies around document placement is the right approach, but that should be covered in a separate thread.)

-There are many forms of logical partitions. Directories and collections are good examples. They are both very fast for queries and for delete operations. There's no reason not to combine them in your design. Collections are very cheap, so you might consider using several with any document.

-XML can be another good way to partition your database, as you have probably found.
Using simple structures that are suitable for element-value-query or element-attribute-value-query is one of the best ways to partition with XML.

-Document properties are a good way to partition your database. Joins between the document fragment and its property fragment are optimized for simple properties when using cts:properties-query(). This allows you to use XML for partitioning when you cannot control the schema for your documents, or if you're dealing with binary or text documents.

-Security is another way to partition your database. Ultimately, security metadata is part of the indexes in a way that is similar to collections.

-Registered queries are a remarkably powerful way to partition your database. Registered queries allow you to define a partition based on any cts:query. A registered query is similar to a materialized view, except in this case the materialization only happens in the indexes. Take any complex unfiltered query, register it, and after you pay the costs of running the query the first time, the next time it will be as fast as a simple element-value-query. Plus, the registered query works with updates.

I hope some of this helps in your efforts to organize your database.

Kelly

Message: 2
Date: Sat, 19 Dec 2009 16:41:29 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] RE: Fragmentation planning
To: "General Mark Logic Developer Discussion" <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a714ccc...@postoffice>
Content-Type: text/plain; charset="iso-8859-1"

First off, the disclaimer that I'm not a MarkLogic expert, I'm just learning myself, so I welcome anyone who knows more to disagree with me.

That said, though, I don't believe queries will be slower or faster based on what directory structure you use. cts:search() seems to me to perform equally well regardless of how the directory structure is set up. There are many ways of using it and I'm just beginning to scratch the surface.
But examples of what I believe will be equally quick to search are:

cts:search( xdmp:directory( ..) , ... ) -- your original idea
cts:search( //element , ... ) -- search based on an element name, regardless of the URIs
cts:search( xdmp:collection(...) , ... ) -- limit based on a collection

What seems interesting to me, and I'm just barely getting a handle on it, is that you can 'refactor' your searches in many different ways with consistent performance characteristics. Example:

cts:search( //p , cts:and-query(( cts:directory-query("dir") , cts:word-query("word") )) )

This performs in my tests equally well as something like

cts:search( xdmp:directory("dir" ) , cts:element-value-query( ... ))

So I suggest that the presumption that organizing things in directories has any benefit at all for search speed is mistaken. It has *other* benefits, but searching seems to work well across the board regardless of what URI you assign to documents. It's really amazing, actually.

As for the benefits of a RESTful style for organizing the directory tree, based on patient as the root, the main benefit I suggest is that it becomes an easy mapping for a web service, if your primary types of queries are about a particular patient. A client using a RESTful approach can (with some help from URI rewriting rules in the App module) have what seems a "natural" view on patient data:

/patient/patient_id/ -- maybe combine ALL the subdirectory documents into 1
/patient/patient_id/lab_tests -- all lab tests
/patient/patient_id/lab_tests/test_123 -- a single lab test
etc.

It also helps if you map this tree to a WebDAV view ... files are easier to navigate from a simple file explorer. Moving, copying, updating, adding or deleting data becomes a directory operation without having to know anything at all about the structure (contents, elements, etc.) of the decomposed files.
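As a rough illustration of treating patient data as directory operations, here is a small XQuery sketch that follows the /patient/patient_id/ layout above. The patient ID 12345 and the exact URIs are made up for the example; it only lists and counts documents and inspects no element content.

```xquery
xquery version "1.0-ml";

(: Hypothetical patient directory; directory URIs end with a slash. :)
let $patient-dir := "/patient/12345/"
return (
  (: URIs of all documents anywhere under the patient directory. :)
  for $doc in xdmp:directory($patient-dir, "infinity")
  return xdmp:node-uri($doc),

  (: Number of documents directly inside lab_tests/ (immediate children only). :)
  fn:count(xdmp:directory(fn:concat($patient-dir, "lab_tests/"), "1"))
)
```

Removing a whole subtree would be a similar one-liner in a separate update request, for example xdmp:directory-delete("/patient/12345/lab_tests/test_123/"), again without needing to know anything about the documents inside.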
The directory structure can be used explicitly to navigate and manipulate data associated with a patient with no knowledge *at all* about the contents of files. This can become extremely convenient when you toss non-XML files into the mix, such as, say, lab X-ray images (jpg, gif) ... You can of course assign XML properties to non-XML files, but if you simply put them in a patient-oriented directory structure, life is simplified, like

/patient/patient_id/lab_tests/test_123/images/

So in conclusion, I suggest you not worry about the efficiency of searching when deciding on your directory or URI structure, and instead choose a structure that has advantages based on organizing your data. Searching works great regardless of your directory structure.

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
