First off, a disclaimer: I'm not a MarkLogic expert. I'm still learning
myself,
so I welcome anyone who knows more to disagree with me.
That said, I don't believe queries will be slower or faster based on what
directory structure you use. cts:search() seems to me to perform equally well
regardless of how the directory structure is set up. There are many ways of
using it, and I'm only beginning to scratch the surface.
Examples of what I believe will be equally quick to search are:
cts:search( xdmp:directory( ..) , ... ) -- your original idea
cts:search( //element , ... ) -- search based on an element name, regardless of the URIs
cts:search( fn:collection(...) , ... ) -- limit based on a collection
What seems interesting to me, and I'm just barely getting a handle on it, is
that you can refactor your searches in many different ways with consistent
performance characteristics.
Example:
cts:search( //p , cts:and-query(( cts:directory-query("dir") ,
cts:word-query("word") )))
This performs in my tests equally well as something like
cts:search( xdmp:directory("dir" ) , cts:element-value-query( ... ))
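Spelled out a little more, the two forms might look like the sketch below. The "/patient/" directory, the "diagnosis" element and the word "anemia" are names I made up for illustration, so substitute your own. The two searches do not return the same nodes (one matches a word anywhere, the other an element value); the point is only that the directory limit can live either in the query or in the searched expression.

```xquery
xquery version "1.0-ml";

(: Hypothetical names throughout: "/patient/" directory,
   "diagnosis" element, search word "anemia". :)

(: Form 1: search all <p> elements and limit by directory inside the query. :)
let $by-query :=
  cts:search(
    //p,
    cts:and-query((
      cts:directory-query("/patient/", "infinity"),
      cts:word-query("anemia")
    ))
  )

(: Form 2: limit by directory in the searched expression instead. :)
let $by-expression :=
  cts:search(
    xdmp:directory("/patient/", "infinity"),
    cts:element-value-query(xs:QName("diagnosis"), "anemia")
  )

(: Return the number of matches found by each form. :)
return (fn:count($by-query), fn:count($by-expression))
```

Note that cts:and-query() takes its sub-queries as a single sequence, hence the doubled parentheses.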
So I suggest that the presumption that organizing things in directories
benefits search speed is mistaken. Directories have *other* benefits,
but searching seems to work well across the board regardless of what URI you
assign to documents. It's really amazing, actually.
As for the benefits of a RESTful style for organizing the directory tree, based
on patient as the root, the main benefit I suggest is that it becomes an easy
mapping for a web service, if your primary types of queries are about a
particular patient. A client using a RESTful approach can (with some help
from URI rewriting rules in the App module) have what seems a "natural"
view on patient data:
/patient/patient_id/ -- maybe combine ALL the sub directory documents into 1
/patient/patient_id/lab_tests -- all lab tests
/patient/patient_id/lab_tests/test_123 -- a single lab test
etc.
It also helps if you map this tree to a WebDAV view ... files are easier to
navigate from a simple file explorer.
Moving, copying, updating, adding or deleting data becomes a directory
operation without having to know anything at all about the structure
(contents,elements etc) of the decomposed files.
The directory structure can be used explicitly to navigate and manipulate data
associated with a patient with no knowledge *at all* about the contents of
files.
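As a rough sketch of what I mean, the query below lists everything stored under one patient without looking inside any document. The patient id and the trailing-slash directory URI are assumptions for the example.

```xquery
xquery version "1.0-ml";

(: Hypothetical patient id and URI layout, for illustration only. :)
let $patient-dir := "/patient/10291004/"

(: Return the URI of every document under the patient directory,
   at any depth, whatever its type or contents. :)
for $doc in xdmp:directory($patient-dir, "infinity")
return xdmp:node-uri($doc)
```

Removing the whole subtree works the same way: xdmp:directory-delete() takes the directory URI and deletes the directory along with its descendant documents, again with no knowledge of what is in them.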
This can become extremely convenient when you toss non-XML files into the mix,
such as lab X-ray images (jpg, gif). You can of course assign XML
properties to non-XML files, but if you simply put them in a patient-oriented
directory structure, life is simplified, like
/patient/patient_id/lab_tests/test_123/images/
So in conclusion, I suggest you not worry about the efficiency of searching
when deciding on your directory or URI structure, and instead choose a
structure that has advantages based on organizing your data. Searching works
great regardless of your directory structure.
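If you also want to pull one type of data across all patients, a collection can sit alongside the patient-rooted URIs. A rough sketch, assuming the lab-test documents were put in a collection named "lab-results" when they were loaded (the collection name and the search word are invented for the example):

```xquery
xquery version "1.0-ml";

(: Assumption: lab-test documents were added to a "lab-results" collection
   at load time. Collection name and search word are hypothetical. :)
cts:search(
  fn:doc(),
  cts:and-query((
    cts:collection-query("lab-results"),
    cts:word-query("glucose")
  ))
)
```

This way the URI stays organized around the patient while the collection carries the "type of data" grouping.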
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Karl Erisman
Sent: Saturday, December 19, 2009 6:07 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] RE: Fragmentation planning
Well, since I'm using option (1), the issue is no longer of immediate
concern (option (1) involves single documents containing all types of
patient data, not separate documents for each type). It would become
a concern, however, if performance proves to be inadequate and option
(3) is used to address the issue.
In considering option (3), I was originally thinking of the directory
structure you describe, but I thought that organizing things based on
type of data might make it faster to search across all patients for
specific data by using cts:directory-query() to limit the scope of the
search to the directory storing a particular type of data. But I see
your point about using a directory structure that, in a RESTful sense,
models the patient as the primary resource. I suppose then that the
XML structure of the individual documents would be the facility used
to narrow the scope of the search (i.e.
cts:element-query("demographics", <rest of the query>) assuming that
the demographic data documents have root node <demographics>). I
would expect that to be slower, though (slower than using a
directory-query).
When you say that a structure reflecting the view of patients as the
primary resource has many benefits, are you thinking in terms of
ability to expose the data as a RESTful service? What other benefits
are you thinking of?
Thanks,
Karl
On Sat, Dec 19, 2009 at 9:13 AM, Lee, David <[email protected]> wrote:
> I attended a workshop at Balisage 2009 where a developer was modeling very
> similar data,
> HL7 based patient information. In his case I don't think he was using
> MarkLogic, but the structure
> of the data and the rationale I think bear consideration.
> His design used directories for patient data but inverted from your structure.
> This design is more "restful" in the sense that the directory structure
> itself models an aggregate model based around the patient, not the part (lab
> test, info etc). And the URIs follow a left-to-right decomposition of
> document from container to contained.
>
> To jump to the conclusion, I would suggest a structure
> not like your suggestion
> /demographics/10291004
> /lab-results/10291004
>
> but rather
>
> /patients/10291004/lab-results/...
> /patients/10291004/demographics/...
>
> You can use Collections if you wish to group all 'lab-results' across all
> patients
> but the primary directory structure is related to the patient, which in a
> patient data model is the
> primary (top level) object and having the directory structure directly
> reflect that has many benefits.
>
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general