Thanks Geert. You described more or less what I had done. With
directory-creation=manual and maintain-directory-last-modified=false
(but maintain-last-modified=true), things seemed to run quite quickly,
without the contention I observed before. The tricky part is updating
the directory modification times manually. The only reliable solution I
have for that is to set a dummy property on the directory property fragment.
-Mike
On 6/4/2014 4:29 AM, Geert Josten wrote:
Hi Mike,
The automatic dir creation will cause MarkLogic to have to check for
dir existence for each doc, for every parent directory of that doc.
That certainly slows down your system. Running a separate dir creation
process before the ingest, creating only the dirs that are still
missing, will certainly speed up that bit. You can intersect
cts:uri-match("*/") with the dir uris you need to find which already
exist; the rest need to be created. Sort the dir uris you need to
create, and you can run straight through them from top to bottom.
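A minimal sketch of that planning step, written in Java because the ingest client in this thread is Java. The class and method names are hypothetical, and the set of existing directories is assumed to have been fetched already (for example from the cts:uri-match("*/") lookup). It also assumes the root "/" does not need to be created. Plain string ordering puts a parent such as "/books/" ahead of "/books/a/", which is what makes the top-to-bottom pass work.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical client-side helper; not part of any MarkLogic API. */
final class DirectoryPlanner {

    /** Ancestor directory URIs of a document URI: "/a/b/c.xml" gives "/a/" and "/a/b/". */
    static List<String> ancestorDirectories(String docUri) {
        List<String> dirs = new ArrayList<>();
        // Start at index 1 so a leading "/" (the root) is not reported (assumption).
        for (int i = docUri.indexOf('/', 1); i >= 0; i = docUri.indexOf('/', i + 1)) {
            dirs.add(docUri.substring(0, i + 1));
        }
        return dirs;
    }

    /** Directories needed by docUris that are not in existingDirs, parents first. */
    static List<String> missingDirectories(Collection<String> docUris,
                                           Set<String> existingDirs) {
        // TreeSet removes duplicates and sorts; a prefix sorts before any
        // longer string that starts with it, so parents precede children.
        TreeSet<String> needed = new TreeSet<>();
        for (String uri : docUris) {
            needed.addAll(ancestorDirectories(uri));
        }
        needed.removeAll(existingDirs);
        return new ArrayList<>(needed);
    }
}
```

The resulting list could then be passed, in order, to whatever code calls xdmp:directory-create.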
I'm afraid though that having MarkLogic maintain last-modified on dirs
will still cause contention. That will also slow down your ingest. But
maybe that overhead is much smaller. It will probably help if you can
batch up your ingest per directory.
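One way to batch per directory on the client, sketched in Java under the assumption that the ingest code has the list of document URIs in hand before it sends them. The names are hypothetical. The idea is that each directory is then touched by one batch rather than by many concurrent ones; whether that actually reduces contention would need measuring.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper that groups document URIs by their parent directory. */
final class DirectoryBatcher {

    /** "/a/b/c.xml" gives "/a/b/"; a URI with no slash gives "". */
    static String parentDirectory(String docUri) {
        return docUri.substring(0, docUri.lastIndexOf('/') + 1);
    }

    /** One entry per directory, holding the URIs to insert there, sorted by directory. */
    static Map<String, List<String>> groupByDirectory(Collection<String> docUris) {
        return docUris.stream().collect(
            Collectors.groupingBy(DirectoryBatcher::parentDirectory,
                                  TreeMap::new,
                                  Collectors.toList()));
    }
}
```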
Kind regards,
Geert
*From:* [email protected]
[mailto:[email protected]] *On Behalf Of* Michael Sokolov
*Sent:* Friday, May 23, 2014 14:04
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual
directory creation
I saw an enormous performance improvement by turning off automatic
directory creation in 7.0-2.3. I think the problem I was seeing is
the one that Mike Blakeley documented here:
http://blakeley.com/blogofile/2012/03/19/directory-assistance/. I'm
able to work around that problem by using a fast-to-update lock-free
data structure on the client side to track all the new directories and
create them in batches using xdmp:directory-create. I also have a
solution for managing the last-updated time for the directories,
although it's a little cheesy. It's possible there is a way to
accomplish this using the various automatic solutions, but based on my
experience, I don't think it will perform as well.
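A rough Java sketch of what such a client-side tracker might look like. This is an illustration under assumptions, not the code actually used in this thread. The class name is hypothetical, the root "/" is assumed not to need creating, and sending a drained batch to xdmp:directory-create is left out. Both collections come from java.util.concurrent and are safe for concurrent use without client-side synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical tracker for directories that still have to be created. */
final class PendingDirectories {
    // Every directory ever recorded, so none is queued for creation twice.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    // Sorted, so a parent such as "/a/" is drained before "/a/b/".
    private final ConcurrentSkipListSet<String> pending = new ConcurrentSkipListSet<>();

    /** Called by ingest threads for each inserted document URI. */
    void record(String docUri) {
        // Start at index 1 so a leading "/" (the root) is skipped (assumption).
        for (int i = docUri.indexOf('/', 1); i >= 0; i = docUri.indexOf('/', i + 1)) {
            String dir = docUri.substring(0, i + 1);
            if (seen.add(dir)) {
                pending.add(dir);
            }
        }
    }

    /** Removes and returns up to maxSize directory URIs, smallest first. */
    List<String> drainBatch(int maxSize) {
        List<String> batch = new ArrayList<>();
        String dir;
        while (batch.size() < maxSize && (dir = pending.pollFirst()) != null) {
            batch.add(dir);
        }
        return batch;
    }
}
```

A separate thread would call drainBatch periodically and create the returned directories in one request. Directories that already exist in the database before the client starts would need to be seeded into the tracker or filtered out on the server.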
I agree that it's a lot of work to do this, and undoubtedly there are
holes in our system where we perform document updates and inserts in
XQuery code that we won't be managing, so it's not something I would
recommend as a general solution to folks using MarkLogic in a "normal"
way.
-Mike
On 5/22/2014 10:31 PM, Danny Sokolsky wrote:
I don't see anything wrong with creating directories manually;
there is even an API for it:
http://docs.marklogic.com/xdmp:directory-create
But it seems like it might be a pretty big burden on the
application to do that. That is the only reason I was suggesting
making that automatic and seeing how cheap or expensive that is
for your app (and, if you are creating the dirs anyway, how
different that would actually be from having MarkLogic create them
for you).
One other thing to note: newer versions of MarkLogic have a lot of
performance improvements around large updates and deletes, so if
you are on an older version of MarkLogic, upgrades can be good.
-Danny
------------------------------------------------------------------------
*From:* [email protected] [[email protected]]
on behalf of Michael Sokolov [[email protected]]
*Sent:* Thursday, May 22, 2014 7:18 PM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual
directory creation
Thanks for the suggestion, Danny; it seems sensible. At this
stage I don't want to modify the rest of the system, which is
pretty mature and relies on the system-maintained last-modified
property. In fact we already maintain a separate modified
timestamp in the documents with different semantics (e.g. you can
copy a document without updating its timestamp), but this can't be
used for tracking changes to binary documents. So I think we are
stuck with the built-in maintain-last-modified.
I did briefly try having directory creation=automatic + maintain
directory last modified=false and maintain-last-modified=true, but
I thought it looked as if things were slowing down again during a
large document import. I didn't measure carefully or continue
this experiment for long though because I think I have a solution
to the manual directory creation that is effective. All our
document updates go through a single java API, so I can track
updated uris there and manage directory insertion in batches as a
separate process. And it seems that the trick of setting the
<directory/> property tickles the modified-time. I suppose if
that became unsupported it could cause problems, but I think I can
live with that risk.
Coming back around to my initial question though -- it seems like
the consensus here is that best practice is *not* to create
directories manually?
-Mike
On 5/22/2014 12:05 PM, Danny Sokolsky wrote:
I think if you want to maintain these yourself, you should not
use the system maintained properties; instead, make up some of
your own that do the equivalent things.
That being said, have you tried leaving directory creation at
automatic, but turning off maintain last modified and maintain
directory last modified? Depending upon how deep your
directory hierarchy is, this might not cause too much
overhead. I would recommend trying that, and then just add a
dateTime property (or element in the document if you prefer,
allowing you to not have to create a property fragment) to
track whatever you want about the last modified (based on your
app requirements). I think that might work well, especially
if your hierarchy does not have millions of directories.
See how it works and let us know.
-Danny
*From:* [email protected]
[mailto:[email protected]] *On Behalf Of* Keith L. Breinholt
*Sent:* Thursday, May 22, 2014 8:23 AM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for
manual directory creation
<prop:last-modified> is not a property that you can manually
set. I believe that is a security issue.
*From:* [email protected]
[mailto:[email protected]] *On Behalf Of* Mike Sokolov
*Sent:* Thursday, May 22, 2014 8:26 AM
*To:* MarkLogic General ML
*Subject:* Re: [MarkLogic Dev General] best practices for
manual directory creation
I'm getting good results updating the directory timestamps using:
xdmp:document-set-properties($dir-uri, <prop:directory/>)
and this also seems to limit the number of prop:directory
properties to 2
-Mike
On 05/22/2014 10:03 AM, Mike Sokolov wrote:
I'm working with a system that requires directories and
directory-modified timestamps (for a webDAV-like browsing
feature), but have found that automatic directory creation
introduces unacceptable lock contention during bulk
updates, so I am looking into managing the directory
creation and timestamp updates manually.
I have one question, and one strange observation - maybe a
bug. I'm working with 7.0-2.3.
First the question: how should I update the
prop:last-modified property?
Updating it explicitly raises an error:
XDMP-ARG: xdmp:document-set-property("/books/",
<prop:last-modified
xmlns:prop="http://marklogic.com/xdmp/property">2014-05-22T15:53:46.724003+02:00</prop:last-modified>)
-- Invalid argument
even though I have "maintain directory last modified" set
to false (and directory creation = manual). I do have
maintain last modified set to true, so I expect that is
happening automatically on directory creation - OK, but in
that instance how would I update the directory modified
time when inserting or deleting documents in the directory?
I tried adding a dummy property using xdmp:document-set-property,
and that does seem to update the timestamp, but I don't
really want to do that if I don't have to, of course.
Perhaps I could delete and then recreate the directory
properties document, but that doesn't seem great either.
Any other ideas?
Now the weird observation. It seems that every time I
modify the directory properties document, it gets another
<prop:directory /> property node! Currently I have:
<prop:properties
xmlns:prop="http://marklogic.com/xdmp/property">
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:last-modified>2014-05-22T15:47:37+02:00</prop:last-modified>
</prop:properties>
I thought that properties documents maintained a map with
unique keys?
-Mike
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general