I saw an enormous performance improvement by turning off automatic
directory creation in 7.0-2.3. I think the problem I was seeing is the
one that Mike Blakeley documented here:
http://blakeley.com/blogofile/2012/03/19/directory-assistance/. I'm able
to work around that problem by using a fast-to-update lock-free data
structure on the client side to track all the new directories and create
them in batches using xdmp:directory-create. I also have a solution for
managing the last-updated time for the directories, although it's a
little cheesy. It's possible there is a way to accomplish this using
the various automatic solutions, but based on my experience, I don't
think it will perform as well.
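As a rough illustration of the client-side tracking described above (not the actual code), the sketch below uses the JDK's ConcurrentSkipListSet as the lock-free structure. The class name PendingDirectories and its methods are hypothetical, and the real structure and batch size are not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical helper: collects directory URIs that still need to be created. */
public class PendingDirectories {
    // CAS-based sorted set; duplicates are ignored, and writers do not block each other.
    private final ConcurrentSkipListSet<String> pending = new ConcurrentSkipListSet<>();

    /** Record every ancestor directory of a document URI such as "/books/2014/a.xml". */
    public void track(String documentUri) {
        // Start at index 1 so the root "/" itself is not tracked.
        int slash = documentUri.indexOf('/', 1);
        while (slash >= 0) {
            pending.add(documentUri.substring(0, slash + 1));
            slash = documentUri.indexOf('/', slash + 1);
        }
    }

    /** Atomically remove and return up to maxSize directory URIs, in sorted order. */
    public List<String> drainBatch(int maxSize) {
        List<String> batch = new ArrayList<>();
        String dir;
        while (batch.size() < maxSize && (dir = pending.pollFirst()) != null) {
            batch.add(dir);
        }
        return batch;
    }
}
```

Each drained batch would then go to the server in one request that calls xdmp:directory-create for each URI. Because the set is sorted, a parent such as /books/ comes out before /books/2014/.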
I agree that it's a lot of work to do this, and undoubtedly there are
holes in our system where we perform document updates and inserts in
XQuery code that we won't be managing, so it's not something I would
recommend as a general solution to folks using MarkLogic in a "normal" way.
-Mike
On 5/22/2014 10:31 PM, Danny Sokolsky wrote:
I don't see anything wrong with creating directories manually; there
is even an API for it:
http://docs.marklogic.com/xdmp:directory-create
But it seems like that might be a pretty big burden on the
application. That is the only reason I suggested making it
automatic and seeing how cheap or expensive that is for your app (and
if you are creating the dirs anyway, how different would that actually
be from having MarkLogic create them for you?).
One other thing to note: newer versions of MarkLogic have a lot of
performance improvements around large updates and deletes, so if you
are on an older version of MarkLogic, upgrades can be good.
-Danny
------------------------------------------------------------------------
*From:* Michael Sokolov
*Sent:* Thursday, May 22, 2014 7:18 PM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual
directory creation
Thanks for the suggestion, Danny; it seems sensible. At this stage I
don't want to modify the rest of the system, which is pretty mature
and relies on the system-maintained last-modified property. In fact
we already maintain a separate modified timestamp in the documents
with different semantics (e.g., you can copy a document without updating
its timestamp), but this can't be used for tracking changes to binary
documents. So I think we are stuck with the built-in
maintain-last-modified.
I did briefly try having directory creation=automatic + maintain
directory last modified=false and maintain-last-modified=true, but I
thought it looked as if things were slowing down again during a large
document import. I didn't measure carefully or continue this
experiment for long though because I think I have a solution to the
manual directory creation that is effective. All our document updates
go through a single java API, so I can track updated uris there and
manage directory insertion in batches as a separate process. And it
seems that the trick of setting the <directory/> property tickles the
modified-time. I suppose if that became unsupported it could cause
problems, but I think I can live with that risk.
Coming back around to my initial question though -- it seems like the
consensus here is that best practice is *not* to create directories
manually?
-Mike
On 5/22/2014 12:05 PM, Danny Sokolsky wrote:
I think if you want to maintain these yourself, you should not use
the system maintained properties; instead, make up some of your own
that do the equivalent things.
That being said, have you tried leaving directory creation at
automatic, but turning off maintain last modified and maintain
directory last modified? Depending upon how deep your directory
hierarchy is, this might not cause too much overhead. I would
recommend trying that, and then just add a dateTime property (or
element in the document if you prefer, allowing you to not have to
create a property fragment) to track whatever you want about the last
modified (based on your app requirements). I think that might work
well, especially if your hierarchy does not have millions of
directories. See how it works and let us know.
-Danny
*From:* Keith L. Breinholt
*Sent:* Thursday, May 22, 2014 8:23 AM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual
directory creation
<prop:last-modified> is not a property that you can manually set. I
believe that is a security issue.
*From:* Mike Sokolov
*Sent:* Thursday, May 22, 2014 8:26 AM
*To:* MarkLogic General ML
*Subject:* Re: [MarkLogic Dev General] best practices for manual
directory creation
I'm getting good results updating the directory timestamps using:
xdmp:document-set-properties ($dir-uri, <prop:directory/>)
and this also seems to limit the number of prop:directory properties to 2.
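To apply the same call to a whole batch of directories from a Java client, one option is to render a single query string. The sketch below is only an assumption about how that might look: DirectoryTouchQuery and its methods are hypothetical names, and submitting the string to the server (for example via XCC) is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: renders one XQuery that touches a batch of directories. */
public class DirectoryTouchQuery {
    /** Quote a URI as an XQuery string literal ("&" and the quote character must be escaped). */
    static String literal(String s) {
        return "\"" + s.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    /** Build a query that resets the properties of every directory URI in the batch. */
    public static String build(List<String> dirUris) {
        String uris = dirUris.stream()
                .map(DirectoryTouchQuery::literal)
                .collect(Collectors.joining(", "));
        // Same call as in the message above, applied to each URI in turn.
        return "for $dir in (" + uris + ")\n"
             + "return xdmp:document-set-properties($dir, <prop:directory/>)";
    }
}
```

Running the whole batch as one query keeps the timestamp updates out of the per-document insert path, which is where the lock contention showed up.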
-Mike
On 05/22/2014 10:03 AM, Mike Sokolov wrote:
I'm working with a system that requires directories and
directory-modified timestamps (for a webDAV-like browsing
feature), but have found that automatic directory creation
introduces unacceptable lock contention during bulk updates, so I
am looking into managing the directory creation and timestamp
updates manually.
I have one question, and one strange observation - maybe a bug.
I'm working with 7.0-2.3.
First the question: how should I update the prop:last-modified
property?
Updating it explicitly raises an error:
XDMP-ARG: xdmp:document-set-property("/books/",
<prop:last-modified
xmlns:prop="http://marklogic.com/xdmp/property">2014-05-22T15:53:46.724003+02:00</prop:last-modified>)
-- Invalid argument
even though I have "maintain directory last modified" set to
false (and directory creation = manual). I do have maintain last
modified set to true, so I expect that is happening automatically
on directory creation - OK, but in that instance how would I
update the directory modified time when inserting or deleting
documents in the directory?
I tried adding a dummy property using xdmp:document-set-property, and that
does seem to update the timestamp, but I don't really want to do
that if I don't have to, of course. Perhaps I could delete and
then recreate the directory properties document, but that doesn't
seem great either. Any other ideas?
Now the weird observation. It seems that every time I modify the
directory properties document, it gets another <prop:directory />
property node! Currently I have:
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:last-modified>2014-05-22T15:47:37+02:00</prop:last-modified>
</prop:properties>
I thought that properties documents maintained a map with unique
keys?
-Mike
_______________________________________________
General mailing list
http://developer.marklogic.com/mailman/listinfo/general