I saw an enormous performance improvement by turning off automatic directory creation in 7.0-2.3. I think the problem I was seeing is the one that Mike Blakeley documented here: http://blakeley.com/blogofile/2012/03/19/directory-assistance/. I'm able to work around that problem by using a fast-to-update lock-free data structure on the client side to track all the new directories and create them in batches using xdmp:directory-create. I also have a solution for managing the last-updated time for the directories, although it's a little cheesy. It's possible there is a way to accomplish this using the various automatic solutions, but based on my experience, I don't think it will perform as well.
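
For anyone who wants to try the same thing, here is a minimal sketch of the kind of client-side tracking described above. It is not the actual code from our system. The class and method names (DirectoryTracker, recordDocument, drainBatch) are made up for illustration, and it assumes path-style URIs that begin with "/". It uses java.util.concurrent.ConcurrentSkipListSet, a lock-free sorted set, so a parent directory always sorts ahead of its children. Sending each drained batch to the server (for example with xdmp:directory-create) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical client-side tracker; not part of any MarkLogic API. */
final class DirectoryTracker {
    // Every directory URI this client has ever recorded.
    private final ConcurrentSkipListSet<String> seen = new ConcurrentSkipListSet<>();
    // Directory URIs recorded but not yet handed out in a batch.
    private final ConcurrentSkipListSet<String> pending = new ConcurrentSkipListSet<>();

    /** Records the ancestor directories of a document URI such as "/a/b/doc.xml". */
    void recordDocument(String docUri) {
        int slash = docUri.lastIndexOf('/');
        while (slash > 0) {
            String dir = docUri.substring(0, slash + 1); // keep the trailing slash
            if (!seen.add(dir)) {
                break; // already recorded, so its ancestors were recorded too
            }
            pending.add(dir);
            slash = docUri.lastIndexOf('/', slash - 1);
        }
    }

    /** Removes and returns up to maxSize pending directories, parents first. */
    List<String> drainBatch(int maxSize) {
        List<String> batch = new ArrayList<>();
        String dir;
        // pollFirst() is atomic, so two draining threads never get the same URI.
        while (batch.size() < maxSize && (dir = pending.pollFirst()) != null) {
            batch.add(dir);
        }
        return batch;
    }
}
```

Writer threads call recordDocument for every document they insert, and a separate background task calls drainBatch periodically and creates the returned directories in one request. The sketch only remembers directories seen by this client, so a directory that already exists on the server still has to be handled when the batch is created.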

I agree that it's a lot of work to do this, and undoubtedly there are holes in our system where we perform document updates and inserts in XQuery code that we won't be managing, so it's not something I would recommend as a general solution to folks using MarkLogic in a "normal" way.

-Mike


On 5/22/2014 10:31 PM, Danny Sokolsky wrote:
I don't see anything wrong with creating directories manually; there is even an API for it:

http://docs.marklogic.com/xdmp:directory-create

But it seems like it might be a pretty big burden on the application to do that. That is the only reason I was suggesting making it automatic and seeing how cheap or expensive that is for your app (and if you are creating the directories anyway, how different would that actually be from having MarkLogic create them for you?).

One other thing to note: newer versions of MarkLogic have a lot of performance improvements around large updates and deletes, so if you are on an older version of MarkLogic, upgrades can be good.

-Danny

------------------------------------------------------------------------
*From:* [email protected] [[email protected]] on behalf of Michael Sokolov [[email protected]]
*Sent:* Thursday, May 22, 2014 7:18 PM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual directory creation

Thanks for the suggestion, Danny; it seems sensible. At this stage I don't want to modify the rest of the system, which is pretty mature and relies on the system-maintained last-modified property. In fact we already maintain a separate modified timestamp in the documents with different semantics (e.g., you can copy a document without updating its timestamp), but this can't be used for tracking changes to binary documents. So I think we are stuck with the built-in maintain-last-modified.

I did briefly try directory creation = automatic with maintain directory last modified = false and maintain last modified = true, but it looked as if things were slowing down again during a large document import. I didn't measure carefully or continue the experiment for long, though, because I think I have an effective solution for manual directory creation. All our document updates go through a single Java API, so I can track updated URIs there and manage directory insertion in batches as a separate process. And it seems that the trick of setting the <directory/> property tickles the modified time. I suppose if that became unsupported it could cause problems, but I think I can live with that risk.

Coming back around to my initial question though -- it seems like the consensus here is that best practice is *not* to create directories manually?

-Mike

On 5/22/2014 12:05 PM, Danny Sokolsky wrote:

I think if you want to maintain these yourself, you should not use the system maintained properties; instead, make up some of your own that do the equivalent things.

That being said, have you tried leaving directory creation at automatic, but turning off maintain last modified and maintain directory last modified? Depending upon how deep your directory hierarchy is, this might not cause too much overhead. I would recommend trying that, and then just adding a dateTime property (or an element in the document if you prefer, which avoids having to create a property fragment) to track whatever you want about the last modified time (based on your app requirements). I think that might work well, especially if your hierarchy does not have millions of directories. See how it works and let us know.

-Danny

*From:*[email protected] [mailto:[email protected]] *On Behalf Of *Keith L. Breinholt
*Sent:* Thursday, May 22, 2014 8:23 AM
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual directory creation

<prop:last-modified> is not a property that you can manually set. I believe that is a security issue.

*From:*[email protected] [mailto:[email protected]] *On Behalf Of *Mike Sokolov
*Sent:* Thursday, May 22, 2014 8:26 AM
*To:* MarkLogic General ML
*Subject:* Re: [MarkLogic Dev General] best practices for manual directory creation

I'm getting good results updating the directory timestamps using:

xdmp:document-set-properties ($dir-uri, <prop:directory/>)

This also seems to limit the number of prop:directory properties to two.
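
Since the updates in this setup go through a Java client, one way to issue that call is to build the query text on the client side. The sketch below only builds a string and does not talk to a server. DirectoryTouch, xqueryLiteral and touchQuery are hypothetical names, and it assumes the query runs in the 1.0-ml dialect, where the prop prefix is predeclared.

```java
/** Hypothetical helper; builds XQuery text only and does not talk to a server. */
final class DirectoryTouch {
    /** Quotes a value as a double-quoted XQuery string literal. */
    static String xqueryLiteral(String s) {
        // Inside an XQuery string literal, '&' must be written as an entity
        // reference and an embedded double quote must be doubled.
        return "\"" + s.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    /** Returns query text that resets the properties of dirUri to <prop:directory/>. */
    static String touchQuery(String dirUri) {
        return "xquery version \"1.0-ml\";\n"
             + "xdmp:document-set-properties(" + xqueryLiteral(dirUri)
             + ", <prop:directory/>)";
    }
}
```

For example, DirectoryTouch.touchQuery("/books/") yields the same call as above with "/books/" in place of $dir-uri. Escaping the URI keeps a stray quote or ampersand in a directory name from breaking the query.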

-Mike

On 05/22/2014 10:03 AM, Mike Sokolov wrote:

    I'm working with a system that requires directories and
    directory-modified timestamps (for a webDAV-like browsing
    feature), but have found that automatic directory creation
    introduces unacceptable lock contention during bulk updates, so I
    am looking into managing the directory creation and timestamp
    updates manually.

    I have one question, and one strange observation - maybe a bug. I'm
    working with 7.0-2.3.

    First the question: how should I update the prop:last-modified
    property?

    Updating it explicitly raises an error:


          XDMP-ARG: xdmp:document-set-property("/books/",
          <prop:last-modified
          xmlns:prop="http://marklogic.com/xdmp/property">2014-05-22T15:53:46.724003+02:00</prop:last-modified>)
          -- Invalid argument

    even though I have "maintain directory last modified" set to
    false (and directory creation = manual).  I do have maintain last
    modified set to true, so I expect that is happening automatically
    on directory creation - OK, but in that instance how would I
    update the directory modified time when inserting or deleting
    documents in the directory?

    I tried adding a dummy property using xdmp:set-property, and that
    does seem to update the timestamp, but I don't really want to do
    that if I don't have to, of course.  Perhaps I could delete and
    then recreate the directory properties document, but that doesn't
    seem great either. Any other ideas?

    Now the weird observation.  It seems that every time I modify the
    directory properties document, it gets another <prop:directory />
    property node!  Currently I have:

    <prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:directory/>
    <prop:last-modified>2014-05-22T15:47:37+02:00</prop:last-modified>
    </prop:properties>

    I thought that properties documents maintained a map with unique
    keys?

    -Mike



    _______________________________________________

    General mailing list

    [email protected]  <mailto:[email protected]>

    http://developer.marklogic.com/mailman/listinfo/general





