Thanks Geert. You described more or less what I had done. With directory-creation=manual and maintain-directory-last-modified=false (but maintain-last-modified=true), things seemed to run quite quickly, without the contention I observed before. The tricky part is updating the directory modification times manually. The only reliable solution I have for that is to set a dummy property on the directory property fragment.

-Mike

On 6/4/2014 4:29 AM, Geert Josten wrote:

Hi Mike,

Automatic directory creation forces MarkLogic to check for directory existence for each doc, for every parent directory of that doc. That certainly slows down your system. Running a separate directory-creation process before the ingest, creating only the directories that are still missing, will certainly speed up that bit. You can compare the result of cts:uri-match("*/") against the directories you need, to find which ones still have to be created. Sort the dir URIs you need to create, and you can run straight through them from top to bottom.
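To illustrate the client-side half of this, here is a minimal Java sketch of working out which directories are missing and ordering them so parents come before children. It assumes the set of existing directory URIs has already been fetched from the server (for example from the cts:uri-match("*/") call above) and that the root "/" never needs creating. The class and method names are made up for the example; nothing here is a MarkLogic API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any MarkLogic client library.
class MissingDirectories {

    // Collects every ancestor directory URI (each ending in "/") of the
    // given document URIs. The search starts at index 1, so a leading
    // root "/" is skipped (an assumption of this sketch).
    static TreeSet<String> ancestorDirs(Iterable<String> docUris) {
        TreeSet<String> dirs = new TreeSet<>();
        for (String uri : docUris) {
            for (int i = uri.indexOf('/', 1); i >= 0; i = uri.indexOf('/', i + 1)) {
                dirs.add(uri.substring(0, i + 1));
            }
        }
        return dirs;
    }

    // Returns the needed directories that do not exist yet, in sorted
    // order. A parent URI is a prefix of its children, so it sorts first.
    static List<String> missingDirs(Iterable<String> docUris, Set<String> existingDirs) {
        TreeSet<String> needed = ancestorDirs(docUris);
        needed.removeAll(existingDirs);
        return new ArrayList<>(needed);
    }
}
```

The resulting list can then be walked from top to bottom, creating each directory in turn.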

I'm afraid though that having MarkLogic maintain last-modified on dirs will still cause contention. That will also slow down your ingest. But maybe that overhead is much smaller. It will probably help if you can make your ingest batch up per directory.
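Batching per directory could look roughly like the Java sketch below, which groups document URIs by their immediate parent directory so that each batch touches a single directory. The class and method names are hypothetical, and how each batch is then submitted to MarkLogic is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any MarkLogic client library.
class DirectoryBatcher {

    // Returns the parent directory URI (ending in "/") of a document URI,
    // or the empty string when the URI contains no slash at all.
    static String parentDir(String uri) {
        int slash = uri.lastIndexOf('/');
        return slash < 0 ? "" : uri.substring(0, slash + 1);
    }

    // Groups document URIs by parent directory. The TreeMap keeps the
    // directories in sorted order, so batches come out parent-first.
    static Map<String, List<String>> batchByDirectory(List<String> docUris) {
        return docUris.stream().collect(
            Collectors.groupingBy(DirectoryBatcher::parentDir, TreeMap::new, Collectors.toList()));
    }
}
```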

Kind regards,

Geert

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Michael Sokolov
*Sent:* Friday, May 23, 2014 14:04
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] best practices for manual directory creation

I saw an enormous performance improvement by turning off automatic directory creation in 7.0-2.3. I think the problem I was seeing is the one that Mike Blakeley documented here: http://blakeley.com/blogofile/2012/03/19/directory-assistance/. I'm able to work around that problem by using a fast-to-update lock-free data structure on the client side to track all the new directories and create them in batches using xdmp:directory-create. I also have a solution for managing the last-updated time for the directories, although it's a little cheesy. It's possible there is a way to accomplish this using the various automatic solutions, but based on my experience, I don't think it will perform as well.
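For readers wondering what such a client-side tracker might look like, here is a minimal Java sketch. It is an assumption for illustration, not the actual implementation described above: it uses java.util.concurrent.ConcurrentSkipListSet as one lock-free, sorted candidate structure, and the class and method names are invented. Sending each drained batch to the server (for example via xdmp:directory-create) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical tracker; ConcurrentSkipListSet is an assumed choice of
// lock-free structure, not necessarily the one used in the real system.
class PendingDirectories {

    private final ConcurrentSkipListSet<String> pending = new ConcurrentSkipListSet<>();

    // Called by writer threads for every document they insert. Records
    // each ancestor directory of the URI, skipping a leading root "/".
    void recordDocument(String docUri) {
        for (int i = docUri.indexOf('/', 1); i >= 0; i = docUri.indexOf('/', i + 1)) {
            pending.add(docUri.substring(0, i + 1));
        }
    }

    // Called by the separate directory-creation process. Removes up to
    // maxSize directory URIs in sorted order, so parents precede children.
    List<String> drainBatch(int maxSize) {
        List<String> batch = new ArrayList<>();
        String dir;
        while (batch.size() < maxSize && (dir = pending.pollFirst()) != null) {
            batch.add(dir);
        }
        return batch;
    }
}
```

A directory that has already been drained can be recorded again by a later document, so a real tracker would also need to remember which directories it has created, or make creation tolerate existing directories.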

I agree that it's a lot of work to do this, and undoubtedly there are holes in our system where we perform document updates and inserts in XQuery code that we won't be managing, so it's not something I would recommend as a general solution to folks using MarkLogic in a "normal" way.

-Mike


On 5/22/2014 10:31 PM, Danny Sokolsky wrote:

    I don't see anything wrong with creating directories manually,
    there is even an api for it:

    http://docs.marklogic.com/xdmp:directory-create

    But it seems like it might be a pretty big burden on the
    application to do that, that is the only reason I was suggesting
    making that automatic and seeing how cheap or expensive that is
    for your app (and if you are creating the dirs anyway, how
    different would that actually be than having MarkLogic create them
    for you).

    One other thing to note: newer versions of MarkLogic have a lot of
    performance improvements around large updates and deletes, so if
    you are on an older version of MarkLogic, upgrades can be good.

    -Danny

    ------------------------------------------------------------------------

    *From:* [email protected] [[email protected]] on behalf of
    Michael Sokolov [[email protected]]
    *Sent:* Thursday, May 22, 2014 7:18 PM
    *To:* MarkLogic Developer Discussion
    *Subject:* Re: [MarkLogic Dev General] best practices for manual
    directory creation

    Thanks for the suggestion, Danny; it seems sensible.  At this
    stage I don't want to modify the rest of the system, which is
    pretty mature and relies on the system-maintained last-modified
    property.  In fact we already maintain a separate modified
    timestamp in the documents with different semantics (e.g. you can
    copy a document without updating its timestamp), but this can't be
    used for tracking changes to binary documents.  So I think we are
    stuck with the built-in maintain-last-modified.

    I did briefly try having directory creation=automatic + maintain
    directory last modified=false and maintain-last-modified=true, but
    I thought it looked as if things were slowing down again during a
    large document import.  I didn't measure carefully or continue
    this experiment for long though because I think I have a solution
    to the manual directory creation that is effective.  All our
    document updates go through a single java API, so I can track
    updated uris there and manage directory insertion in batches as a
    separate process.  And it seems that the trick of setting the
    <directory/> property tickles the modified-time.  I suppose if
    that became unsupported it could cause problems, but I think I can
    live with that risk.

    Coming back around to my initial question though -- it seems like
    the consensus here is that best practice is *not* to create
    directories manually?

    -Mike

    On 5/22/2014 12:05 PM, Danny Sokolsky wrote:

        I think if you want to maintain these yourself, you should not
        use the system maintained properties; instead, make up some of
        your own that do the equivalent things.

        That being said, have you tried leaving directory creation at
        automatic, but turning off maintain last modified and maintain
        directory last modified?  Depending upon how deep your
        directory hierarchy is, this might not cause too much
        overhead.  I would recommend trying that, and then just add a
        dateTime property (or element in the document if you prefer,
        allowing you to not have to create a property fragment) to
        track whatever you want about the last modified (based on your
        app requirements).  I think that might work well, especially
        if your hierarchy does not have millions of directories.  See
        how it works and let us know.

        -Danny

        *From:* [email protected]
        [mailto:[email protected]] *On Behalf Of* Keith L. Breinholt
        *Sent:* Thursday, May 22, 2014 8:23 AM
        *To:* MarkLogic Developer Discussion
        *Subject:* Re: [MarkLogic Dev General] best practices for
        manual directory creation

        <prop:last-modified> is not a property that you can manually
        set. I believe that is a security issue.

        *From:* [email protected]
        [mailto:[email protected]] *On Behalf Of* Mike Sokolov
        *Sent:* Thursday, May 22, 2014 8:26 AM
        *To:* MarkLogic General ML
        *Subject:* Re: [MarkLogic Dev General] best practices for
        manual directory creation

        I'm getting good results updating the directory timestamps using:

        xdmp:document-set-properties ($dir-uri, <prop:directory/>)

        and this seems to limit the number of prop:directory
        properties to 2 too

        -Mike

        On 05/22/2014 10:03 AM, Mike Sokolov wrote:

            I'm working with a system that requires directories and
            directory-modified timestamps (for a webDAV-like browsing
            feature), but have found that automatic directory creation
            introduces unacceptable lock contention during bulk
            updates, so I am looking into managing the directory
            creation and timestamp updates manually.

            I have one question, and one strange observation - maybe a
            bug.  I'm working with 7.0-2.3.

            First the question: how should I update the
            prop:last-modified property?

            Updating it explicitly raises an error:


                  XDMP-ARG: xdmp:document-set-property("/books/",
                  <prop:last-modified
                  xmlns:prop="http://marklogic.com/xdmp/property">2014-05-22T15:53:46.724003+02:00</prop:last-modified>)
                  -- Invalid argument

            even though I have "maintain directory last modified" set
            to false (and directory creation = manual).  I do have
            maintain last modified set to true, so I expect that is
            happening automatically on directory creation - OK, but in
            that instance how would I update the directory modified
            time when inserting or deleting documents in the directory?

            I tried adding a dummy property using xdmp:document-set-property,
            and that does seem to update the timestamp, but I don't
            really want to do that if I don't have to, of course.
            Perhaps I could delete and then recreate the directory
            properties document, but that doesn't seem great either.
            Any other ideas?

            Now the weird observation.  It seems that every time I
            modify the directory properties document, it gets another
            <prop:directory /> property node!  Currently I have:

            <prop:properties
            xmlns:prop="http://marklogic.com/xdmp/property">
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:directory/>
            <prop:last-modified>2014-05-22T15:47:37+02:00</prop:last-modified>
            </prop:properties>

            I thought that properties documents maintained a map with
            unique keys?

            -Mike

            _______________________________________________

            General mailing list

            [email protected]

            http://developer.marklogic.com/mailman/listinfo/general


