Hi Mike,

 

Automatic directory creation forces MarkLogic to check whether each parent
directory exists, for every document you insert. That certainly slows down
your system. Running a separate directory creation step before the ingest,
creating only the directories that are still missing, will certainly speed
up that part. You can compare the result of cts:uri-match("*/") with the
directory URIs you need, to find the ones that still have to be created.
Sort those URIs, and you can run straight through them from top to bottom,
since each parent sorts before its children.
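
To make that set logic concrete, here is a rough sketch in Python (standing
in for whatever client drives the ingest). It assumes path-style URIs that
start with "/", and that you have already fetched the existing directory
URIs, e.g. from cts:uri-match("*/"). The function names are made up for
illustration.

```python
def ancestor_directories(doc_uri):
    """Yield every parent directory URI of doc_uri, shallowest first."""
    parts = doc_uri.split("/")[:-1]  # drop the document name itself
    for depth in range(1, len(parts) + 1):
        yield "/".join(parts[:depth]) + "/"


def missing_directories(doc_uris, existing_dirs):
    """Return the directory URIs that still need creating, parents first.

    existing_dirs is assumed to be the list of directory URIs already in
    the database. Plain lexicographic sorting puts a parent before its
    children, because the parent URI is a prefix of each child URI.
    """
    needed = set()
    for uri in doc_uris:
        needed.update(ancestor_directories(uri))
    return sorted(needed - set(existing_dirs))
```

Each URI in the result would then be passed to xdmp:directory-create in
order.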

 

I'm afraid, though, that having MarkLogic maintain last-modified on
directories will still cause contention. That will also slow down your
ingest, though the overhead may be much smaller. It will probably help if
you can batch your ingest per directory.
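
One possible way to batch per directory, sketched in Python under the
assumption that the client knows the full list of document URIs up front.
The function name is hypothetical, and whether this really reduces the
contention would have to be measured.

```python
from collections import defaultdict


def batches_by_directory(doc_uris):
    """Group document URIs by their immediate parent directory.

    Returns one list of URIs per directory, ordered by directory URI, so
    that each batch touches the properties of a single directory only.
    """
    groups = defaultdict(list)
    for uri in doc_uris:
        head, sep, _name = uri.rpartition("/")
        groups[head + sep].append(uri)  # e.g. "/a/1.xml" goes under "/a/"
    return [groups[directory] for directory in sorted(groups)]
```

Each returned batch could then be inserted in its own transaction.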

 

Kind regards,

Geert

 

From: [email protected]
[mailto:[email protected]] On behalf of Michael Sokolov
Sent: Friday, May 23, 2014 14:04
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] best practices for manual directory
creation

 

I saw an enormous performance improvement by turning off automatic directory
creation in 7.0-2.3.  I think the problem I was seeing is the one that Mike
Blakeley documented here:
http://blakeley.com/blogofile/2012/03/19/directory-assistance/.  I'm able to
work around that problem by using a fast-to-update lock-free data structure
on the client side to track all the new directories and create them in
batches using xdmp:directory-create.  I also have a solution for managing
the last-updated time for the directories, although it's a little cheesy.
It's possible there is a way to accomplish this using the various automatic
solutions, but based on my experience, I don't think it will perform as
well.
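
A minimal sketch of such a client-side tracker, in Python rather than the
Java used in the real system. It is not the actual implementation: it uses
a plain lock instead of a lock-free structure, and the class and method
names are invented for illustration. It assumes path-style URIs starting
with "/".

```python
import threading


def _ancestors(doc_uri):
    """Return all parent directory URIs of doc_uri, shallowest first."""
    parts = doc_uri.split("/")[:-1]
    return ["/".join(parts[:i]) + "/" for i in range(1, len(parts) + 1)]


class DirectoryTracker:
    """Collects directories implied by inserted documents for batch creation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()      # every directory recorded so far
        self._pending = []      # recorded but not yet handed out

    def record(self, doc_uri):
        """Note the parent directories of a document being inserted."""
        with self._lock:
            for directory in _ancestors(doc_uri):
                if directory not in self._seen:
                    self._seen.add(directory)
                    self._pending.append(directory)

    def drain(self, batch_size=100):
        """Remove and return up to batch_size pending directories, parents first."""
        with self._lock:
            self._pending.sort()  # a parent URI sorts before its children
            batch = self._pending[:batch_size]
            del self._pending[:batch_size]
        return batch
```

A separate process would call drain() periodically and create each returned
directory with xdmp:directory-create.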

I agree that it's a lot of work to do this, and undoubtedly there are holes
in our system where we perform document updates and inserts in XQuery code
that we won't be managing, so it's not something I would recommend as a
general solution to folks using MarkLogic in a "normal" way.

-Mike


On 5/22/2014 10:31 PM, Danny Sokolsky wrote:

I don't see anything wrong with creating directories manually; there is even
an API for it:

 

http://docs.marklogic.com/xdmp:directory-create

 

But it seems like it might be a pretty big burden on the application to do
that. That is the only reason I suggested making it automatic and seeing how
cheap or expensive that is for your app (and if you are creating the dirs
anyway, how different would that actually be from having MarkLogic create
them for you?).

 

One other thing to note: newer versions of MarkLogic have a lot of
performance improvements around large updates and deletes, so if you are on
an older version of MarkLogic, upgrades can be good.

 

-Danny

 

  _____  

From: [email protected] on behalf of Michael Sokolov [[email protected]]
Sent: Thursday, May 22, 2014 7:18 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] best practices for manual directory
creation

Thanks for the suggestion, Danny; it seems sensible.  At this stage I don't
want to modify the rest of the system, which is pretty mature and relies on
the system-maintained last-modified property.  In fact we already maintain a
separate modified timestamp in the documents with different semantics (e.g.
you can copy a document without updating its timestamp), but this can't be
used for tracking changes to binary documents.  So I think we are stuck with
the built-in maintain-last-modified.

I did briefly try having directory creation=automatic + maintain directory
last modified=false and maintain-last-modified=true, but I thought it looked
as if things were slowing down again during a large document import.  I
didn't measure carefully or continue this experiment for long though because
I think I have a solution to the manual directory creation that is
effective.  All our document updates go through a single Java API, so I can
track updated uris there and manage directory insertion in batches as a
separate process.  And it seems that the trick of setting the <directory/>
property tickles the modified-time.  I suppose if that became unsupported it
could cause problems, but I think I can live with that risk.

Coming back around to my initial question though -- it seems like the
consensus here is that best practice is *not* to create directories
manually?

-Mike

On 5/22/2014 12:05 PM, Danny Sokolsky wrote:

I think if you want to maintain these yourself, you should not use the
system maintained properties; instead, make up some of your own that do the
equivalent things.

 

That being said, have you tried leaving directory creation at automatic, but
turning off maintain last modified and maintain directory last modified?
Depending upon how deep your directory hierarchy is, this might not cause
too much overhead.  I would recommend trying that, and then just add a
dateTime property (or element in the document if you prefer, allowing you to
not have to create a property fragment) to track whatever you want about the
last modified (based on your app requirements).  I think that might work
well, especially if your hierarchy does not have millions of
directories.  See how it works and let us know.

 

-Danny 

 

From: [email protected] On Behalf Of Keith L. Breinholt
Sent: Thursday, May 22, 2014 8:23 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] best practices for manual directory
creation

 

<prop:last-modified> is not a property that you can manually set.  I believe
that is a security issue.

 

From: [email protected] On Behalf Of Mike Sokolov
Sent: Thursday, May 22, 2014 8:26 AM
To: MarkLogic General ML
Subject: Re: [MarkLogic Dev General] best practices for manual directory
creation

 

I'm getting good results updating the directory timestamps using:

xdmp:document-set-properties ($dir-uri, <prop:directory/>)

and this seems to limit the number of prop:directory properties to 2 too

-Mike

On 05/22/2014 10:03 AM, Mike Sokolov wrote:

I'm working with a system that requires directories and directory-modified
timestamps (for a WebDAV-like browsing feature), but have found that
automatic directory creation introduces unacceptable lock contention during
bulk updates, so I am looking into managing the directory creation and
timestamp updates manually.

I have one question, and one strange observation - maybe a bug.  I'm working
with 7.0-2.3.

First the question: how should I update the prop:last-modified property?

Updating it explicitly raises an error:


XDMP-ARG: xdmp:document-set-property("/books/", <prop:last-modified
xmlns:prop="http://marklogic.com/xdmp/property">2014-05-22T15:53:46.724003+02:00</prop:last-modified>)
-- Invalid argument


even though I have "maintain directory last modified" set to false (and
directory creation = manual).  I do have maintain last modified set to true,
so I expect that is happening automatically on directory creation - OK, but
in that instance how would I update the directory modified time when
inserting or deleting documents in the directory?

I tried adding a dummy property using xdmp:set-property, and that does seem
to update the timestamp, but I don't really want to do that if I don't have
to, of course.  Perhaps I could delete and then recreate the directory
properties document, but that doesn't seem great either. Any other ideas?  

Now the weird observation.  It seems that every time I modify the directory
properties document, it gets another <prop:directory /> property node!
Currently I have:

<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:directory/>
<prop:last-modified>2014-05-22T15:47:37+02:00</prop:last-modified>
</prop:properties>

I thought that properties documents maintained a map with unique keys?

-Mike

 

_______________________________________________
General mailing list
[email protected] <mailto:[email protected]> 
http://developer.marklogic.com/mailman/listinfo/general

 


