Re: [MarkLogic Dev General] Viewing Documents with WebDAV

Walter Underwood Fri, 17 Jun 2011 10:47:45 -0700

If your WebDAV client is Windows, make sure you are up-to-date. Microsoft has 
two different implementations of WebDAV and each has a somewhat rocky history 
of bugs.


Wikipedia tries to make sense of it:

http://en.wikipedia.org/wiki/WebDAV#Microsoft_Windows

Here is a good summary of the known bugs in different versions of Microsoft's 
two WebDAV implementations. The "mini-redirector" is the most recent and seems 
generally preferred:

http://www.greenbytes.de/tech/webdav/webdav-redirector-list.html
http://www.greenbytes.de/tech/webdav/webfolder-client-list.html

wunder
==
Walter Underwood
Lead Engineer, MarkLogic Corporation
[email protected]
http://www.marklogic.com/

On Jun 17, 2011, at 10:33 AM, Danny Sokolsky wrote:

Hi Tim,

There is nothing inherently slow about WebDAV per se, but as I see it, there 
are 2 issues that people tend to run up against:

1)      Scale:  WebDAV requires last-modified to be enabled, which in turn 
creates properties fragments on each document in the database.  This tends to 
be fine for smaller data sets (say a few million documents), but is less fine 
for larger data sets (hundreds of millions or billions of documents).
2)      Many WebDAV clients have problems: Some WebDAV clients do strange 
things.  This tends to manifest itself in weird behavior when you do things 
like rename a directory.  For example, on Windows 7 WebDAV, when you create a 
document via WebDAV, it first creates an empty document, then updates it with 
the contents.

Now many of the problems you see in WebDAV clients you would also see if you 
just used a filesystem browser.  Try opening a filesystem browser on a 
directory that contains 1 million documents (actually, I would not try it if I 
were you).  It is hard on both the client and on the server.

If what you want is directory browsing, I would recommend writing something in 
XQuery that uses cts:uris.
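
A rough sketch of that idea, written in Python rather than XQuery so it runs anywhere. A sorted list stands in for the URI lexicon, and `list_directory` is a hypothetical helper, not a MarkLogic API. It returns at most `limit` immediate children of a directory URI, optionally resuming after the last child of the previous page. On the server, the assumption is that `cts:uris` with a start value, a limit option, and a directory query would play the same role; check the `cts:uris` documentation for the exact options.

```python
from bisect import bisect_left, bisect_right


def list_directory(sorted_uris, directory, limit=100, start=None):
    """Return up to `limit` immediate children of `directory`.

    `sorted_uris` is a sorted list of document URIs standing in for a URI
    lexicon. Children that are subdirectories are returned with a trailing
    slash. `start` is the last child returned by the previous page, if any.
    """
    if not directory.endswith("/"):
        directory += "/"
    # Decide where to resume. "0" is the character right after "/", so
    # "<subdir>0" sorts just past every URI inside "<subdir>/".
    if start is None:
        pos = bisect_left(sorted_uris, directory)
    elif start.endswith("/"):
        pos = bisect_left(sorted_uris, start[:-1] + "0")
    else:
        pos = bisect_right(sorted_uris, start)

    children = []
    while pos < len(sorted_uris) and len(children) < limit:
        uri = sorted_uris[pos]
        if not uri.startswith(directory):
            break  # walked past the end of this directory
        rest = uri[len(directory):]
        if rest == "":
            pos += 1  # the directory URI itself, not a child
        elif "/" in rest:
            child = directory + rest.split("/", 1)[0] + "/"
            children.append(child)
            # Jump over everything inside the subdirectory in one step.
            pos = bisect_left(sorted_uris, child[:-1] + "0", pos)
        else:
            children.append(uri)
            pos += 1
    return children
```

The point of the design is that the cost of one page depends on `limit`, not on how many documents sit in or below the directory, which is what a tree view needs when it expands a node.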

So it really depends on what you are doing.

I’m not sure that really answers your question, but maybe it will begin to chip 
away at it.

-Danny

From: [email protected] On Behalf Of Tim Meagher
Sent: Friday, June 17, 2011 9:34 AM
To: 'General MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Viewing Documents with WebDAV

Hi Folks,

I haven’t had any response to this yet – did I hit a nerve? :)

My experience with WebDAV and MarkLogic has been interesting.  There are things 
I know cause problems, such as trying to move or rename a directory URI.  It 
makes me wonder if there are some WebDAV commands that should be disallowed, 
unless it depends on the client.  In any case, it would be very beneficial to 
get some answers about my original question.

~Tim M.

From: [email protected] On Behalf Of Tim Meagher
Sent: Sunday, June 12, 2011 4:57 PM
To: 'General MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Viewing Documents with WebDAV

Hi Folks,

The MarkLogic documentation clearly states that:

The main purpose of a WebDAV server is to make it easy for people to store, 
retrieve, and modify documents in a database. The documents can be any type, 
whether they are text documents such as .txt files or source code, binary 
documents such as image files or Microsoft Word files, or XML documents. 
Because the documents are stored in a database, you can create applications 
that use the content in those documents for whatever purpose you need. You can 
also use the database backup and restore features to easily back up the content 
in the database.

WebDAV is pretty useless when it comes to browsing directory URIs that contain 
too many documents and/or subdirectories, but I have been operating on the 
assumption that by organizing a large set of XML documents in a hierarchical 
directory URI structure I can limit the number of documents and 
subdirectories that are accessible via the Data Source Explorer in oXygen’s 
WebDAV client (and other WebDAV clients as well).

It has been extremely valuable for me to use the oXygen WebDAV Data Source 
Explorer to quickly drill down, locate documents, compare the input and output 
of transforms, debug updated transforms, and when necessary manually 
correct and save an errant XML document in the input stream of a CPF pipeline. 
This functionality is simply not available in CQ.  However, I was recently 
informed that even if directory structures are organized hierarchically, 
WebDAV clients still cause the server to incur significant performance hits 
when opening a directory in the oXygen or other WebDAV treeview to explore its 
contents.  The risk may be more pronounced on a production system than on a 
development system, but full content sets for evaluation may only be available 
in a production environment.  I haven’t seen any documentation that discusses 
these details and I’m not familiar with the WebDAV APIs for browsing and 
reading directory URIs in MarkLogic, so I would like to go a little deeper and 
try to determine:

1.       How much of a performance hit is incurred and at what point, e.g., 
when there are 1,000 subdirectories and/or document URIs within a given 
directory URI?  Is there a given number at which the performance hit becomes 
negligible so that using a hierarchical directory URI structure is feasible?  
Is there a way to measure that performance for any given WebDAV client?

2.       Can the WebDAV client be tuned to explicitly prevent any significant 
hits to the server, e.g., by limiting threads, timeouts, etc.?

3.       The 4.1 documentation refers to tested WebDAV clients.  I’m surprised 
that list doesn’t include oXygen or some other freeware clients.  I’m assuming 
that saying the listed WebDAV clients were tested means they also passed 
some form of acceptance testing.  I’d like to suggest that the list be updated 
to reflect oXygen, EnGinSite DataFreeway, and BitKinex.

4.       Are there protocols in WebDAV that allow for limited directory viewing 
– that is to only request the first N subdirectories and/or documents within a 
given directory URI so as to explicitly limit the load on the server when 
trying to get a directory listing?
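
On question 4: as far as is known here, core WebDAV (RFC 4918) offers only the `Depth` header on PROPFIND, with the values `0`, `1`, and `infinity`, and defines no "first N" option, so a `Depth: 1` request returns every child of the collection. A minimal Python sketch of what a client can do within that limit is to request a single cheap property and truncate the listing on the client side. The helper names are hypothetical, and the truncation does not reduce the work the server does to enumerate the directory.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree notation for the WebDAV XML namespace

# Ask for one small property instead of every property on every child.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:prop><D:resourcetype/></D:prop></D:propfind>'
)


def propfind_headers():
    """Headers for listing only the immediate children of a collection."""
    return {"Depth": "1", "Content-Type": "application/xml; charset=utf-8"}


def first_children(multistatus_xml, collection_href, limit):
    """Return the first `limit` child hrefs from a 207 Multi-Status body.

    The entry for the collection itself is dropped. Truncation happens on
    the client, after the full response has already been produced.
    """
    root = ET.fromstring(multistatus_xml)
    hrefs = (r.findtext(DAV + "href") for r in root.findall(DAV + "response"))
    own = collection_href.rstrip("/")
    children = [h for h in hrefs if h and h.rstrip("/") != own]
    return children[:limit]
```

With the standard library, the request itself could be sent as `conn.request("PROPFIND", path, body=PROPFIND_BODY, headers=propfind_headers())` on an `http.client.HTTPConnection`, and timing that call for directories of different sizes is one way to measure the cost asked about in question 1.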

My options for replacing the use of oXygen and to avoid performance penalties 
associated with WebDAV are to:

1.       Build my own WebDAV client (if I can limit the directory listings).

2.       Build an XQuery web app that uses optimized queries such as cts:uris 
to drill down into directory URIs with an expandable TreeView to locate 
documents (and to build the URL of the document I want to open, so that I can 
paste it into the oXygen OpenURL feature and edit it in the oXygen 
WebDAV-based editor).

3.       Find existing code that can hopefully be ported to perform either of 
the above.

Thank you all for any help with this!

Tim Meagher

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
