What if we went with the "lightweight" approach and used primarily timestamps
to detect changes? The very first thing the process would do is grab the
local time from the client, then query the local time on the server, and
work in some fudge factor when comparing times.
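
To make that concrete, here is a rough Python sketch of the offset-plus-fudge
comparison. The function names, the 5-second default fudge and the example
times are all made up for illustration, and how the server time actually gets
fetched is left out entirely.

```python
from datetime import datetime, timedelta, timezone

def clock_offset(client_now, server_now):
    """Server clock minus client clock, sampled at (nearly) the same moment."""
    return server_now - client_now

def local_is_newer(local_mtime, server_updated, offset,
                   fudge=timedelta(seconds=5)):
    """True if the local file changed after the server copy, beyond the fudge window."""
    # Translate the local timestamp onto the server's clock before comparing.
    return local_mtime + offset > server_updated + fudge

# Hypothetical example: the server clock runs 90 seconds ahead of the client.
client_now = datetime(2010, 6, 15, 20, 0, 0, tzinfo=timezone.utc)
server_now = client_now + timedelta(seconds=90)
offset = clock_offset(client_now, server_now)
```

The fudge factor only covers the time between the two clock samples plus
timestamp granularity, so it would probably need tuning per environment.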

On Tue, Jun 15, 2010 at 2:13 PM, Mike Sokolov <[email protected]> wrote:

>  You might want to provide an alternative "lightweight" version that relies
> on the last-updated timestamp MarkLogic maintains (or can maintain) for
> you.  This does of course require clocks that are not too far off, but in
> our practice we have found this to be a workable, performant solution to
> synchronizing separate database instances.
>
> I believe also, that ML plans, or may have built already, replication
> support at the database level, so that might discourage you from spending
> *too* much time on this - not sure what the time frame for that feature is
> though - is it in the latest? Anyone from ML care to speak up?
>
> -Mike
>
>
> On 06/15/2010 03:49 PM, Lee, David wrote:
>
>  1) How would the MD5 get set initially and going forward?
>
> Any update operation would have to update the property appropriately.
>
> If all updates used this new "xrsync" command, this would happen
> automatically.  If you update a document with another tool and don't adjust
> the MD5 correctly, then syncs may fail, either by pushing unnecessarily or by
> not pushing when they should.
>
>
>
> 2) If you know your documents are relatively small and the total number is
> reasonable, calculating on the fly might not be that bad.
>
> I'm still concerned that calculating "on the fly" is not feasible.  Since
> documents are not stored in ML intact, md5 and length values calculated on
> the server will differ from calculations on a local filesystem.
>
> However, perhaps the use case you're most interested in (and me too), .xquery
> module files, happens to work?
>
>
>
> 3) Feature for ML.
>
> I suspect this won't be an easy feature for ML to implement due to #2 ...
> there simply is no serialized representation on the server, or else we'd have
> length properties or system calls already.
>
>
>
>
>
> *From:* [email protected] [mailto:[email protected]]
> *On Behalf Of *Mike Brevoort
> *Sent:* Tuesday, June 15, 2010 3:14 PM
> *To:* General Mark Logic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] MarkLogic "rsync" command - RE:
> Mac Webdav Client setting xqy files as binary
>
>
>
> David,
>
>
>
> I'm obviously interested. :)
>
>
>
> If using a property for the MD5 hash, you would need to make sure that the
> hash stays in sync with the document contents. How would the MD5 get set
> initially and going forward?
>
>
>
> It might be nice to support both options. If you know your documents are
> relatively small and the total number is reasonable, calculating on the fly
> might not be that bad.
>
>
>
> The main use case for me is to sync code resources to a module database. In
> most cases the sizes of these files are small.
>
>
>
> A great feature request for MarkLogic would be for the server to make
> available an MD5 hash property on the document for quick comparisons and
> such.
>
>
>
> Thanks!
>
> Mike
>
> On Tue, Jun 15, 2010 at 10:05 AM, Lee, David <[email protected]> wrote:
>
> I've been thinking of going ahead and prototyping this.  That is, a
> MarkLogic "rsync" type command.
>
> From my experimentation the way I think would work best is as described
> below (included email thread)
>
> That is to set a property on all files which includes the md5 and length
> (file length in bytes prior to uploading to ML).
>
> Then, using client-side logic, compare the new list of files to what's on ML
> and generate a set of update/insert/delete commands.
>
> I've already done this for a special case and it worked well, so thinking
> of cleaning up the code and making it general purpose.
>
> Although my purposes are for updating ML ... there's no reason the reverse
> couldn't also be done, to update a local filesystem with minimal operations.
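
For illustration, the compare step described above could look roughly like
this in Python. This is only a sketch: the manifest shape ({uri: (md5, length)}),
the function name and the sample URIs are assumptions, not anything from the
existing xmlsh code.

```python
def plan_sync(local, remote):
    """Compare two manifests of {uri: (md5_hex, length)} and list the operations."""
    # Present locally but not on the server: insert.
    inserts = sorted(uri for uri in local if uri not in remote)
    # Present on the server but gone locally: delete.
    deletes = sorted(uri for uri in remote if uri not in local)
    # Present on both sides with a different md5 or length: update.
    updates = sorted(uri for uri in local
                     if uri in remote and local[uri] != remote[uri])
    return {"insert": inserts, "update": updates, "delete": deletes}

# Made-up manifests; the md5 values are placeholders, not real digests.
local = {"/App/a.xqy": ("aaa", 10), "/App/b.xqy": ("bbb", 20), "/App/c.xqy": ("ccc", 30)}
remote = {"/App/a.xqy": ("aaa", 10), "/App/b.xqy": ("old", 18), "/App/d.xqy": ("ddd", 40)}
plan = plan_sync(local, remote)
```

Swapping the two arguments gives the reverse direction (server to local
filesystem) with the same logic.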
>
>
>
> The questions I have are :
>
>
>
> 1) Would anyone be interested in this ?
>
>
>
> 2) How 'offensive' is storing a property on documents ?  Would this be a
> 'deal killer' ?  Should it be in a private namespace ?
>
>
>
> 3) How efficient is storing properties ? Does having to read, store, and
> update properties negate any time savings from avoiding the load ?
>
>  That is, I suspect for documents of some sizes it is actually faster just to
> push them unconditionally rather than have to look at properties and
> calculate MD5 sums to decide whether to push ...
>
>
>
> 4) I could avoid properties entirely by calculating the MD5 and length on
> the fly in ML ... however I believe both require serializing the document in
> memory in ML.   xdmp:md5() takes a string, not a document.  And there is
> no actual size method, so that also requires serializing the document.
>
> The only way I can think of is to use xdmp:quote(doc(...)) then calculate
> the length and md5 on the server.   My gut feeling is that doing this is a
> very heavyweight operation on large files and would be less efficient than
> just unconditionally pushing the document (except maybe on very very slow
> networks).
>
> Also I'm not sure (and I highly suspect it's NOT true) that an MD5
> calculated on a file on local disk will match xdmp:md5(
> xdmp:quote(doc(...))) for the same file, due to serialization differences.
> Same with length. That would make this strategy pointless.
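
The serialization concern is easy to show in isolation. In this Python sketch
the "reserialized" bytes are an invented example of how a server might write
the same XML back out (no declaration, normalized whitespace, expanded empty
element); it is not actual xdmp:quote output.

```python
import hashlib

# The file as it sits on local disk.
original = b'<?xml version="1.0"?>\n<doc  a="1">\n  <p/>\n</doc>\n'
# A hypothetical re-serialization of the same document.
reserialized = b'<doc a="1"><p></p></doc>'

def fingerprint(data):
    """Return (md5 hex digest, length in bytes) for a byte string."""
    return hashlib.md5(data).hexdigest(), len(data)

same = fingerprint(original) == fingerprint(reserialized)
```

Both the digest and the length differ even though the two byte strings
describe the same XML document, so a fingerprint taken before upload cannot
be checked against one computed from a re-serialized copy.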
>
>
>
>
>
>
>
>
>
> -David
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Lee, David
> *Sent:* Friday, June 11, 2010 10:00 AM
> *To:* General Mark Logic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Mac Webdav Client setting
> xqy files as binary
>
>
>
> I would LOVE help with this project.   (And yes I just checked in an update
> a half hour ago ... hate to point people at old code :)
>
> I've been thinking of exactly what you're saying.  The only thing stopping me
> besides time ... is I haven't figured out how to make sure the clocks are in
> sync and what the failure cases are if they are not.
>
>
>
> What I've done in another project is to use an MD5 checksum.   There is an
> undocumented (it's experimental) flag to put which adds a property with an
> MD5 checksum.   xmlsh has an MD5 sum command (
> http://www.xmlsh.org/CommandXmd5sum).
>
> I generate a list of all documents with the MD5 sum,  compare against local
> disk then update only changed files, propagating deletes, inserts, and
> updates.   It worked great for one project ... but I have not generalized
> this code yet ...
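
As a sketch of the local half of that process, the following Python walks a
directory and records the MD5 and byte length of each file as it exists on
disk. The function name and the returned shape are assumptions for
illustration; this is not the xmlsh implementation.

```python
import hashlib
import os

def local_manifest(root):
    """Return {relative path: (md5_hex, length_in_bytes)} for every file under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Hash the raw bytes, i.e. the file as it is prior to uploading.
            with open(path, "rb") as f:
                data = f.read()
            # Use forward slashes so keys look the same on every platform.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[rel] = (hashlib.md5(data).hexdigest(), len(data))
    return manifest
```

Reading each file fully into memory is fine for small module files; large
files would want chunked hashing instead.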
>
>
>
> I'm reluctant to blindly add properties to 'other people's files' so I
> haven't made this into a general utility yet.
>
>
>
> Discussion  greatly welcome ! (and help too ... )
>
> -David
>
>
>
>
>
> ----------------------------------------
>
> David A. Lee
>
> Senior Principal Software Engineer
>
> Epocrates, Inc.
>
> [email protected]
>
> 812-482-5224
>
>
>
>
>
>
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Mike Brevoort
> *Sent:* Friday, June 11, 2010 9:43 AM
> *To:* General Mark Logic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Mac Webdav Client setting xqy
> files as binary
>
>
>
> Thanks David, That looks really cool.
>
>
>
> I was just looking at the code (which I see you are actively working on,
> with check-ins in the last several minutes :) ) and it seems like it wouldn't
> be too hard to create a sync option for rsync-like behavior (simpler,
> obviously). Given a source (filesystem), a destination (MarkLogic DB
> directory), and a depth (how far to recurse), we should be able to grab a list
> of all of the files on the server, with their content-length and last-updated
> dateTime. Then we could compare against the source filesystem for new/deleted
> files, and by size and date updated, to decide which files to get and put.
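
The source-side listing might be sketched like this in Python. The function
name and the meaning of depth (0 = only files directly in the source
directory) are assumptions; fetching the matching list from the server is not
shown.

```python
import os

def list_local(root, depth):
    """Return {relative path: (size_bytes, mtime_seconds)} down to `depth` directory levels."""
    entries = {}
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        level = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        if level >= depth:
            dirnames[:] = []  # prune in place so os.walk stops descending here
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            entries[rel] = (st.st_size, st.st_mtime)
    return entries
```

Size is the cheap first check; the modification time only matters when sizes
match, and it depends on the two clocks being comparable.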
>
>
>
> What do you think of that approach? I or someone on my team might be
> willing to take a crack at this.
>
>
>
> Also, what's required for others to run xmlsh on windows?
>
>
>
> Thanks!
>
> Mike
>
> On Fri, Jun 11, 2010 at 6:19 AM, Lee, David <[email protected]> wrote:
>
> You might want to consider the MarkLogic extension to xmlsh
>
> http://www.xmlsh.org/ModuleMarkLogic
>
>
>
> This includes a "put" command which works similarly to rsync (not quite as
> good, as it doesn't handle minimal updates yet ... TBD)
>
>
>
> http://www.xmlsh.org/MarkLogicPut
>
>
>
>
>
> But I use it for scripting updates to modules.  It uses XDBC (XCC) not
> WebDav.  You can set the file type explicitly (-t for text).
>
> Or it uses the server default logic.
>
>
>
> It's not as powerful as recordloader but it's easier to use.
>
> Example: I use this command to recursively copy my source .xquery file
>  tree to the modules DB
>
>
>
>
>
>    ml:put -r -baseuri /App/ -maxfiles 10 -maxthreads 3 *
>
>
>
>
>
>
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Mike Brevoort
> *Sent:* Friday, June 11, 2010 12:20 AM
> *To:* [email protected]
> *Subject:* [MarkLogic Dev General] Mac Webdav Client setting xqy files
> as binary
>
>
>
> Hi,
>
>
>
> So I know that webdav clients always seem to have quirks and I've heard
> hearsay that the Mac webdav client has some problems when interfacing with
> MarkLogic, but....
>
>
>
> I have a modules database mounted via webdav on a mac. When I copy in an
> xquery file (test.xqy) via the native webdav client the content type of the
> file is being set to "binary" but if I use Cyberduck to move the file, it's
> being set to "text". When the type is set to binary, it fails to execute:
>
>
>
>       <h1>500 Internal Server Error</h1>
>
>       <dl>
>
>         <dt> [1.0-ml]</dt>
>
>         <dd>XDMP-TEXTNODE: /ctd/article.xqy -- Server unable to build
> program from non-text document</dd>
>
>         <dt>in /poc/article.xqy, on line 13 [1.0-ml]</dt>
>
>         <dd>XDMP-UNDFUN: (err:XPST0017) Undefined function
> comoms-article:getFields()</dd>
>
>         <dt>in /poc/article.xqy, on line 15 [1.0-ml]</dt>
>
>         <dd>XDMP-UNDFUN: (err:XPST0017) Undefined function
> comoms-article:get()</dd>
>
>         <dt>in /poc/article.xqy, on line 19 [1.0-ml]</dt>
>
>         <dd>XDMP-UNDFUN: (err:XPST0017) Undefined function
> comoms-article:post()</dd>
>
>       </dl>
>
>
>
> So two questions: is there anything I can do to affect how the Mac
> client/MarkLogic deal with document types? Or if not, how can I convert the
> document type via xquery? I'd really like to have the modules database
> mountable so that I can use tools like rsync to move files (vs a client like
> Cyberduck).
>
>
>
> Thanks!
>
> Mike
>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>
>
>
>
> --
> Mike Brevoort /  Enterprise Web Practice Manager /  Avalon Consulting LLC /
>  303-834-7509 /  twitter:mbrevoort


-- 
Mike Brevoort /  Enterprise Web Practice Manager /  Avalon Consulting LLC /
 303-834-7509 /  twitter:mbrevoort
