Hello Xavier Masson,
Thanks for asking the tough questions. I will try to answer inline as best I can.

>>We are currently evaluating ML, first as a XML database supporting XQuery. We
>>already have one and a complex but not large set of data and would like to
>>benefit from ML rich features.
>>ML seems an amazing database and functionally bring a lot to the table.
>>
>>The recent communications around ML 8 and the evolutions in the recent
>>versions are making me question the support of standard XQuery.
>>More precisely the absence of support of XQuery 3.0 beyond the prefix is a
>>bit puzzling (I am thinking mostly about group_by clauses).
>>Said otherwise this post http://markmail.org/message/fpsimswbt3gteooj
>>from 2012 seems to still be relevant and did not get any "answer" ;).

In fact, MarkLogic supports many of the features in XQuery 3.0.

Supported:

* try/catch expressions (3.15 Try/Catch Expressions<http://www.w3.org/TR/xquery-30/#id-try-catch>). These were actually supported well before the standard was ratified.
* Dynamic function calls (3.2.2 Dynamic Function Call<http://www.w3.org/TR/xquery-30/#id-dynamic-function-invocation>).
* Inline function expressions (3.1.7 Inline Function Expressions<http://www.w3.org/TR/xquery-30/#id-inline-func>).
* Private functions (4.18 Function Declaration<http://www.w3.org/TR/xquery-30/#FunctionDeclns>).
* Switch expressions (3.13 Switch Expression<http://www.w3.org/TR/xquery-30/#id-switch>).
* Computed namespace constructors (3.9.3.7 Computed Namespace Constructors<http://www.w3.org/TR/xquery-30/#id-computed-namespaces>).
* Output declarations (2.2.4 Serialization<http://www.w3.org/TR/xquery-30/#id-serialization>).
* Annotations (4.15 Annotations<http://www.w3.org/TR/xquery-30/#id-annotations>).
* Function assertions<http://www.w3.org/TR/xquery-30/#dt-function-assertion> in function tests<http://www.w3.org/TR/xquery-30/#doc-xquery30-FunctionTest>.
* A string concatenation operator (3.6 String Concatenation Expressions<http://www.w3.org/TR/xquery-30/#id-string-concat-expr>).
* A mapping operator (3.17 Simple map operator (!)<http://www.w3.org/TR/xquery-30/#id-map-operator>).

Not supported (I will shed some light on why):

* group by clause in FLWOR expressions (3.10.7 Group By Clause<http://www.w3.org/TR/xquery-30/#id-group-by>).
* count clause in FLWOR expressions (3.10.6 Count Clause<http://www.w3.org/TR/xquery-30/#id-count>).
* tumbling window and sliding window in FLWOR expressions (3.10.4 Window Clause<http://www.w3.org/TR/xquery-30/#id-windows>).
* allowing empty in the for clause (3.10.2 For Clause<http://www.w3.org/TR/xquery-30/#id-xquery-for-clause>), for functionality similar to outer joins in SQL.

Group by/count:

Understand that implementing the "group by" and count clauses is a resource-intensive operation, and not something that should be supported loosely unless there is a clear way to scale it as a database operation. The engine has some support for group-by operations through cts:value-tuples, but it requires indexes to perform well in MarkLogic. A group by across a large dataset would run into scaling issues without the proper index structures to support it. There may be some future support for this as SPARQL gains the ability to support group by. But the indexing structures would need to be more aligned with those of a database than with how MarkLogic indexes content for search and retrieval.

Tumbling/sliding window and allowing empty:

These remaining features are generally less well known and, although very useful in small cases, are not primarily on the radar from a necessity/use-case perspective. Just my 2 cents.

>I fully understand and appreciate that ML needs to "move forward into
>buzzland" from JSON (good) to haddoop and BIgData (who really has a truly 'big
>data' dataset ?
>;) ), even throwing that abomination that is javascript into
>the mix ( ;)
>) but does that mean that XML and XQUERY support will stop
>evolving and be "engine level /internals stuff"
>(because I understand that ML is still at its score a XML/document database )
>?

Yes, JavaScript is like putting bumper stickers on a Bentley, but it lets the bumper-sticker citizens play with a powerful engine and technology. The fact that we use V8 means JavaScript gets great performance when leveraging MarkLogic, if you so choose. What it also does is open up the integration of complex algorithms already implemented in JavaScript that would otherwise not be available to the XQuery community.

>>2)
>>I am also a bit troubled by the constant "cts/xdmp:hack" that seems to be the
>>only way of getting performance out of ML.
>>I know that this kind of "custom methods" is supported by the W3C specs but
>>my problem is that it seems to be the complete and constant substitute for
>>FLOWER.
>>I can understand such an escape hatch for specific optimizations (the last
>>20% perf ;)) or to access functionality outside of the standard
>>XML/XQuery/XPATH specs (like state injection or the temporal stuff or the
>>rest stuff) but it really seems like it goes way beyond that and that, in
>>fact, ML support its own query langage based on an XQuery like syntax, much
>>more than XQuery with some extensions. :)

It's very hard to optimize XQuery to support all the nuances of composition. The cts/xdmp libraries are expansive, but I would not perceive them as a hack; they are extensions optimized to support database operations, not XQuery itself. Also, I often ask myself which features of the standard are actually useful and which are just nice to have. For example, the XQuery Update Facility is still a candidate recommendation, and I find its syntax less friendly than the xdmp: namespace functions, though that is probably because of how much time I have spent using them.
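To make the group-by point and the "extensions as database operations" point a bit more concrete, here is a rough sketch, not a tuned implementation. It assumes a hypothetical collection named "books" holding one <book> document per book, each with exactly one <genre> child, and an element range index of type string on genre; none of those names come from your setup. The first expression emulates group by in portable XQuery 1.0, the second gets the same grouping from the range index with cts:element-values and cts:frequency.

```xquery
xquery version "1.0-ml";

(: Hypothetical data: one <book> document per book in the "books"
   collection, each with exactly one <genre> child element. :)

(: Portable XQuery 1.0: emulate group by with distinct-values.
   No range index is needed, but every book is read to build the groups. :)
let $books := fn:collection("books")/book
let $portable :=
  for $genre in fn:distinct-values($books/genre)
  order by $genre
  return <group genre="{$genre}" count="{fn:count($books[genre = $genre])}"/>

(: MarkLogic extension: take the distinct values and their frequencies
   from an element range index on <genre> (assumed to be configured).
   cts:frequency counts fragments by default, which matches the number
   of books only under the one-genre-per-document assumption above. :)
let $indexed :=
  for $genre in cts:element-values(xs:QName("genre"), (), (),
                                   cts:collection-query("books"))
  return <group genre="{$genre}" count="{cts:frequency($genre)}"/>

return (<portable>{$portable}</portable>, <indexed>{$indexed}</indexed>)
```

The trade-off is the one I described for group by: the second form is resolved from the indexes rather than by reading each document, but it is not portable and it only works once the range index exists.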
>>Is this a wrong impression and is it possible to use ML with good
>>performance with "standard" XQuery. ?

I don't think it's the wrong impression. It is quite in line with some of my own previous thinking, and I also want to see XQuery 3.0 support. But I weigh this against what I get in MarkLogic that is not available in the XQuery spec, and I am okay with settling a bit. The mere fact that MarkLogic has decided to support other languages makes its use more acceptable to the non-XQuery developer, but I agree that I do want to see us continue to support new features in the spec and beyond.

>>I am not just being "theorical" there, we do have XML data and a previous
>>database and existing queries, and we intend to publish query abilities "to
>>the world ('company' wide and worldwide) " in some specific cases. It is not
>>feasible or acceptable for us to have to impose the ML query language and
>>I really don't like the idea of rewriting most of my queries around
>>proprietary stuff and have to expose that stuff outside the application
>>internals.
>>To sum it up:
>>1) is xquery support stuck in its current state (mostly 1.0)
>>2) is the XML support by ML still "first party"
>>3) is is possible to have good performance using standard XQuery syntax for
>>requests.

1. No, but it will come by virtue of people like you pushing it to stay alive.
2. XML will always be the de facto standard of MarkLogic. But we have given first-class citizenship to other data types and languages (JSON, SPARQL, triples). This is the world we are living in, so we have to play nice outside of XML and XQuery. The advantage of being able to interoperate with JavaScript (a lot of algorithms already implemented) and triples (deep relationship analysis) should be welcome.
3. Yes, it is, but you have to understand deeply how MarkLogic optimizes XQuery under the hood and what the best practices are for writing performant XQuery code.
Sometimes it does require extensions outside of XQuery; sometimes it requires indexes on the database to support using fn functions in XPath efficiently. My recommendation is to look into xdmp:plan and the profiling tools.

>>Thanks a lot for your time.
>>PS : sorry for posting this on a "dev" mailling list but there does not
>>seem to be a "general" mailling list, and I trust fellow developers more
>>than anyone to tell me the reality (even the bitter one ) of such things.

Hope this helps, or at least keeps you interested.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Thursday, January 08, 2015 9:17 AM
To: [email protected]
Subject: General Digest, Vol 127, Issue 12

Today's Topics:

1. XDMP:http-get and 304 responses (Chris Hudson-Silver)
2. Re: XDMP:http-get and 304 responses (Geert Josten)
3. What about XQuery proper ?
:) (Xavier Masson)

----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Jan 2015 10:12:02 +0000
From: Chris Hudson-Silver <[email protected]>
Subject: [MarkLogic Dev General] XDMP:http-get and 304 responses
To: "[email protected]" <[email protected]>
Message-ID: <am3pr07mb321e60424e67f70ed27ad88a6...@am3pr07mb321.eurprd07.prod.outlook.com>
Content-Type: text/plain; charset="us-ascii"

Hi All,

Recently I was working on a project that tracks a repository by calling a REST web service that returns metadata and download URLs for items that have changed in the remote repository since the last call. It then checks whether the item has already been downloaded and, if so, calls the download URL with the HTTP cache headers set, as the modification could have been just metadata, not content. E.g.:

    let $options :=
      <options xmlns="xdmp:http">
        <headers>
          <If-Modified-Since>Fri, 21 Nov 2014 16:53:12 GMT</If-Modified-Since>
          <If-None-Match>"1416588792000"</If-None-Match>
        </headers>
        <repair xmlns="xdmp:document-get">full</repair>
      </options>
    let $response := xdmp:http-get($url, $options)

I noticed that the run time for this was considerably longer when some of the items returned a 304 Not Modified response, so I decided to test whether it was the remote repository or MarkLogic adding the overhead. I did this by creating a script that generated curl commands, so I could make the exact same requests from the command line and from MarkLogic. The calls from the command line and from MarkLogic returned the exact same response, including the correct 304 code and an empty response body.
The calls from the command line were taking about 20 seconds less than the calls from MarkLogic, and since the global timeout was set to 20 seconds, I decided to try the MarkLogic calls with a 1-second timeout, e.g.:

    let $options :=
      <options xmlns="xdmp:http">
        <timeout>1</timeout>
        <headers>
          <If-Modified-Since>Fri, 21 Nov 2014 16:53:12 GMT</If-Modified-Since>
          <If-None-Match>"1416588792000"</If-None-Match>
        </headers>
        <repair xmlns="xdmp:document-get">full</repair>
      </options>

Now they take approximately 1 second longer than the calls from the command line.

Has anyone else encountered this? Could it be that MarkLogic is waiting for the response body even though it has received a valid response header? The HTTP 1.1 standard states that a response does not necessarily need a response body, so if this is the case it may be a bug in MarkLogic's HTTP module. Or am I missing something vital in my request options?

Thanks in advance,
Chris

------------------------------

Message: 2
Date: Thu, 8 Jan 2015 10:43:33 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] XDMP:http-get and 304 responses
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d0d41d71.469b2%[email protected]>
Content-Type: text/plain; charset="us-ascii"

Hi Chris,

Does the response contain a Content-Length? If not, maybe MarkLogic waits the full timeout before it decides there is none. If it has one (with a value of zero), that might be a bug.
Kind regards,
Geert

------------------------------

Message: 3
Date: Thu, 8 Jan 2015 15:17:06 +0100
From: Xavier Masson <[email protected]>
Subject: [MarkLogic Dev General] What about XQuery proper ? :)
To: <[email protected].>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Hello

We are currently evaluating ML, first as a XML database supporting XQuery. We already have one and a complex but not large set of data and would like to benefit from ML rich features. ML seems an amazing database and functionally bring a lot to the table.

But I have two main concerns :

1) The recent communications around ML 8 and the evolutions in the recent versions are making me question the support of standard XQuery. More precisely the absence of support of XQuery 3.0 beyond the prefix is a bit puzzling (I am thinking mostly about group_by clauses).
Said otherwise this post http://markmail.org/message/fpsimswbt3gteooj from 2012 seems to still be relevant and did not get any "answer" ;).

I fully understand and appreciate that ML needs to "move forward into buzzland" from JSON (good) to haddoop and BIgData (who really has a truly 'big data' dataset ? ;) ), even throwing that abomination that is javascript into the mix ( ;) ) but does that mean that XML and XQUERY support will stop evolving and be "engine level /internals stuff" (because I understand that ML is still at its score a XML/document database ) ?

2) I am also a bit troubled by the constant "cts/xdmp:hack" that seems to be the only way of getting performance out of ML. I know that this kind of "custom methods" is supported by the W3C specs but my problem is that it seems to be the complete and constant substitute for FLOWER. I can understand such an escape hatch for specific optimizations (the last 20% perf ;)) or to access functionality outside of the standard XML/XQuery/XPATH specs (like state injection or the temporal stuff or the rest stuff) but it really seems like it goes way beyond that and that, in fact, ML support its own query langage based on an XQuery like syntax, much more than XQuery with some extensions. :)

Is this a wrong impression and is it possible to use ML with good performance with "standard" XQuery. ?

I am not just being "theorical" there, we do have XML data and a previous database and existing queries, and we intend to publish query abilities "to the world ('company' wide and worldwide) " in some specific cases. It is not feasible or acceptable for us to have to impose the ML query language and I really don't like the idea of rewriting most of my queries around proprietary stuff and have to expose that stuff outside the application internals.
To sum it up:
1) is xquery support stuck in its current state (mostly 1.0)
2) is the XML support by ML still "first party"
3) is is possible to have good performance using standard XQuery syntax for requests.

Thanks a lot for your time.

PS : sorry for posting this on a "dev" mailling list but there does not seem to be a "general" mailling list, and I trust fellow developers more than anyone to tell me the reality (even the bitter one ) of such things.

________________________________________________________________
This message and its attachments are issued under the sole responsibility of the sender, for the exclusive use of its recipients; they may contain confidential information. Any publication, use or distribution must be authorized in advance. This message has been scanned for viruses. Please note that any electronic message may be altered in transit over the Internet.
________________________________________________________________
You can visit the website of the Assemblée nationale at the following address: http://www.assemblee-nationale.fr

------------------------------

_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general

End of General Digest, Vol 127, Issue 12
****************************************
