[MarkLogic Dev General] What about XQuery proper?

Gary Vidal Thu, 08 Jan 2015 07:31:28 -0800

Hello Xavier Masson,



Thanks for asking the tough questions. I will try to answer them inline as 
best I can.





>>We are currently evaluating ML, first as an XML database supporting XQuery. We 
>>already have one, with a complex but not large set of data, and would like to 
>>benefit from ML's rich features.

>>ML seems an amazing database and functionally brings a lot to the table.

>>The recent communications around ML 8 and the evolution of recent 
>>versions are making me question the support of standard XQuery.

>>More precisely, the absence of support for XQuery 3.0 beyond the prefix is a 
>>bit puzzling (I am thinking mostly about group by clauses).

>>Put another way, this post http://markmail.org/message/fpsimswbt3gteooj
>>from 2012 still seems to be relevant and did not get any "answer" ;).



In fact, MarkLogic supports many of the features in XQuery 3.0.

Supported:



*  try/catch expressions (3.15 Try/Catch 
Expressions<http://www.w3.org/TR/xquery-30/#id-try-catch>).

   (Actually supported well before the standard was ratified.)

*  Dynamic function call (3.2.2 Dynamic Function 
Call<http://www.w3.org/TR/xquery-30/#id-dynamic-function-invocation> )

*  Inline function expressions (3.1.7 Inline Function 
Expressions<http://www.w3.org/TR/xquery-30/#id-inline-func>).

*  Private functions (4.18 Function 
Declaration<http://www.w3.org/TR/xquery-30/#FunctionDeclns>).

*  Switch expressions (3.13 Switch 
Expression<http://www.w3.org/TR/xquery-30/#id-switch>).

*  Computed namespace constructors (3.9.3.7 Computed Namespace 
Constructors<http://www.w3.org/TR/xquery-30/#id-computed-namespaces>).

*  Output declarations (2.2.4 
Serialization<http://www.w3.org/TR/xquery-30/#id-serialization>).

*  Annotations (4.15 
Annotations<http://www.w3.org/TR/xquery-30/#id-annotations>).

*  Function assertions<http://www.w3.org/TR/xquery-30/#dt-function-assertion> 
in function tests<http://www.w3.org/TR/xquery-30/#doc-xquery30-FunctionTest>.

*  A string concatenation operator (3.6 String Concatenation 
Expressions<http://www.w3.org/TR/xquery-30/#id-string-concat-expr>).

*  A mapping operator (3.17 Simple map operator 
(!)<http://www.w3.org/TR/xquery-30/#id-map-operator>).
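
As a quick illustration of a few of the features listed above, here is a 
minimal sketch in standard XQuery 3.0 syntax as defined by the W3C spec. It is 
not taken from MarkLogic documentation, and which version declaration a given 
MarkLogic release expects for each construct is an assumption you should 
verify (MarkLogic's own dialect is declared as "1.0-ml").

```xquery
xquery version "3.0";

(: Inline function expression bound to a variable :)
let $double := function($n as xs:integer) as xs:integer { $n * 2 }

(: Simple map operator, with a dynamic function call on each item :)
let $doubled := (1, 2, 3) ! $double(.)

(: Switch expression :)
let $label :=
  switch (count($doubled))
    case 0 return "none"
    case 3 return "three"
    default return "other"

(: String concatenation operator :)
return $label || ": " || string-join($doubled ! string(.), ", ")
(: evaluates to the string "three: 2, 4, 6" :)
```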



Not supported (I will shed light on why below):

*  group by clause in FLWOR Expressions (3.10.7 Group By 
Clause<http://www.w3.org/TR/xquery-30/#id-group-by>).

*  count clause in FLWOR Expressions (3.10.6 Count 
Clause<http://www.w3.org/TR/xquery-30/#id-count>).

*  tumbling window and sliding window in FLWOR Expressions (3.10.4 Window 
Clause<http://www.w3.org/TR/xquery-30/#id-windows>).

*  allowing empty in 3.10.2 For 
Clause<http://www.w3.org/TR/xquery-30/#id-xquery-for-clause>, for functionality 
similar to outer joins in SQL.
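
For the last item, an outer-join-like result can usually be approximated in 
plain XQuery 1.0 by binding the possibly empty match with let instead of for. 
The sketch below assumes made-up customer and order elements; it keeps one row 
per customer rather than one row per match, so it is not an exact equivalent 
of allowing empty.

```xquery
xquery version "1.0";

(: Hypothetical sample data :)
let $customers := (<customer id="1">alice</customer>,
                   <customer id="2">bob</customer>)
let $orders := (<order cust="1">book</order>)

(: XQuery 3.0 would write:
     for $o allowing empty in $orders[@cust = $c/@id]
   In 1.0, a let binding keeps the outer row even when nothing matches. :)
for $c in $customers
let $matches := $orders[@cust = $c/@id]
return
  <row customer="{$c}">{
    if (empty($matches)) then <no-orders/> else $matches
  }</row>
```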



Group by/count:

Understand that implementing the "group by" and count clauses is a 
resource-intensive operation, and not something that should be supported 
loosely unless there is a clear way to scale it as a database operation.  The 
engine has some support for group-by operations through cts:value-tuples, but 
that requires indexes to perform well in MarkLogic.  A group by across a large 
dataset would run into scaling issues without the proper index structures to 
support it.  There may be some future support for this as the SPARQL support 
gains the ability to group by.  But the indexing structures would need to be 
more aligned with a conventional database than with how MarkLogic indexes 
content for search and retrieval.
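
To make the scaling concern concrete, this is roughly how a group by is 
usually emulated in plain XQuery 1.0. It is only a sketch with made-up sample 
data and element names, not MarkLogic-specific code.

```xquery
xquery version "1.0";

(: Hypothetical sample data; in practice this would come from the database :)
let $orders :=
  <orders>
    <order><customer>alice</customer><total>10</total></order>
    <order><customer>bob</customer><total>5</total></order>
    <order><customer>alice</customer><total>7</total></order>
  </orders>

(: XQuery 1.0 idiom: iterate over the distinct keys, then re-select each group :)
for $key in distinct-values($orders/order/customer)
let $group := $orders/order[customer = $key]
order by $key
return
  <group customer="{$key}" count="{count($group)}" sum="{sum($group/total)}"/>
```

Note that every key triggers another pass over all the orders, which is fine 
for a small in-memory set but is exactly the kind of work that needs index 
support (for example the cts:value-tuples lexicon call mentioned above, with 
range indexes configured) once the data gets large.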



Tumbling/sliding window and allowing empty:



These other features are generally less well known and, although very useful 
in small cases, not really on the radar from a necessity/use-case 
perspective. Just my 2 cents.





>>I fully understand and appreciate that ML needs to "move forward into 
>>buzzland" from JSON (good) to Hadoop and Big Data (who really has a truly 
>>'big data' dataset ? ;) ), even throwing that abomination that is JavaScript 
>>into the mix ( ;) ) but does that mean that XML and XQuery support will stop 
>>evolving and be "engine level / internals stuff"

>>(because I understand that ML is still at its core an XML/document database)?



Yes, JavaScript is like putting bumper stickers on a Bentley, but it lets the 
bumper-sticker crowd play with a powerful engine and technology.  Because we 
use V8, JavaScript performs well if you choose to use it with MarkLogic.  What 
it really does is open up the integration of complex algorithms already 
implemented in JavaScript that would not otherwise be available to the XQuery 
community.



>>2)



>>I am also a bit troubled by the constant "cts/xdmp:hack" that seems to be the 
>>only way of getting performance out of ML.

>>I know that this kind of "custom methods" is supported by the W3C specs but 
>>my problem is that it seems to be the complete and constant substitute for 
>>FLWOR.

>>I can understand such an escape hatch for specific optimizations (the last 
>>20% perf ;)) or to access functionality outside of the standard 
>>XML/XQuery/XPath specs (like state injection or the temporal stuff or the 
>>REST stuff) but it really seems like it goes way beyond that and that, in 
>>fact, ML supports its own query language based on an XQuery-like syntax, much 
>>more than XQuery with some extensions. :)





It's very hard to optimize XQuery to support all the nuances of composition.  
The cts/xdmp libraries are expansive, but I would not call them a hack; they 
are extensions optimized to support database operations, not XQuery itself.  
Also, I often ask myself which features of the standard are actually useful 
and which are just nice to have.  Take the XQuery Update Facility: as far as I 
know it is still a candidate recommendation, and I find its syntax less 
friendly than the xdmp: namespace functions, though that probably reflects how 
much time I have spent using them.



>>Is this a wrong impression, and is it possible to use ML with good 
>>performance with "standard" XQuery?



I don't think it's the wrong impression; it is quite in line with some of my 
previous thinking, and I also want to see more XQuery 3.0 support.  But I 
weigh this against what I get only in MarkLogic and not in the XQuery spec, 
and I am okay with settling a bit.  The mere fact that MarkLogic has decided 
to support other languages makes its use more acceptable to non-XQuery 
developers, but I agree that I want to see us continue to support new features 
in the spec and beyond.





>>I am not just being "theoretical" here: we do have XML data and a previous 
>>database and existing queries, and we intend to publish query abilities "to 
>>the world ('company' wide and worldwide)" in some specific cases. It is not 
>>feasible or acceptable for us to have to impose the ML query language, and 
>>I really don't like the idea of rewriting most of my queries around 
>>proprietary stuff and having to expose that stuff outside the application 
>>internals.



>>To sum it up:



>>1) is XQuery support stuck in its current state (mostly 1.0)

>>2) is the XML support by ML still "first party"

>>3) is it possible to have good performance using standard XQuery syntax for 
>>requests.



1.       No, but progress will come by virtue of people like you pushing to 
keep it alive.

2.       XML will always be the de facto standard of MarkLogic.  But we have 
given first-class citizenship to other data types and languages (JSON, SPARQL, 
triples).  This is the world we are living in, so we have to play nice outside 
of XML and XQuery.  The advantage of being able to interoperate with 
JavaScript (a lot of algorithms already implemented) and triples (deep 
relationship analysis) should be welcome.

3.       Yes, it is, but you have to understand deeply how MarkLogic optimizes 
XQuery under the hood and what the best practices are for writing performant 
XQuery code.  Sometimes it does require extensions outside of XQuery; 
sometimes it requires indexes on the database so that fn: functions in XPath 
can be evaluated efficiently.  My recommendation is to look into xdmp:plan and 
the profiling tools.
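
As a starting point, a minimal sketch of calling xdmp:plan on both a cts: 
search and a plain XPath. The search term and the /order/customer path are 
placeholders I made up; the shape of the returned plan depends on your server 
version and index settings, so no output is shown here.

```xquery
xquery version "1.0-ml";

(
  (: Plan for a cts: search; "xquery" is just a placeholder term :)
  xdmp:plan(cts:search(fn:doc(), cts:word-query("xquery"))),

  (: Plan for a standard XPath with a predicate; the element names are
     hypothetical and should be replaced with ones from your own data :)
  xdmp:plan(/order[customer = "alice"])
)
```

Comparing the two plans is a quick way to see how much of a standard XPath 
expression the indexes can resolve before any filtering takes place.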





>>Thanks a lot for your time.



>>PS : sorry for posting this on a "dev" mailing list but there does not

>>seem to be a "general" mailing list, and I trust fellow developers more

>>than anyone to tell me the reality (even the bitter one ) of such things.





Hope this helps or at least still keeps you interested.





-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
[email protected]
Sent: Thursday, January 08, 2015 9:17 AM
To: [email protected]
Subject: General Digest, Vol 127, Issue 12








Today's Topics:



   1. XDMP:http-get and 304 responses (Chris Hudson-Silver)

   2. Re: XDMP:http-get and 304 responses (Geert Josten)

   3. What about XQuery proper ? :) (Xavier Masson)





----------------------------------------------------------------------



Message: 1

Date: Thu, 8 Jan 2015 10:12:02 +0000

From: Chris Hudson-Silver <[email protected]>

Subject: [MarkLogic Dev General] XDMP:http-get and 304 responses

To: "[email protected]"

                <[email protected]>

Message-ID:

                
<am3pr07mb321e60424e67f70ed27ad88a6...@am3pr07mb321.eurprd07.prod.outlook.com>



Content-Type: text/plain; charset="us-ascii"



Hi All,



Recently I was working on a project that tracks a repository by calling a REST 
web service that returns metadata and download URLs for items that have 
changed in the remote repository since the last call. It then checks whether 
the item has already been downloaded and, if so, calls the download URL with 
the HTTP cache headers set, as the modification could have been just metadata 
and not content. E.g.:



let $options :=
  <options xmlns="xdmp:http">
    <headers>
      <If-Modified-Since>Fri, 21 Nov 2014 16:53:12 GMT</If-Modified-Since>
      <If-None-Match>"1416588792000"</If-None-Match>
    </headers>
    <repair xmlns="xdmp:document-get">full</repair>
  </options>

let $response := xdmp:http-get($url, $options)



I noticed that the run time was considerably longer when some of the items 
returned a 304 Not Modified response, so I decided to test whether the 
overhead came from the remote repository or from MarkLogic. I did this by 
creating a script that generated curl commands so that I could make the exact 
same requests from the command line and from MarkLogic.

The calls from the command line and from MarkLogic returned the exact same 
response, including the correct 304 code and an empty response body.

The calls from the command line took about 20 seconds less than the calls 
from MarkLogic, and since the global timeout was set to 20 seconds, I decided 
to try the MarkLogic calls with a 1 second timeout, e.g.:





let $options :=
  <options xmlns="xdmp:http">
    <timeout>1</timeout>
    <headers>
      <If-Modified-Since>Fri, 21 Nov 2014 16:53:12 GMT</If-Modified-Since>
      <If-None-Match>"1416588792000"</If-None-Match>
    </headers>
    <repair xmlns="xdmp:document-get">full</repair>
  </options>



and now they take approximately 1 second longer than the calls from the 
command line.



Has anyone else encountered this?

Could it be that MarkLogic is waiting for the response body even though it has 
received a valid response header? The HTTP 1.1 standard states that a response 
does not necessarily need a response body, so if this is the case it may be a 
bug in MarkLogic's HTTP module.

Or am I missing something vital in my request options?



Thanks in advance,



Chris






------------------------------



Message: 2

Date: Thu, 8 Jan 2015 10:43:33 +0000

From: Geert Josten <[email protected]>

Subject: Re: [MarkLogic Dev General] XDMP:http-get and 304 responses

To: MarkLogic Developer Discussion <[email protected]>

Message-ID: <d0d41d71.469b2%[email protected]>

Content-Type: text/plain; charset="us-ascii"



Hi Chris,



Does the response contain a Content-Length? If not, maybe MarkLogic waits the 
full timeout before deciding there is no body. If it has one (with a value of 
zero), that might be a bug.



Kind regards,

Geert






------------------------------



Message: 3

Date: Thu, 8 Jan 2015 15:17:06 +0100

From: Xavier Masson <[email protected]>

Subject: [MarkLogic Dev General] What about XQuery proper ? :)

To: <[email protected].>

Message-ID: <[email protected]>

Content-Type: text/plain; charset="utf-8"



Hello



We are currently evaluating ML, first as an XML database supporting XQuery. We 
already have one, with a complex but not large set of data, and would like to 
benefit from ML's rich features.

ML seems an amazing database and functionally brings a lot to the table.



But I have two main concerns :



1)



The recent communications around ML 8 and the evolution of recent versions 
are making me question the support of standard XQuery.

More precisely, the absence of support for XQuery 3.0 beyond the prefix is a 
bit puzzling (I am thinking mostly about group by clauses).

Put another way, this post http://markmail.org/message/fpsimswbt3gteooj
from 2012 still seems to be relevant and did not get any "answer" ;).



I fully understand and appreciate that ML needs to "move forward into buzzland" 
from JSON (good) to Hadoop and Big Data (who really has a truly 'big data' 
dataset ? ;) ), even throwing that abomination that is JavaScript into the mix 
( ;) ) but does that mean that XML and XQuery support will stop evolving and be 
"engine level / internals stuff"

(because I understand that ML is still at its core an XML/document database)?





2)



I am also a bit troubled by the constant "cts/xdmp:hack" that seems to be the 
only way of getting performance out of ML.

I know that this kind of "custom methods" is supported by the W3C specs but my 
problem is that it seems to be the complete and constant substitute for FLWOR.

I can understand such an escape hatch for specific optimizations (the last 20% 
perf ;)) or to access functionality outside of the standard XML/XQuery/XPath 
specs (like state injection or the temporal stuff or the REST stuff) but it 
really seems like it goes way beyond that and that, in fact, ML supports its 
own query language based on an XQuery-like syntax, much more than XQuery with 
some extensions. :)



Is this a wrong impression, and is it possible to use ML with good performance 
with "standard" XQuery?



I am not just being "theoretical" here: we do have XML data and a previous 
database and existing queries, and we intend to publish query abilities "to the 
world ('company' wide and worldwide)" in some specific cases. It is not 
feasible or acceptable for us to have to impose the ML query language, and I 
really don't like the idea of rewriting most of my queries around proprietary 
stuff and having to expose that stuff outside the application internals.



To sum it up:



1) is XQuery support stuck in its current state (mostly 1.0)

2) is the XML support by ML still "first party"

3) is it possible to have good performance using standard XQuery syntax for 
requests.





Thanks a lot for your time.



PS : sorry for posting this on a "dev" mailing list but there does not

seem to be a "general" mailing list, and I trust fellow developers more

than anyone to tell me the reality (even the bitter one ) of such things.








------------------------------



_______________________________________________

General mailing list

[email protected]<mailto:[email protected]>

http://developer.marklogic.com/mailman/listinfo/general





End of General Digest, Vol 127, Issue 12

****************************************
