Hello Mark,
please try using the CoRB tool. The link to the documentation is below:
CoRB2 — MarkLogic Developer Community

In the selector XQuery you can use cts:uris() with your custom and-query; then each transform XQuery will work on one URI. You can retrieve the document, evaluate all your XPaths, concatenate the values, and return the concatenated string.
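Something like the following two modules would do it (a sketch only — the element names and query are taken from your message below, so adjust them to your schema):

```xquery
(: selector.xqy — returns the count followed by the URIs to process :)
xquery version "1.0-ml";
let $uris := cts:uris((), (),
  cts:and-query((
    cts:element-value-query(xs:QName('Department'), 'Sales'),
    cts:element-range-query(xs:QName('Date'), '>', xs:date('2015-01-01')),
    cts:element-range-query(xs:QName('Date'), '<', xs:date('2015-01-03'))
  )))
return (fn:count($uris), $uris)
```

```xquery
(: transform.xqy — CoRB runs this once per URI, passed in as $URI;
   returns one pipe-delimited line for the output file :)
xquery version "1.0-ml";
declare variable $URI as xs:string external;
let $doc := fn:doc($URI)
return fn:concat($doc//Department, '|', $doc//Total, '|', $doc//Location)
```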
In the configuration you can direct all of the output to a file. This will be a lot faster.
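For example, a minimal CoRB properties file would look roughly like this (option names as in the CoRB2 documentation; the connection URI, module paths, and thread count are placeholders to adjust for your environment):

```
XCC-CONNECTION-URI=xcc://user:password@localhost:8000/Documents
URIS-MODULE=selector.xqy|ADHOC
PROCESS-MODULE=transform.xqy|ADHOC
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
EXPORT-FILE-NAME=output.txt
THREAD-COUNT=16
```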
Thanks,
Sai.
 
 

On Mon, Oct 10, 2016 at 2:43 PM, Mark Shanks <[email protected]> wrote:
Hi, 
We have a need to pull large amounts of data out of MarkLogic as quickly as possible. I found that XQuery searches like query-by-example were very slow. Using the cts functions led to a big speed increase. However, it isn't clear whether my current approach is optimal, or whether there are better alternatives. Unfortunately, while there is a lot of documentation describing many different ways of doing things in MarkLogic, there seems to be very little documentation describing the best or most efficient approaches (e.g., what if your goal is not only to run a query successfully, but to maximize its performance?). At present, I'm using the Java API to pull documents: I'm using the Call.xquery(query) function in Java to run custom XQuery through the REST API. The XQuery is as follows:
<data>{
for $x in cts:search(fn:doc(),
  cts:and-query((
    cts:element-value-query(xs:QName('Department'), 'Sales'),
    cts:element-range-query(xs:QName('Date'), '>', xs:date('2015-01-01')),
    cts:element-range-query(xs:QName('Date'), '<', xs:date('2015-01-03')),
    cts:not-query(cts:element-value-query(xs:QName('Date'), 'NULL'))
  )), 'unfiltered', 0.0)
return fn:concat($x//Department, '|', $x//Total, '|', $x//Location)
}</data>
There are indexes on Date and Department. The XQuery wraps all of the documents in the <data> tags and sends the results to the Java program, which then strips the <data> tags and prints the results to a text file.
I have found that you can run multiple threads in the Java program that request different "chunks" of the data by using the predicates [1 to 1000000], [1000001 to 2000000], etc.
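The chunked query each thread runs looks roughly like this (a sketch; the query is a simplified version of the one above):

```xquery
(: each thread requests a different slice of the result sequence :)
let $results :=
  cts:search(fn:doc(),
    cts:element-value-query(xs:QName('Department'), 'Sales'),
    'unfiltered', 0.0)
return $results[1 to 1000000]  (: thread 2 uses [1000001 to 2000000], etc. :)
```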
This approach is much faster than our original approach - 12 hours with 8 threads, rather than 75 hours using query-by-example. However, it is not clear if this is the fastest way, or whether there are further optimizations or better approaches. For instance, when pulling the actual elements from the documents, I found that having them indexed made no difference to performance. Is there a way of pulling values from the indexes to improve performance? Is there a way to specify the elements you want in the cts:search that will improve performance? Is there a more efficient way to restrict the search range? Is there documentation describing the most efficient approaches to querying MarkLogic?
Thanks.

  
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
