Hi Andreas,

Interesting slides, good find!

If you are talking about more ad hoc processing, you could look into tools 
like https://github.com/mblakele/taskbot and 
https://github.com/marklogic/corb2. They batch up the work very well, but they 
won’t spread the load across a cluster automatically. You could, however, split 
the load yourself and run multiple instances in parallel, each against a 
different host. That works best if each instance targets the host that actually 
holds the data you want to touch, which is difficult to arrange. MLCP does that 
with its -fastload option. Would the MLCP copy feature with a transform perhaps 
work?
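
To make the "split the load and run instances in parallel" idea a bit more 
concrete, here is a minimal Python sketch. It is only an illustration under 
stated assumptions: the host names and the process_batch callback are made up, 
and it does not try to target the host that actually holds each document. It 
simply deals the URIs out round-robin and runs one worker per host. In a real 
setup, process_batch would be whatever starts a batch run against that host.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def split_across_hosts(uris: Sequence[str], hosts: Sequence[str]) -> Dict[str, List[str]]:
    """Deal URIs out to hosts round-robin so each host gets a similar share."""
    if not hosts:
        raise ValueError("at least one host is required")
    shares: Dict[str, List[str]] = {host: [] for host in hosts}
    for index, uri in enumerate(uris):
        shares[hosts[index % len(hosts)]].append(uri)
    return shares


def run_in_parallel(
    shares: Dict[str, List[str]],
    process_batch: Callable[[str, List[str]], int],
) -> Dict[str, int]:
    """Run one worker per host at the same time and collect each result."""
    with ThreadPoolExecutor(max_workers=len(shares)) as pool:
        futures = {
            host: pool.submit(process_batch, host, batch)
            for host, batch in shares.items()
        }
        return {host: future.result() for host, future in futures.items()}


def count_only(host: str, batch: List[str]) -> int:
    # Hypothetical stand-in: a real callback would send the batch to `host`.
    return len(batch)


if __name__ == "__main__":
    # Hypothetical host names and document URIs, for illustration only.
    example_hosts = ["ml-host-1", "ml-host-2"]
    example_uris = ["/a.xml", "/b.xml", "/c.xml", "/d.xml", "/e.xml"]
    print(run_in_parallel(split_across_hosts(example_uris, example_hosts), count_only))
```

Round-robin keeps the shares even, but it ignores data locality, so each host 
may still have to fetch documents from forests on other hosts.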

MarkLogic also provides Hadoop integration, so maybe that is also worth looking 
at?

Cheers,
Geert

From: [email protected] on behalf of Andreas Hubmer <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, July 30, 2015 at 8:56 AM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Distributing Tasks

Hi Geert,

Thanks for the update.

Triggers and the CPF aren't exactly what I'm looking for. What I want to do is 
to distribute one-time tasks like adding new elements to all existing documents.

I've found some slides 
<http://developer.marklogic.com/media/mlw12/Distributed-Content-Processing-in-MarkLogic.pdf>
from an ML consultant on "Distributed Content Processing in MarkLogic", but the 
code builds on ML 4.

I'll probably create a lightweight library myself, using either one-time 
scheduled tasks or an HTTP server to distribute the tasks.

Regards,
Andreas

2015-07-29 17:56 GMT+02:00 Geert Josten <[email protected]>:
Hi Andreas,

I haven’t heard about anything in this direction recently, but FWIW I added a 
+1 to the RFE.

Could post-commit triggers or CPF help out in some way? From what I have heard, 
they should run on the host that holds the forest containing the document at 
hand.

Cheers,
Geert


From: [email protected] on behalf of Andreas Hubmer <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Tuesday, July 28, 2015 at 5:20 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: [MarkLogic Dev General] Distributing Tasks

Hello,

This Knowledgebase article 
<https://help.marklogic.com/knowledgebase/article/View/112/0/techniques-for-dividing-tasks-between-hosts-in-a-cluster>
mentions an RFE (2763) that would make it possible to pass options to 
xdmp:spawn() so that code can be executed on a specific host in a cluster.
Are there still any plans for this feature?

Thanks and cheers,
Andreas

--
Andreas Hubmer
IT Consultant


_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general




--
Andreas Hubmer
IT Consultant

EBCONT enterprise technologies GmbH
Millennium Tower
Handelskai 94-96
A-1200 Vienna

Mobile: +43 664 60651861
Fax: +43 2772 512 69-9
Email: [email protected]<mailto:[email protected]>
Web: http://www.ebcont.com

OUR TEAM IS YOUR SUCCESS

UID-Nr. ATU68135644
HG St.Pölten - FN 399978 d