Re: [basex-talk] BaseX Sharding with ActiveMQ & Docker

Marco Lettere Fri, 05 Jan 2018 03:47:36 -0800

Hi James,
in my experience the approach works.

I've tried using N slave BaseX instances (DB only) coordinated by one master BaseX instance that includes the HTTP server. On the latter I implemented a RESTXQ [1] function receiving an HTTP request, which was then turned into an XQuery and executed on the different backend instances through the Client Module [2]. I was able to achieve a scalability of nearly 80% on some particular queries.

The results were encouraging indeed, but a few details related to parallel computing in general have to be considered.

1) If the query is memory-only, you can exploit multiple cores to actually achieve parallelism. If the query has to access DBs on disk heavily (most cases are actually I/O-bound), you should ensure that your distributed slaves access different disks or reside on different PCs; otherwise all the gain of parallel computation is lost to overhead on disk access.

2) Uploading documents from a single source usually doesn't exploit data parallelism, because all the data has to pass sequentially through a distributor node first. So if you are able to generate the data in a distributed way from the beginning, this could improve the distribution step too.

3) Another thing to keep in mind is that you should try to perform as much of the work as possible on the slaves (for instance query and transformation) and limit the work on the coordinator to a bare minimum of computing and memory usage, in order to avoid flooding the coordinator with the results produced by all your slaves. I achieved this by injecting functions as external variables into the XQuery to be executed on the slaves.
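To illustrate point 3, here is a minimal scatter-gather sketch in Python rather than XQuery. It is not the code I ran on BaseX: the in-memory SHARDS lists and the names run_on_slave, scatter_gather and count_large are hypothetical stand-ins for the slave databases and for the injected function. The idea it shows is that each slave reduces its own data and the coordinator only merges small partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the slave databases: each shard holds its own records.
SHARDS = [[3, 8, 15], [4, 16, 23], [42, 7]]


def run_on_slave(records, work):
    """Simulate a slave applying the injected function next to its own data."""
    return work(records)


def scatter_gather(shards, work, merge):
    """Send `work` to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: run_on_slave(shard, work), shards))
    # The coordinator only ever sees one small value per shard.
    return merge(partials)


def count_large(records):
    """Injected function: filter and count on the slave, return a single number."""
    return sum(1 for r in records if r > 10)


total = scatter_gather(SHARDS, count_large, sum)
print(total)
```

Because count_large returns a single number per shard, the coordinator's memory usage stays constant no matter how many records each slave holds.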

Hope this is useful for your work, and I'm really looking forward to knowing how it will proceed!

Regards,
Marco.

[1] http://docs.basex.org/wiki/RESTXQ
[2] http://docs.basex.org/wiki/Client_Module

On 03/01/2018 19:50, James Sears wrote:

In the past I hit a scalability limit with BaseX - a billion+ nodes kind of made querying it a bit slower than I liked.
I thought I'd try and address this, so I’ve written some code and placed it in GitHub: https://github.com/jameshnsears/xqa-documentation
What I've done is a proof of concept, that's all - in no way "finished". I'm emailing the list in the hope that what I've done so far might generate some constructive criticism. Maybe my approach has potential, maybe it doesn't?
There are only four components so far, the first three are Docker containers:
* an ActiveMQ instance
* a load balancer
* a shard
* a command line client to load the XML, from file, into an ActiveMQ queue.
The software requires close to zero configuration. For example, each shard you start will automatically receive XML from the load balancer. And the load balancer distributes XML so that each shard holds the same # of documents.
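A minimal Python sketch of one way such even distribution could work. It is an illustration only and is not taken from the xqa repository: the class LeastLoadedBalancer, its methods and the shard names are all hypothetical. Each document is routed to whichever shard currently holds the fewest documents.

```python
import heapq


class LeastLoadedBalancer:
    """Hypothetical balancer: routes each document to the least-loaded shard."""

    def __init__(self, shard_names):
        # Min-heap of (document_count, shard_name); ties are broken by name.
        self._heap = [(0, name) for name in shard_names]
        heapq.heapify(self._heap)
        self.assignments = {name: [] for name in shard_names}

    def add_shard(self, name):
        """Register a newly started shard; it starts empty and so fills up first."""
        heapq.heappush(self._heap, (0, name))
        self.assignments[name] = []

    def route(self, doc):
        """Assign `doc` to the shard with the fewest documents and return its name."""
        count, name = heapq.heappop(self._heap)
        self.assignments[name].append(doc)
        heapq.heappush(self._heap, (count + 1, name))
        return name


balancer = LeastLoadedBalancer(["shard-1", "shard-2", "shard-3"])
for i in range(10):
    balancer.route(f"doc-{i}.xml")

counts = {name: len(docs) for name, docs in balancer.assignments.items()}
print(counts)
```

With this scheme the document counts of any two shards differ by at most one, and a shard added later receives documents until it has caught up with the others.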
There's a Travis project associated with the above link - it shows how easy it is to run the software end to end.
So far my effort is all about ingesting the XML. Before I move further I thought I'd canvass some feedback - so if anyone has any then please give it :-)
Thanks.
