Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?

Eliot Kimber via BaseX-Talk Thu, 23 Jan 2025 11:55:51 -0800

I wanted to revisit this discussion in the new year.

In the context of a different internal initiative, I’ve been learning more
about traditional web site architecture and implementation (e.g., Apache httpd
plus statically generated sites using Node.js).

This has gotten me wondering whether the usual architecture for a web site
with many users and long-running queries is to have one BaseX HTTP server that
serves the web site and handles requests, and a second server (basexserver)
that does all the query work and accepts requests from the HTTP server.

That’s based on the assumption that a basexserver instance can handle a large
number of concurrent requests, since it must manage its own internal threading.
This also presumes that a single server instance (one JVM) can use multiple
cores on a multi-core server (my production server is an 8-CPU server). A
little reading on Jetty suggests that it should be able to handle the
concurrent load I’m likely to have with no problem.

This would still require a mechanism to manage long-lived HTTP requests from
the client, but there are various ways to handle that: using WebSockets to
alert a client that a long request has completed, being more sophisticated
about HTTP request details, and storing query results in a cache location from
which they are served back to the client.
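A minimal sketch of the job-id pattern described above, in Python: the front end submits a long query to a worker pool, returns an id at once, and the client later asks for the status or the stored result. `QueryJobs` and `run_query` are hypothetical names; `run_query` stands in for whatever call actually reaches basexserver, and WebSocket notification is left out.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class QueryJobs:
    """Hypothetical registry of long-running queries, keyed by job id."""

    def __init__(self, run_query: Callable[[str], str], workers: int = 4) -> None:
        # run_query is a stand-in for the real call to the query server.
        self._run_query = run_query
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def submit(self, query: str) -> str:
        """Start the query in the background and return an id immediately."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(self._run_query, query)
        with self._lock:
            self._jobs[job_id] = future
        return job_id

    def status(self, job_id: str) -> str:
        """Report 'unknown', 'running', or 'done' for a job id."""
        with self._lock:
            future = self._jobs.get(job_id)
        if future is None:
            return "unknown"
        return "done" if future.done() else "running"

    def result(self, job_id: str, timeout: Optional[float] = None) -> str:
        """Return the finished result; the Future keeps it, so repeat calls are cheap."""
        with self._lock:
            future = self._jobs[job_id]
        return future.result(timeout=timeout)

    def shutdown(self) -> None:
        """Wait for running queries and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

A real deployment would also need to evict old jobs, and this in-memory registry only works if every request for a given id reaches the same process.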

It seems like my architectural mistake is having a BaseX HTTP server that both 
serves the interactive web site and makes queries and then trying to scale 
horizontally by having multiple HTTP servers to which requests are delegated by 
the primary server.

With a single basexserver instance handling all queries, that instance can
manage read and write locking appropriately.
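To make the locking point concrete, here is a conceptual model in Python, not BaseX's actual lock manager: per-database readers/writer locks held by one process. Readers of a database can overlap, a writer waits until it has the database to itself, and a write on one database does not block reads on another. Because the lock state is an in-memory object, it protects nothing unless every query goes through the same process. The class and database names are hypothetical.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, Set


class DatabaseLocks:
    """Hypothetical per-database readers/writer locks held by a single process."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers: DefaultDict[str, int] = defaultdict(int)
        self._writers: Set[str] = set()

    @contextmanager
    def read(self, db: str) -> Iterator[None]:
        with self._cond:
            # Wait while another thread is writing this database.
            while db in self._writers:
                self._cond.wait()
            self._readers[db] += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers[db] -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self, db: str) -> Iterator[None]:
        with self._cond:
            # Wait until nobody else is reading or writing this database.
            while db in self._writers or self._readers[db] > 0:
                self._cond.wait()
            self._writers.add(db)
        try:
            yield
        finally:
            with self._cond:
                self._writers.discard(db)
                self._cond.notify_all()
```

This model prefers readers: a steady stream of reads on one database can keep a writer waiting, which is one way a long write and busy reads can interfere.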

Is this an appropriate architecture?

Thanks,

Eliot
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368

servicenow

servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | 
X<https://twitter.com/servicenow> | 
YouTube<https://www.youtube.com/user/servicenowinc> | 
Instagram<https://www.instagram.com/servicenow>

From: Eliot Kimber <eliot.kim...@servicenow.com>
Date: Thursday, December 12, 2024 at 4:40 PM
To: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app 
implementation?
In my approach, all updates are done to databases that are only used by my data 
loading HTTP server.

Basically, I create a set of temporary databases, load the new data into those, 
then, like Tamara, swap out the production (read-only) databases with the 
newly-created temp databases.

I’ve currently implemented this by doing:

1. Rename the production database to something unique (e.g.,
“_dropme_databasename”)
2. Rename the temp database to the production name
3. Drop what was the production database.
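The three steps could be sent as three separate XQuery statements, one per request. This Python sketch only builds the query strings; it assumes BaseX's `db:alter` (rename a database) and `db:drop` functions, and `swap_queries` and `xq_string` are hypothetical helpers.

```python
from typing import List


def xq_string(value: str) -> str:
    """Quote a value as an XQuery string literal, doubling embedded apostrophes."""
    return "'" + value.replace("'", "''") + "'"


def swap_queries(production: str, temp: str, prefix: str = "_dropme_") -> List[str]:
    """Return the rename/rename/drop steps as separate XQuery statements."""
    retired = prefix + production
    return [
        # 1. Move the current production database aside.
        f"db:alter({xq_string(production)}, {xq_string(retired)})",
        # 2. Give the freshly loaded temp database the production name.
        f"db:alter({xq_string(temp)}, {xq_string(production)})",
        # 3. Drop what was the production database.
        f"db:drop({xq_string(retired)})",
    ]
```

Between steps 1 and 2 the production name briefly does not exist, so a read arriving in that window can fail unless readers tolerate it.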

This is in the context of my multi-server approach, where all the updates of a 
given set of related databases are done by a single HTTP server (and thus a 
single JVM).

Cheers,

E.

From: Lizzi, Vincent <vincent.li...@taylorandfrancis.com>
Date: Thursday, December 12, 2024 at 12:59 PM
To: Tamara Marnell <tmarn...@orbiscascade.org>, Eliot Kimber 
<eliot.kim...@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: RE: [basex-talk] Deeper discussion of BaseX client/server and web app 
implementation?
Hello Eliot and Tamara,

I’ve observed what appear to be instances (though I haven’t fully tested to
isolate and confirm this) where a write operation such as db:create() blocks
BaseX from serving other HTTP requests, which use db:list() and db:get(),
until the write operation is finished.

On reading the description of lock detection at
https://docs.basex.org/main/BaseX_10#compilation, I’m now wondering if it might
help to apply a naming convention to database names so that it’s possible to
tell by name which databases are currently used for reading and which for
writing, although renaming databases might add other complexities.
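A naming convention of this kind might look like the following Python sketch. The `work_`/`prod_` prefixes and helper names are hypothetical, and the idea that writing database names as string literals helps lock detection is an assumption drawn from the linked page, not something tested here. `db:get` is BaseX's function for retrieving documents from a database.

```python
from enum import Enum


class Role(Enum):
    WORK = "work_"  # hypothetical prefix: databases being rebuilt (written)
    PROD = "prod_"  # hypothetical prefix: databases serving users (read-only)


def db_name(role: Role, base: str) -> str:
    """Build a database name that encodes its role."""
    return role.value + base


def role_of(name: str) -> Role:
    """Recover the role from a name, rejecting names outside the convention."""
    for role in Role:
        if name.startswith(role.value):
            return role
    raise ValueError(f"database name {name!r} does not follow the convention")


def read_query(base: str, path: str) -> str:
    """Build a read-only query with the production name written as a literal."""
    name = db_name(Role.PROD, base).replace("'", "''")
    escaped_path = path.replace("'", "''")
    return f"db:get('{name}', '{escaped_path}')"
```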

Thanks,
Vincent

_____________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com

From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> On Behalf Of 
Tamara Marnell
Sent: Thursday, December 12, 2024 12:56 PM
To: Eliot Kimber <eliot.kim...@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app 
implementation?

Hello Eliot,

I have only one BaseX instance, but to avoid the locking issue during large 
updates/optimizations, I have multiple copies of the databases. Updates are 
performed on "working" databases, and then I use db:copy to duplicate them to 
"production" databases for users on the front end to query. I haven't seen or 
heard of any problems with concurrent users on the public side when they're 
just reading from the production databases.

-Tamara

On Thu, Dec 12, 2024 at 6:53 AM Eliot Kimber
<eliot.kim...@servicenow.com> wrote:
I fully understand the issue of time.

The Database Server page
(https://docs.basex.org/12/Database_Server) doesn’t really provide the details
I’m looking for.

In particular, it’s not clear to me how a BaseX server would be used with an 
HTTP server in order to manage parallel query execution and ensure a responsive 
web site in the face of 100s of concurrent web users making 1000s of query 
requests. My current architecture handles this in terms of responsiveness and 
horizontal scaling, but as you say, it runs into issues with contention on 
locks for databases being updated.

I know other people have successfully implemented public-facing web sites with 
BaseX so I’m curious how they’ve done it—is the life cycle of their content 
such that updates are not much of an issue or are they doing something 
different? Am I missing some way to make a single BaseX server take advantage 
of all available cores? I understood a Java JVM as using a single core, but 
maybe my understanding is wrong?

It may be that BaseX as I’m using it is not the right way to do what I’m doing. 
For example, it might make more sense to implement the web site using a typical 
node.js and React system that then uses BaseX exclusively through a REST API. 
That still presents the problem of how to scale handling of queries but avoids 
any issues with the web site itself being responsive. My team is learning how 
to use node.js, next.js, and React for other projects so it’s something we 
could explore.

I could also explore using other database solutions for some or all of what I 
want to do. For example, maybe it makes more sense to put my where-used table 
into a key-value store (even Solr could work for this pretty easily) or a SQL 
database and reserve BaseX for doing the XML-aware data processing needed to 
construct the table and doing other XML- and text-aware queries. But that would 
still run into performance issues, given that I’m looking for 10 ms response
times for lookups in the where-used table.
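A where-used table is essentially an inverted index from a target to the things that reference it. Once the XML-aware extraction has produced (referrer, target) pairs, a lookup is a single hash-map access, which is the property a key-value store would provide. A minimal Python sketch; the function name and file names are hypothetical:

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def build_where_used(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert (referrer, target) link pairs into target -> set of referrers."""
    index: DefaultDict[str, Set[str]] = defaultdict(set)
    for referrer, target in links:
        index[target].add(referrer)
    return dict(index)
```

Lookups are then `index.get(target, set())`; building the pairs remains the BaseX part of the job.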

Or maybe I just need to do more caching of query results where the results are 
stable for a given content set.
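Caching results that are stable for a given content set can be keyed on a content-set version (for example, a value that changes whenever the production databases are swapped) together with the query text. A minimal, single-threaded Python sketch; `VersionedCache` and the version strings are hypothetical:

```python
from typing import Callable, Dict, Tuple


class VersionedCache:
    """Hypothetical cache: a result is reused while the content-set version is unchanged."""

    def __init__(self, run_query: Callable[[str], str]) -> None:
        self._run_query = run_query  # stand-in for the real query call
        self._entries: Dict[Tuple[str, str], str] = {}
        self.misses = 0

    def get(self, content_version: str, query: str) -> str:
        """Return a cached result, running the query only on a miss."""
        key = (content_version, query)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._run_query(query)
        return self._entries[key]

    def drop_other_versions(self, current_version: str) -> None:
        """Free entries that belong to superseded content sets."""
        self._entries = {k: v for k, v in self._entries.items() if k[0] == current_version}
```

Changing the version at swap time makes stale entries unreachable without any explicit invalidation; `drop_other_versions` then reclaims the memory.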

I started this project without any particular plan and got a long way just
building it as I went, but now that I’m tasked with fixing a number of design
and behavior issues with my initial approach, I need to make sure I really know
what I’m doing and make the most appropriate implementation choices.

Thanks,

Eliot

From: Christian Grün <christian.gr...@gmail.com>
Date: Thursday, December 12, 2024 at 5:11 AM
To: Eliot Kimber <eliot.kim...@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app
implementation?
Hi Eliot,

Free time is a rare resource nowadays; just some quick feedback:

> I’ve done a read through of the current documentation at
> https://docs.basex.org/ and also reviewed what I
> could find online and such. In the documentation I find a number of
> references to the “client/server” architecture but I’m not finding any
> particularly deep discussion of it, either in the docs or by searching on
> i.e., “basex client server”.

The best entry point may be Getting Started → Database Server [1].

> When I started my Mirabel project I understood that the way to get 
> concurrency was to use multiple BaseX HTTP instances, which can make 
> concurrent read requests on a single set of databases.

That’s dangerous (and has always been problematic). If you have concurrent
operations, you should have one central HTTP instance. Otherwise, you might run
into concurrency issues and locked databases, as multiple JVMs cannot share
their locking information with each other [2].

It may be difficult to give profound answers on the remaining questions in a 
few lines. Maybe others can share their experiences.

Best,
Christian

[1] https://docs.basex.org/12/Getting_Started
[2] https://docs.basex.org/main/Startup#concurrent_operations

--

Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org<https://www.orbiscascade.org/>)
Pronouns: she/her/hers
