Hi Eliot,

A short reply, as so often: Working with multiple BaseX instances and using one of them to delegate incoming requests is definitely an option. Other BaseX instances can be addressed with, for example, the Client Module [1].

If you have millions of requests per day, it may be advisable either to disable logging (in case you have a proxy layer anyway) or to use the recently added log filter to reduce the number of entries [2].

Hope this helps,
Christian

[1] https://docs.basex.org/main/Client_Functions
[2] https://docs.basex.org/12/Options#logexclude

Eliot Kimber via BaseX-Talk <basex-talk@mailman.uni-konstanz.de> wrote on Thu, 23 Jan 2025, 20:55:

> I wanted to revisit this discussion in the new year.
>
> In the context of a different internal initiative, I’ve been learning more about traditional web site architecture and implementation (i.e., Apache httpd plus statically generated sites using node.js).
>
> This has gotten me wondering whether the general architecture for a large-user-count, long-query-handling web site is to have one BaseX HTTP server to serve the web site and handle requests and a second server (basexserver) that does all the query work and accepts requests from the HTTP server?
>
> That’s based on the assumption that a basexserver instance can handle a large number of concurrent requests as it must handle its own internal threading etc. This also presumes that a single server instance (one JVM) can use multiple cores on a multi-core server (my production server is an 8-CPU server). A little reading on Jetty suggests that it should be able to handle the concurrent load I’m likely to have with no problem.
>
> This would still require a mechanism to manage long-lived HTTP requests from the client, but there are various ways to handle that, including using web sockets to alert a client that a long request has completed and being more sophisticated with HTTP request details, as well as storing query results in a cache location from which they are served back to the client.
>
> It seems like my architectural mistake is having a BaseX HTTP server that both serves the interactive web site and makes queries and then trying to scale horizontally by having multiple HTTP servers to which requests are delegated by the primary server.
>
> By having a single basexserver instance handling all queries, it can manage read and write locking appropriately.
>
> Is this an appropriate architecture?
>
> Thanks,
>
> Eliot
> _____________________________________________
> *Eliot Kimber*
> Sr. Staff Content Engineer
> O: 512 554 9368
>
> *servicenow*
>
> servicenow.com <https://www.servicenow.com>
> LinkedIn <https://www.linkedin.com/company/servicenow> | X <https://twitter.com/servicenow> | YouTube <https://www.youtube.com/user/servicenowinc> | Instagram <https://www.instagram.com/servicenow>
>
> *From:* Eliot Kimber <eliot.kim...@servicenow.com>
> *Date:* Thursday, December 12, 2024 at 4:40 PM
> *To:* basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
> *Subject:* Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
>
> In my approach, all updates are done to databases that are only used by my data loading HTTP server.
>
> Basically, I create a set of temporary databases, load the new data into those, then, like Tamara, swap out the production (read-only) databases with the newly-created temp databases.
>
> I’ve currently implemented this by doing:
>
> 1. Rename the production database to something unique (i.e., “_dropme_databasename”)
> 2. Rename the temp database to the production name
> 3. Drop what was the production database.
>
> This is in the context of my multi-server approach, where all the updates of a given set of related databases are done by a single HTTP server (and thus a single JVM).
>
> Cheers,
>
> E.
>
> *From:* Lizzi, Vincent <vincent.li...@taylorandfrancis.com>
> *Date:* Thursday, December 12, 2024 at 12:59 PM
> *To:* Tamara Marnell <tmarn...@orbiscascade.org>, Eliot Kimber <eliot.kim...@servicenow.com>
> *Cc:* basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
> *Subject:* RE: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
>
> Hello Eliot and Tamara,
>
> I’ve observed what appear to be instances (though I haven’t fully tested to isolate and confirm this) where a write operation such as db:create() blocks BaseX from serving other HTTP requests, which use db:list() and db:get(), until the write operation is finished.
>
> On reading the description of lock detection at https://docs.basex.org/main/BaseX_10#compilation, I’m now wondering if it might help to apply a naming convention to database names so that it’s possible to distinguish by name which databases are currently used for read vs. write, although renaming databases might add other complexities.
>
> Thanks,
> Vincent
>
> _____________________________________________
> *Vincent M. Lizzi*
> Head of Information Standards | Taylor & Francis Group
> vincent.li...@taylorandfrancis.com
>
> *From:* BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> *On Behalf Of* Tamara Marnell
> *Sent:* Thursday, December 12, 2024 12:56 PM
> *To:* Eliot Kimber <eliot.kim...@servicenow.com>
> *Cc:* basex-talk@mailman.uni-konstanz.de
> *Subject:* Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
>
> Hello Eliot,
>
> I have only one BaseX instance, but to avoid the locking issue during large updates/optimizations, I have multiple copies of the databases. Updates are performed on "working" databases, and then I use db:copy to duplicate them to "production" databases for users on the front end to query. I haven't seen or heard of any problems with concurrent users on the public side when they're just reading from the production databases.
>
> -Tamara
>
> On Thu, Dec 12, 2024 at 6:53 AM Eliot Kimber <eliot.kim...@servicenow.com> wrote:
>
> I fully understand the issue of time.
>
> The Database Server page (https://docs.basex.org/12/Database_Server) doesn’t really provide the details I’m looking for.
>
> In particular, it’s not clear to me how a BaseX server would be used with an HTTP server in order to manage parallel query execution and ensure a responsive web site in the face of 100s of concurrent web users making 1000s of query requests. My current architecture handles this in terms of responsiveness and horizontal scaling, but as you say, it runs into issues with contention on locks for databases being updated.
>
> I know other people have successfully implemented public-facing web sites with BaseX, so I’m curious how they’ve done it: is the life cycle of their content such that updates are not much of an issue, or are they doing something different?
> Am I missing some way to make a single BaseX server take advantage of all available cores? I understood a JVM as using a single core, but maybe my understanding is wrong?
>
> It may be that BaseX as I’m using it is not the right way to do what I’m doing. For example, it might make more sense to implement the web site using a typical node.js and React system that then uses BaseX exclusively through a REST API. That still presents the problem of how to scale handling of queries but avoids any issues with the web site itself being responsive. My team is learning how to use node.js, next.js, and React for other projects so it’s something we could explore.
>
> I could also explore using other database solutions for some or all of what I want to do. For example, maybe it makes more sense to put my where-used table into a key-value store (even Solr could work for this pretty easily) or a SQL database and reserve BaseX for doing the XML-aware data processing needed to construct the table and doing other XML- and text-aware queries. But that would still run into performance issues, where I’m looking for 10ms response times for doing lookups in the where-used table.
>
> Or maybe I just need to do more caching of query results where the results are stable for a given content set.
>
> I started this project without any particular plan and got a long way just building it as I went, but now that I’m tasked with fixing a number of design and behavior issues with my initial approach, I need to make sure I really know what I’m doing and make the most appropriate implementation choices.
>
> Thanks,
>
> Eliot
>
> *From:* Christian Grün <christian.gr...@gmail.com>
> *Date:* Thursday, December 12, 2024 at 5:11 AM
> *To:* Eliot Kimber <eliot.kim...@servicenow.com>
> *Cc:* basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
> *Subject:* Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
>
> Hi Eliot,
>
> Free time is a rare resource nowadays; just some quick feedback:
>
> > I’ve done a read through of the current documentation at https://docs.basex.org/ and also reviewed what I could find online and such. In the documentation I find a number of references to the “client/server” architecture but I’m not finding any particularly deep discussion of it, either in the docs or by searching on, e.g., “basex client server”.
>
> The best entry point may be Getting Started → Database Server [1].
>
> > When I started my Mirabel project I understood that the way to get concurrency was to use multiple BaseX HTTP instances, which can make concurrent read requests on a single set of databases.
>
> That’s dangerous (and has always been problematic). If you have concurrent operations, you should have one central HTTP instance. Otherwise, you might run into concurrency issues and locked databases, as multiple JVMs cannot share their information among each other [2].
>
> It may be difficult to give thorough answers to the remaining questions in a few lines. Maybe others can share their experiences.
>
> Best,
> Christian
>
> [1] https://docs.basex.org/12/Getting_Started
> [2] https://docs.basex.org/main/Startup#concurrent_operations
>
> --
> Tamara Marnell
> Program Manager, Systems
> Orbis Cascade Alliance (orbiscascade.org <https://www.orbiscascade.org/>)
> Pronouns: she/her/hers
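
For reference, the three-step database swap described in the thread (rename the production database to a unique "_dropme_" name, rename the temp database to the production name, drop the old production database) could be scripted roughly as below. This is only a sketch under stated assumptions: it assumes the renames and the drop are issued as the BaseX commands ALTER DB and DROP DB, which the thread does not confirm, and the function name swap_commands and the database names "docs" and "docs_tmp" are hypothetical.

```python
from __future__ import annotations


def swap_commands(production: str, temp: str) -> list[str]:
    """Build the three-step swap as BaseX command strings (illustrative only).

    Assumption: ALTER DB renames a database and DROP DB removes it.
    The thread describes the steps but not how they were issued.
    """
    # Unique holding name for the outgoing production database,
    # following the "_dropme_databasename" pattern from the thread.
    retired = f"_dropme_{production}"
    return [
        f"ALTER DB {production} {retired}",  # 1. move production aside
        f"ALTER DB {temp} {production}",     # 2. promote the temp database
        f"DROP DB {retired}",                # 3. drop what was production
    ]


if __name__ == "__main__":
    # Hypothetical database names, for illustration.
    for command in swap_commands("docs", "docs_tmp"):
        print(command)
```

The order matters: the production name is only free for the temp database once the old production database has been renamed, and the drop comes last so that a failure in step 2 still leaves the old data available under its "_dropme_" name.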