Hi Eliot,

A short reply, as so often: working with multiple BaseX instances and
using one of them to delegate incoming requests is definitely an option.
Other BaseX instances can be addressed with the Client Module [1], for
example. If you have millions of requests per day, it may be advisable
to either disable logging (if you have a proxy layer anyway) or use the
recently added log filter to reduce the number of entries [2].

Hope this helps,
Christian

[1] https://docs.basex.org/main/Client_Functions
[2] https://docs.basex.org/12/Options#logexclude


Eliot Kimber via BaseX-Talk <basex-talk@mailman.uni-konstanz.de> wrote on
Thu, 23 Jan 2025, 20:55:

> I wanted to revisit this discussion in the new year.
>
> In the context of a different internal initiative, I’ve been learning more
> about traditional web site architecture and implementation (i.e., Apache
> httpd plus statically generated sites using Node.js).
>
> This has gotten me to wondering whether the general architecture for a
> large-user-count, long-query-handling web site is to have one BaseX HTTP
> server to serve the web site and handle requests and a second server
> (basexserver) that does all the query work and accepts requests from the
> HTTP server?
>
>
>
> That’s based on the assumption that a basexserver instance can handle a
> large number of concurrent requests as it must handle its own internal
> threading etc. This also presumes that a single server instance (one JVM)
> can use multiple cores on a multi-core server (my production server is an
> 8-CPU server). A little reading on jetty suggests that it should be able to
> handle the concurrent load I’m likely to have with no problem.
>
>
>
> This would still require a mechanism to manage long-lived HTTP requests
> from the client, but there are various ways to handle that, including using
> web sockets to alert a client that a long request has completed, being more
> sophisticated with HTTP request details, and storing query results
> in a cache location from which they are served back to the client.
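
One way to picture the "store the result and serve it later" idea is a small job store: the web tier hands back a job id at once and the client polls (or is notified) when the result is ready. This is only a sketch under assumptions; QueryJobs and run_query are hypothetical names, and run_query stands in for whatever actually sends the query to the BaseX server.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryJobs:
    """Runs long queries in the background and keeps their results by job id."""

    def __init__(self, run_query, workers=4):
        # run_query is a hypothetical callable that executes one query.
        self._run_query = run_query
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}  # job id -> Future

    def submit(self, query):
        """Start the query and return immediately with a job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._run_query, query)
        return job_id

    def poll(self, job_id):
        """Return ("pending", None) or ("done", result) without blocking."""
        future = self._jobs[job_id]
        if not future.done():
            return ("pending", None)
        return ("done", future.result())

    def wait(self, job_id, timeout=None):
        """Block until the job has finished and return its result."""
        return self._jobs[job_id].result(timeout=timeout)
```

A request handler would call submit() and return the id; a later request (or a web socket notification) would use poll() to fetch the cached result.
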
>
>
>
> It seems like my architectural mistake is having a BaseX HTTP server that
> both serves the interactive web site and makes queries and then trying to
> scale horizontally by having multiple HTTP servers to which requests are
> delegated by the primary server.
>
>
>
> By having a single basexserver instance handling all queries, it can manage
> read and write locking appropriately.
>
>
>
> Is this an appropriate architecture?
>
>
>
> Thanks,
>
>
>
> Eliot
>
> _____________________________________________
>
> *Eliot Kimber*
>
> Sr. Staff Content Engineer
>
> O: 512 554 9368
>
>
>
> *servicenow*
>
>
>
> servicenow.com <https://www.servicenow.com>
>
> LinkedIn <https://www.linkedin.com/company/servicenow> | X
> <https://twitter.com/servicenow> | YouTube
> <https://www.youtube.com/user/servicenowinc> | Instagram
> <https://www.instagram.com/servicenow>
>
>
>
> *From: *Eliot Kimber <eliot.kim...@servicenow.com>
> *Date: *Thursday, December 12, 2024 at 4:40 PM
> *To: *basex-talk@mailman.uni-konstanz.de <
> basex-talk@mailman.uni-konstanz.de>
> *Subject: *Re: [basex-talk] Deeper discussion of BaseX client/server and
> web app implementation?
>
> In my approach, all updates are done to databases that are only used by my
> data loading HTTP server.
>
> Basically, I create a set of temporary databases, load the new data into
> those, then, like Tamara, swap out the production (read-only) databases
> with the newly-created temp databases.
>
> I’ve currently implemented this by doing:
>
> 1. Rename the production database to something unique (e.g.,
> “_dropme_databasename”)
>
> 2. Rename the temp database to the production name
>
> 3. Drop what was the production database.
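
The three steps above can be sketched as a sequence of BaseX commands. The thread does not say whether the swap is done with commands or XQuery functions, so this assumes the ALTER DB and DROP DB commands; swap_commands and its parameters are hypothetical names used only for illustration.

```python
def swap_commands(production, temp, prefix="_dropme_"):
    """Return the BaseX commands for the three-step swap described above."""
    retired = prefix + production
    return [
        f"ALTER DB {production} {retired}",  # 1. move production out of the way
        f"ALTER DB {temp} {production}",     # 2. promote the temp database
        f"DROP DB {retired}",                # 3. drop the old production data
    ]
```

For example, swap_commands("docs", "docs_tmp") yields the three command strings in order, which could then be sent by the single server that owns all updates.
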
>
>
>
> This is in the context of my multi-server approach, where all the updates
> of a given set of related databases are done by a single HTTP server (and
> thus a single JVM).
>
>
>
> Cheers,
>
>
>
> E.
>
>
>
>
>
>
> *From: *Lizzi, Vincent <vincent.li...@taylorandfrancis.com>
> *Date: *Thursday, December 12, 2024 at 12:59 PM
> *To: *Tamara Marnell <tmarn...@orbiscascade.org>, Eliot Kimber <
> eliot.kim...@servicenow.com>
> *Cc: *basex-talk@mailman.uni-konstanz.de <
> basex-talk@mailman.uni-konstanz.de>
> *Subject: *RE: [basex-talk] Deeper discussion of BaseX client/server and
> web app implementation?
>
>
> ------------------------------
>
> Hello Eliot and Tamara,
>
>
>
> I’ve observed what appear to be instances (though I haven’t fully tested
> to isolate and confirm this) where a write operation such as db:create()
> blocks BaseX from serving other HTTP requests, which use db:list() and
> db:get(), until the write operation is finished.
>
>
>
> On reading the description of lock detection here
> https://docs.basex.org/main/BaseX_10#compilation I’m now wondering if it
> might help to apply a naming convention to database names so that it’s
> possible to distinguish by name which databases are currently used for read
> vs write – although renaming databases might add other complexities.
>
>
>
> Thanks,
>
> Vincent
>
>
>
>
>
> _____________________________________________
>
> *Vincent M. Lizzi*
>
> Head of Information Standards | Taylor & Francis Group
>
> vincent.li...@taylorandfrancis.com
>
>
>
>
>
>
> *From:* BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> *On
> Behalf Of *Tamara Marnell
> *Sent:* Thursday, December 12, 2024 12:56 PM
> *To:* Eliot Kimber <eliot.kim...@servicenow.com>
> *Cc:* basex-talk@mailman.uni-konstanz.de
> *Subject:* Re: [basex-talk] Deeper discussion of BaseX client/server and
> web app implementation?
>
>
>
> Hello Eliot,
>
>
>
> I have only one BaseX instance, but to avoid the locking issue during
> large updates/optimizations, I have multiple copies of the databases.
> Updates are performed on "working" databases, and then I use db:copy to
> duplicate them to "production" databases for users on the front end to
> query. I haven't seen or heard of any problems with concurrent users on the
> public side when they're just reading from the production databases.
>
>
>
> -Tamara
>
>
>
> On Thu, Dec 12, 2024 at 6:53 AM Eliot Kimber <eliot.kim...@servicenow.com>
> wrote:
>
> I fully understand the issue of time.
>
>
>
> The Database Server page (https://docs.basex.org/12/Database_Server)
> doesn’t really provide the details I’m looking for.
>
> In particular, it’s not clear to me how a BaseX server would be used with
> an HTTP server in order to manage parallel query execution and ensure a
> responsive web site in the face of 100s of concurrent web users making
> 1000s of query requests. My current architecture handles this in terms of
> responsiveness and horizontal scaling, but as you say, it runs into issues
> with contention on locks for databases being updated.
>
>
>
> I know other people have successfully implemented public-facing web sites
> with BaseX so I’m curious how they’ve done it—is the life cycle of their
> content such that updates are not much of an issue or are they doing
> something different? Am I missing some way to make a single BaseX server
> take advantage of all available cores? I understood a JVM as using a
> single core, but maybe my understanding is wrong?
>
>
>
> It may be that BaseX as I’m using it is not the right way to do what I’m
> doing. For example, it might make more sense to implement the web site
> using a typical node.js and React system that then uses BaseX exclusively
> through a REST API. That still presents the problem of how to scale
> handling of queries but avoids any issues with the web site itself being
> responsive. My team is learning how to use node.js, next.js, and React for
> other projects so it’s something we could explore.
>
>
>
> I could also explore using other database solutions for some or all of
> what I want to do. For example, maybe it makes more sense to put my
> where-used table into a key-value store (even Solr could work for this
> pretty easily) or a SQL database and reserve BaseX for doing the XML-aware
> data processing needed to construct the table and doing other XML- and
> text-aware queries. But that would still run into performance issues, where
> I’m looking for 10ms response times for doing lookups in the where-used
> table.
>
>
>
> Or maybe I just need to do more caching of query results where the results
> are stable for a given content set.
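
A minimal sketch of that kind of cache, assuming results only change when a content set is reloaded: entries are keyed by content set and query, and everything for a content set is discarded when it is swapped out. ResultCache and run_query are hypothetical names; run_query stands in for the real call to BaseX.

```python
class ResultCache:
    """Caches query results per content set until that set is invalidated."""

    def __init__(self, run_query):
        # run_query(content_set, query) is a hypothetical callable.
        self._run_query = run_query
        self._results = {}  # (content_set, query) -> result

    def get(self, content_set, query):
        """Return the cached result, running the query only on a miss."""
        key = (content_set, query)
        if key not in self._results:
            self._results[key] = self._run_query(content_set, query)
        return self._results[key]

    def invalidate(self, content_set):
        """Drop all cached results for a content set, e.g. after a reload."""
        for key in [k for k in self._results if k[0] == content_set]:
            del self._results[key]
```

Calling invalidate() right after a database swap keeps the cache consistent with the newly loaded content.
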
>
>
>
> I started this project without any particular plan and got a long way just
> building it as I went but now that I’m tasked with fixing a number of
> design and behavior issues with my initial approach, I need to make sure I
> really know what I’m doing and make the most appropriate implementation
> choices.
>
>
>
> Thanks,
>
>
>
> Eliot
>
>
>
>
> *From: *Christian Grün <christian.gr...@gmail.com>
> *Date: *Thursday, December 12, 2024 at 5:11 AM
> *To: *Eliot Kimber <eliot.kim...@servicenow.com>
> *Cc: *basex-talk@mailman.uni-konstanz.de <
> basex-talk@mailman.uni-konstanz.de>
> *Subject: *Re: [basex-talk] Deeper discussion of BaseX client/server and
> web app implementation?
>
>
> ------------------------------
>
> Hi Eliot,
>
>
>
> Free time is a rare resource nowadays; just some quick feedback:
>
>
>
> > I’ve done a read through of the current documentation at
> https://docs.basex.org/ and also reviewed what I could find online and
> such. In the documentation I find a number of references to the
> “client/server” architecture but I’m not finding any particularly deep
> discussion of it, either in the docs or by searching on, e.g., “basex client
> server”.
>
>
>
> The best entry point may be Getting Started → Database Server [1].
>
>
>
> > When I started my Mirabel project I understood that the way to get
> concurrency was to use multiple BaseX HTTP instances, which can make
> concurrent read requests on a single set of databases.
>
>
>
> That’s dangerous (and has always been problematic). If you have
> concurrent operations, you should have one central HTTP instance.
> Otherwise, you might run into concurrency issues and locked databases, as
> multiple JVMs cannot share their information among each other [2].
>
>
>
> It may be difficult to give thorough answers to the remaining questions in
> a few lines. Maybe others can share their experiences.
>
>
>
> Best,
>
> Christian
>
>
>
> [1] https://docs.basex.org/12/Getting_Started
>
> [2] https://docs.basex.org/main/Startup#concurrent_operations
>
>
>
>
>
> --
>
>
>
> Tamara Marnell
>
> Program Manager, Systems
>
> Orbis Cascade Alliance (orbiscascade.org <https://www.orbiscascade.org/>)
>
> Pronouns: she/her/hers
>
