Dear all,

Over the last few days, I've worked on improving the scalability of ns_set operations in NaviServer, which are used in multiple crucial places such as:

 - nsdb interface (returning tuples as ns_sets)
 - configuration values
 - headers

The classical implementation of ns_sets uses separately malloc'ed storage for every attribute name and attribute value. So, e.g., for 1000 ns_sets with 20 members each, this means 1,000*20*2 = 40,000 malloc/free operations, e.g. for a single db query! Although malloc implementations have improved, these operations still require many lock operations, especially under load, where many threads might perform just as many malloc operations. Another consequence is that the allocated memory is scattered over the address space, which has bad implications for CPU caching.

The new implementation uses a single Tcl_DString per ns_set, keeping all attribute names and attribute values. This reduces the number of malloc operations and improves memory locality, so cache hit rates improve.
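To make the difference concrete, here is a minimal sketch of the idea in plain C. It is not the NaviServer code: the types and functions (PackedSet, packed_put, etc.) are hypothetical, and a hand-rolled growable buffer stands in for the Tcl_DString so that the snippet compiles without Tcl headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the new layout (not the NaviServer structs):
 * all names and values of one set live in a single growable buffer, and
 * each field records offsets into that buffer instead of owning strings. */
typedef struct {
    size_t name_off;
    size_t value_off;
} PackedField;

typedef struct {
    char        *data;       /* one buffer holding all strings */
    size_t       used;       /* bytes in use */
    size_t       cap;        /* bytes allocated */
    PackedField *fields;
    size_t       nfields;
    size_t       maxfields;
    size_t       allocs;     /* malloc/realloc calls so far, for illustration */
} PackedSet;

void packed_init(PackedSet *s, size_t data_cap, size_t max_fields)
{
    assert(data_cap > 0 && max_fields > 0);
    s->data = malloc(data_cap);
    s->fields = malloc(max_fields * sizeof(PackedField));
    assert(s->data != NULL && s->fields != NULL);
    s->used = 0;
    s->cap = data_cap;
    s->nfields = 0;
    s->maxfields = max_fields;
    s->allocs = 2;
}

/* Append a NUL-terminated string to the shared buffer; return its offset. */
size_t packed_append(PackedSet *s, const char *str)
{
    size_t len = strlen(str) + 1;
    size_t off = s->used;

    if (s->used + len > s->cap) {
        size_t ncap = s->cap * 2;
        char  *p;

        while (ncap < s->used + len) {
            ncap *= 2;
        }
        p = realloc(s->data, ncap);
        assert(p != NULL);
        s->data = p;
        s->cap = ncap;
        s->allocs++;
    }
    memcpy(s->data + off, str, len);
    s->used += len;
    return off;
}

void packed_put(PackedSet *s, const char *name, const char *value)
{
    if (s->nfields == s->maxfields) {
        size_t       nmax = s->maxfields * 2;
        PackedField *f = realloc(s->fields, nmax * sizeof(PackedField));

        assert(f != NULL);
        s->fields = f;
        s->maxfields = nmax;
        s->allocs++;
    }
    s->fields[s->nfields].name_off = packed_append(s, name);
    s->fields[s->nfields].value_off = packed_append(s, value);
    s->nfields++;
}

const char *packed_name(const PackedSet *s, size_t i)
{
    return s->data + s->fields[i].name_off;
}

const char *packed_value(const PackedSet *s, size_t i)
{
    return s->data + s->fields[i].value_off;
}

void packed_free(PackedSet *s)
{
    free(s->data);
    free(s->fields);
}

/* Classical layout: one allocation per name plus one per value. */
size_t classic_alloc_count(size_t nsets, size_t members)
{
    return nsets * members * 2;
}
```

In this sketch, a set initialized with packed_init(&s, 1024, 20) and filled with 20 short members stays at two allocations, where the classical layout needs 40 for the strings alone.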

One caveat is that modules using ns_set have to be recompiled, since the full data structure of the ns_set is exposed, and adding a member causes a binary incompatibility. Another potential problem is that C-level modules using the Ns_Set* API have to make sure that long-living string values are copied. Copying was generally necessary before as well, but a missing copy might now show up earlier, once the ns_set sees multiple updates. There was one place inside NaviServer where a change was necessary. All of OpenACS is working fine with these changes; openacs.org already runs a version with this feature enabled.
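The copying requirement can be illustrated with a small stand-alone sketch. Everything in it is hypothetical (SharedBuf, shared_append, copy_string are not part of the Ns_Set* API); it only assumes that the values live in one buffer that may be reallocated when it grows, which is the general behavior of such a string buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the per-set string storage. */
typedef struct {
    char  *data;
    size_t used;
    size_t cap;
} SharedBuf;

/* Append a NUL-terminated string; return its offset within the buffer. */
size_t shared_append(SharedBuf *b, const char *str)
{
    size_t len = strlen(str) + 1;
    size_t off = b->used;

    if (b->used + len > b->cap) {
        size_t ncap = (b->cap > 0) ? b->cap * 2 : 16;
        char  *p;

        while (ncap < b->used + len) {
            ncap *= 2;
        }
        p = realloc(b->data, ncap);   /* may move the whole buffer */
        assert(p != NULL);
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + off, str, len);
    b->used += len;
    return off;
}

/* Private copy for values that must outlive later updates. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *c = malloc(len);

    assert(c != NULL);
    memcpy(c, s, len);
    return c;
}

/* Returns 1 when both safe strategies still see the original value. */
int long_lived_value_demo(void)
{
    SharedBuf b = { NULL, 0, 0 };
    size_t    off = shared_append(&b, "session-42");
    char     *kept = copy_string(b.data + off);   /* safe: own copy */
    int       i, ok;

    /* Do NOT keep "b.data + off" itself across updates: the appends below
     * may realloc and move the buffer, leaving such a pointer dangling. */
    for (i = 0; i < 1000; i++) {
        shared_append(&b, "another-value");
    }

    /* Safe: the private copy, or a pointer re-derived after the updates. */
    ok = strcmp(kept, "session-42") == 0
        && strcmp(b.data + off, "session-42") == 0;

    free(kept);
    free(b.data);
    return ok;
}
```

The point of the sketch is the comment in the middle: a raw pointer obtained before the updates is the only variant that is not safe to use afterwards.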

However, since this is technically a large change, the initial commit will have this feature deactivated by default (flag NS_SET_DSTRING). My plan is to keep this feature deactivated by default for the 4.99* releases, but to enable it for version 5 of NaviServer. Other goals for NS5 are Tcl9 compatibility (will require source code changes; NSF/XOTcl is already Tcl9 compatible) and the move from MPL 1.1 to 2.0 (as previously announced [1]).

Here are some preliminary results from the ns_set reform. The tests were performed on openacs.org (Xeon Gold 6226R CPU @ 2.90GHz, 32 cores, hyper-threading enabled). The test executes the SQL query

    select * from acs_objects limit 1000

100 times in sequence. This test is run in 1..30 concurrent threads. With 30 threads, 3 million tuples (30 threads * 100 queries * 1000 rows) are retrieved, and 72 million malloc/free operations are needed for the retrieved values alone.

Before (classical ns_set with many mallocs):

    threads 1 total 4606.787 ms avg 3285.25 ms
    threads 5 total 4595.358 ms avg 3493.07 ms
    threads 10 total 4804.193 ms avg 3755.93 ms
    threads 20 total 6279.524 ms avg 4569.16 ms
    threads 30 total 8966.427 ms avg 6618.58 ms

After reform (using common Tcl_DString per tuple):

    threads 1 total 4524.645 ms avg 3242.54 ms
    threads 5 total 4251.266 ms avg 3450.09 ms
    threads 10 total 4656.795 ms avg 3665.31 ms
    threads 20 total 5934.105 ms avg 4671.38 ms
    threads 30 total 7384.591 ms avg 5642.76 ms

As one can see, the improvement increases under higher load (with more parallel threads). E.g., with 30 threads, the total time improved by 17%, with a smaller RSS. Note that these tests were not performed under "clinical" conditions.
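For those who want to recompute the relative gains from the "total" columns above, a small helper along these lines will do. The numbers are copied from the two tables; the names (BenchRow, improvement_pct) are just for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One row per thread count, "total" times in ms taken from the tables above. */
typedef struct {
    int    threads;
    double before_ms;   /* classical ns_set */
    double after_ms;    /* common Tcl_DString per tuple */
} BenchRow;

static const BenchRow bench_rows[] = {
    {  1, 4606.787, 4524.645 },
    {  5, 4595.358, 4251.266 },
    { 10, 4804.193, 4656.795 },
    { 20, 6279.524, 5934.105 },
    { 30, 8966.427, 7384.591 }
};
static const int bench_row_count = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Relative reduction of the total time, in percent. */
double improvement_pct(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}

void print_improvements(void)
{
    int i;

    for (i = 0; i < bench_row_count; i++) {
        printf("threads %2d: %.1f%%\n", bench_rows[i].threads,
               improvement_pct(bench_rows[i].before_ms, bench_rows[i].after_ms));
    }
}
```

With the totals above, this gives roughly 1.8%, 7.5%, 3.1%, 5.5% and 17.6% for 1, 5, 10, 20 and 30 threads.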

While working on ns_set and OpenACS db operations, with substantial debugging activated, I've optimized the database operations further, such that under high load and with large queries, the performance can now be several times (!) faster by avoiding duplicates in memory (but this is more of an OpenACS topic).

More changes:

Better resource reuse:
- keep Ns_Sets for request and response headers instead of
  deleting/recreating them frequently

API extensions:
- Provide new interfaces ending with *Sz that accept string sizes.
  This reduces the need for strlen() operations.
  * Ns_SetCreateSz()
  * Ns_SetIUpdateSz()
  * Ns_SetPutSz()
  * Ns_SetPutValueSz()
  * Ns_SetUpdateSz()

- New severity Debug(nsset) for ns_set debugging added, maybe temporary

- New API calls
  * Ns_SetClearValues(): clear the values for all keys
  * Ns_SetDataPrealloc(): creating ns_sets with pre-allocated values
    to avoid resize operations
  * NsSetResize()
  * NsHeaderSetGet()
- Ns_ConfigSet(const char *section, const char *key, const char *name)
  The last argument is new and allows one to create named sets (previously, all
  such sets were unnamed)

- Tcl API:
  * provide optional values to "ns_set size" for pre-allocation

- extended regression test
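
To illustrate the idea behind the *Sz variants and the pre-allocation mentioned above, here is a generic sketch. It deliberately does not use the real Ns_Set* signatures; StrBuf, strbuf_append_sz and strbuf_prealloc are hypothetical names. The caller passes a length it already knows (e.g., from the database driver), so no strlen() is needed, and the buffer is sized up front so that no resize happens while filling it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer; "resizes" counts growth steps during appends. */
typedef struct {
    char  *data;
    size_t used;
    size_t cap;
    size_t resizes;
} StrBuf;

/* Reserve space up front so that later appends need not grow the buffer. */
void strbuf_prealloc(StrBuf *b, size_t bytes)
{
    if (bytes > b->cap) {
        char *p = realloc(b->data, bytes);

        assert(p != NULL);
        b->data = p;
        b->cap = bytes;
    }
}

/* "Sz"-style append: the caller supplies the length, so no strlen() here.
 * The input does not have to be NUL-terminated; a terminator is added. */
size_t strbuf_append_sz(StrBuf *b, const char *str, size_t len)
{
    size_t off = b->used;

    if (b->used + len + 1 > b->cap) {
        size_t ncap = (b->cap > 0) ? b->cap * 2 : 16;
        char  *p;

        while (ncap < b->used + len + 1) {
            ncap *= 2;
        }
        p = realloc(b->data, ncap);
        assert(p != NULL);
        b->data = p;
        b->cap = ncap;
        b->resizes++;
    }
    memcpy(b->data + off, str, len);
    b->data[off + len] = '\0';
    b->used += len + 1;
    return off;
}

/* Convenience wrapper for callers that do not know the length. */
size_t strbuf_append(StrBuf *b, const char *str)
{
    return strbuf_append_sz(b, str, strlen(str));
}
```

In this sketch, reserving 256 bytes and then appending 20 five-byte values leaves the resize counter at zero.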

All the best
-g



_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
