This change sounds very welcome. I think we've all experienced the frustration 
of case-insensitive searching.

One suggestion I have for the man page would be a beginner-friendly example of
iterating through the contents of an ns_set, e.g.:

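# Get the request headers of the current connection as an ns_set
set headers [ns_conn headers]
set output ""
# "ns_set array" returns the set as a flat list of key/value pairs
foreach {key value} [ns_set array $headers] {
  append output "$key = $value\n"
}
set output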

I know we have ns_set print, but the first time I encountered sets, an example
like the one above would have been helpful.

All the best
Brian
________________________________
From: Gustaf Neumann (sslmail) <neum...@wu.ac.at>
Sent: Thursday 7 November 2024 10:13 am
To: naviserver-devel@lists.sourceforge.net 
<naviserver-devel@lists.sourceforge.net>
Subject: [naviserver-devel] ns_set reform

Dear all,

The upcoming release of NaviServer 5 includes several enhancements to "ns_set", 
such as more efficient storage management and the addition of the "-all" option 
for "ns_set get" and "ns_set iget". I also plan to implement further 
improvements.

One common problem area is the handling of "ns_set" collections with 
case-insensitive keys (e.g., HTTP request and reply headers, or configuration 
values). Currently, developers must use the i* variants of commands whenever 
accessing or updating elements with case-insensitive keys.

This current approach has two main issues:

a) Semantic Issues: When a developer forgets to use the i* variant for such
sets, errors occur that are difficult to debug. These errors can happen at
both the C level and the Tcl level, affecting the main NaviServer code as well
as the NaviServer modules. In the past few days, I have fixed more than ten
such cases, which is why I am now addressing this issue.

b) Performance Issues: Using the i* variants (e.g., "ns_set iget") leads to
keys being searched sequentially using strcasecmp(). For a set with N elements, 
an average of N/2 strcasecmp() operations are performed per lookup or update 
operation. In other words, during each get/update/delete operation, 
approximately half of the dictionary is converted to lowercase. Additionally, 
the input key is repeatedly converted to lowercase. Benchmark comparisons 
indicate that strcasecmp() is 20 to 30 times slower than strcmp(). The heavy 
reliance on strcasecmp() in the implementation of Ns_Set certainly impacts 
performance.
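To make the N/2 estimate concrete, here is a minimal C sketch of a sequential case-insensitive lookup in the style of the i* variants. The `Field` type and `find_nocase()` are hypothetical simplifications, not NaviServer's actual Ns_Set code. The function also reports how many strcasecmp() calls it made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical, simplified key/value field; not the real Ns_Set layout. */
typedef struct {
    const char *key;
    const char *value;
} Field;

/*
 * Case-insensitive sequential lookup, in the style of the i* variants:
 * every probe calls strcasecmp(), which case-folds both strings
 * character by character. *ncmp receives the number of comparisons made.
 * Returns the index of the first match, or -1 if the key is absent.
 */
static int find_nocase(const Field *fields, size_t n, const char *key,
                       size_t *ncmp)
{
    assert(fields != NULL || n == 0);
    assert(ncmp != NULL);
    *ncmp = 0;
    for (size_t i = 0; i < n; i++) {
        (*ncmp)++;
        if (strcasecmp(fields[i].key, key) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

For a four-field array such as Host, User-Agent, Accept, Content-Type, finding each key in turn costs 1, 2, 3 and 4 strcasecmp() calls, an average of 2.5. That is roughly N/2 (more precisely (N+1)/2 for keys that are present), while a missing key always costs N calls.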

To address these problems, I have added a property to the Ns_Set structure that
allows it to be declared case-insensitive. With this property enabled, keys are
stored in lowercase, enabling the use of strcmp() instead of strcasecmp() for
comparisons. This change not only improves performance but also reduces the
likelihood of semantic errors, as developers no longer need to remember to use
the i* variants.
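The idea of normalizing keys once, at insertion time, can be sketched in C as follows. This is a minimal illustration assuming a hypothetical fixed-capacity `KvSet` type with a `nocase` flag; the names `KvSet`, `kvset_put()`, `kvset_get()`, `kvset_free()` and `MAX_FIELDS` are made up for this sketch and are not the real Ns_Set API. For nocase sets, both the stored keys and the lookup key are lowercased, so every probe is a plain strcmp().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 16   /* hypothetical fixed capacity, for brevity */

/* Hypothetical simplified set with a "nocase" property; not Ns_Set. */
typedef struct {
    int    nocase;              /* nonzero: keys are stored lowercased */
    size_t size;
    char  *keys[MAX_FIELDS];
    char  *values[MAX_FIELDS];
} KvSet;

/* Heap copy of s, optionally lowercased; the caller frees it. */
static char *dup_string(const char *s, int lower)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        for (size_t i = 0; i <= len; i++) {   /* <= copies the terminator */
            copy[i] = lower ? (char)tolower((unsigned char)s[i]) : s[i];
        }
    }
    return copy;
}

/* Append a field; for nocase sets the key is normalized once, here. */
static int kvset_put(KvSet *set, const char *key, const char *value)
{
    assert(set != NULL);
    if (set->size == MAX_FIELDS) {
        return -1;
    }
    char *k = dup_string(key, set->nocase);
    char *v = dup_string(value, 0);
    if (k == NULL || v == NULL) {
        free(k);
        free(v);
        return -1;
    }
    set->keys[set->size] = k;
    set->values[set->size] = v;
    return (int)set->size++;   /* index of the new field */
}

/* Lookup: the probe key is lowercased once, then strcmp() is used. */
static const char *kvset_get(const KvSet *set, const char *key)
{
    assert(set != NULL);
    char *probe = dup_string(key, set->nocase);
    const char *result = NULL;
    if (probe == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < set->size; i++) {
        if (strcmp(set->keys[i], probe) == 0) {
            result = set->values[i];
            break;
        }
    }
    free(probe);
    return result;
}

/* Release all keys and values and empty the set. */
static void kvset_free(KvSet *set)
{
    for (size_t i = 0; i < set->size; i++) {
        free(set->keys[i]);
        free(set->values[i]);
    }
    set->size = 0;
}
```

With `nocase` set, `kvset_put(&h, "Content-Type", "text/html")` stores the key as "content-type", and `kvset_get(&h, "CONTENT-TYPE")` finds it. The original spelling of the key is not kept, so listing the keys yields lowercase names. The lookup key is lowercased once per call instead of being case-folded on every comparison. A real implementation would probably avoid the heap allocation for the probe key; it is used here only to keep the sketch short.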

So far, I have made the configuration value sets and HTTP header sets 
case-insensitive:

 - HTTP Headers: According to the RFC, the names of header fields are 
case-insensitive. Now, these headers are stored in lowercase, ensuring 
consistent behavior.

 - Configuration Sets: Most operations were already case-insensitive, but a
few were not, which I regard as bugs.

In both cases, the Ns_Sets are created at the C level. The only behavioral
change is that when obtaining all keys from such a set, they are now returned
in lowercase. For case-insensitive sets, the operations "ns_set get" and
"ns_set iget" return the same results.

In future versions, we might consider other options for ns_sets, such as
enforcing uniqueness or sorting. However, for now I do not regard these as
being as important as the "-nocase" option.

Some preliminary data: For a single HTTP request in OpenACS, I count 1,672
strcasecmp() operations. By using the "-nocase" option on HTTP headers and
ns_config data, this number drops to just 195 operations for the same request.
On some of our busier systems that process up to 1,000 requests per second, the
nocase option on sets saves nearly 1.5 million strcasecmp() operations per
second.
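The per-second figure follows directly from the per-request counts:

```latex
(1672 - 195)\,\frac{\text{calls}}{\text{request}} \times 1000\,\frac{\text{requests}}{\text{s}}
  = 1{,}477{,}000\,\frac{\text{calls}}{\text{s}}
  \approx 1.5 \times 10^{6}\,\frac{\text{calls}}{\text{s}}
```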

I am also considering adding a "-nocase" flag to the base operations (i.e.,
allowing "ns_set get -nocase ..." in addition to "ns_set iget ..."). The
"-nocase" flag is common in the Tcl world and appears more familiar than
strange names like "icput", "idelkey", "ifind", "iget", "imerge", "iunique",
and "iupdate".

Finally, the man page for "ns_set" is not very user-friendly: the base command
and its "-nocase" variant should be presented together, without requiring
excessive scrolling. The i* variants will continue to work and might be marked
as deprecated in the future.

As always, your input is welcome. If you know of other annoying legacy
operations, please let me know; this release is a chance to address such
things.

All the best
-g

_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel