This change sounds very welcome. I think we've all experienced the frustration
of case-insensitive searching.
One suggestion I have for the man page would be a beginner-friendly example of
iterating through the contents of an ns_set, e.g.:

    set headers [ns_conn headers]
    set output ""
    # "ns_set array" returns a flat list of alternating keys and values
    foreach {key value} [ns_set array $headers] {
        append output "$key = $value\n"
    }
    # Return the accumulated text as the result of the script
    set output

I know we have "ns_set print", but the first time I encountered sets, an
example like the one above would have been helpful.
All the best
Brian
________________________________
From: Gustaf Neumann (sslmail) <[email protected]>
Sent: Thursday 7 November 2024 10:13 am
To: [email protected]
<[email protected]>
Subject: [naviserver-devel] ns_set reform
Dear all,
The upcoming release of NaviServer 5 includes several enhancements to "ns_set",
such as more efficient storage management and the addition of the "-all" option
for "ns_set get" and "ns_set iget". I also plan to implement further
improvements.
One common problem area is the handling of "ns_set" collections with
case-insensitive keys (e.g., HTTP request and reply headers, or configuration
values). Currently, developers must use the i* variants of commands whenever
accessing or updating elements with case-insensitive keys.
This current approach has two main issues:
a) Semantic Issues: When a developer forgets to use the i* variant for such
sets, errors occur that are difficult to debug. These errors may happen at
both the C level and the Tcl level, affecting the main NaviServer code as well
as the NaviServer modules. In the past few days, I have fixed more than ten
such cases, which is why I am now addressing this issue.
b) Performance Issues: Using the i* variants (e.g., "ns_set iget") leads to
keys being searched sequentially using strcasecmp(). For a set with N elements,
an average of N/2 strcasecmp() operations are performed per lookup or update
operation. In other words, during each get/update/delete operation,
approximately half of the dictionary is converted to lowercase. Additionally,
the input key is repeatedly converted to lowercase. Benchmark comparisons
indicate that strcasecmp() is 20 to 30 times slower than strcmp(). The heavy
reliance on strcasecmp() in the implementation of Ns_Set certainly impacts
performance.
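
To make point b) concrete, here is a minimal C sketch of a sequential
case-insensitive lookup. It is only an illustration: the names LinearField
and linear_ifind are hypothetical and this is not the actual Ns_Set code. It
counts the strcasecmp() calls needed for one lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical key/value pair; not the real Ns_Set field layout. */
typedef struct {
    const char *key;
    const char *value;
} LinearField;

/*
 * Linear case-insensitive search over n fields. Returns the index of the
 * first matching key, or -1 when the key is absent. The number of
 * strcasecmp() calls performed is reported through *comparisons.
 */
static int
linear_ifind(const LinearField *fields, size_t n, const char *key,
             size_t *comparisons)
{
    size_t i;

    assert(key != NULL && comparisons != NULL);

    *comparisons = 0;
    for (i = 0; i < n; i++) {
        (*comparisons)++;
        if (strcasecmp(fields[i].key, key) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

A hit on the last of N fields, or a miss, costs N strcasecmp() calls, and a
hit on the first field costs one. For hits spread evenly over the set, this
averages out to roughly N/2 calls per lookup.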
To address these problems, I have added a property to the Ns_Set structure that
allows it to be declared case-insensitive. With this property enabled, keys are
stored in lowercase, enabling the use of strcmp() instead of strcasecmp() for
comparisons. This change not only improves performance but also reduces the
likelihood of semantic errors, as developers no longer need to remember to use
the i* variants.
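
The idea of lowercasing keys once when they are stored can be sketched in C
as follows. This is a simplified illustration under stated assumptions, not
the NaviServer implementation: NocaseSet, NocaseField, the nocase_set_*
functions, and the fixed sizes are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NOCASE_MAX_FIELDS 16   /* arbitrary capacity for the sketch */
#define NOCASE_KEY_SIZE   64   /* longer keys are truncated in this sketch */

typedef struct {
    char        key[NOCASE_KEY_SIZE];
    const char *value;
} NocaseField;

typedef struct {
    int         nocase;   /* nonzero: keys are stored in lowercase */
    size_t      size;
    NocaseField fields[NOCASE_MAX_FIELDS];
} NocaseSet;

/* Copy src into dst, lowercasing it when "lower" is nonzero. */
static void
nocase_copy_key(char *dst, const char *src, int lower)
{
    size_t i;

    for (i = 0; i < NOCASE_KEY_SIZE - 1 && src[i] != '\0'; i++) {
        dst[i] = lower ? (char)tolower((unsigned char)src[i]) : src[i];
    }
    dst[i] = '\0';
}

static void
nocase_set_init(NocaseSet *set, int nocase)
{
    assert(set != NULL);
    set->nocase = nocase;
    set->size = 0;
}

/* Append a field; returns 0 on success, -1 when the set is full. */
static int
nocase_set_put(NocaseSet *set, const char *key, const char *value)
{
    assert(set != NULL && key != NULL);
    if (set->size >= NOCASE_MAX_FIELDS) {
        return -1;
    }
    /* The key is normalized once, at insertion time. */
    nocase_copy_key(set->fields[set->size].key, key, set->nocase);
    set->fields[set->size].value = value;
    set->size++;
    return 0;
}

/* Return the value of the first matching key, or NULL. */
static const char *
nocase_set_get(const NocaseSet *set, const char *key)
{
    char   probe[NOCASE_KEY_SIZE];
    size_t i;

    assert(set != NULL && key != NULL);
    /* Normalize the lookup key once, then compare with plain strcmp(). */
    nocase_copy_key(probe, key, set->nocase);
    for (i = 0; i < set->size; i++) {
        if (strcmp(set->fields[i].key, probe) == 0) {
            return set->fields[i].value;
        }
    }
    return NULL;
}
```

With nocase enabled, storing "Content-Type" keeps the key as "content-type",
and a lookup for "CONTENT-TYPE" finds it through strcmp() alone. With nocase
disabled, the same set behaves case-sensitively.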
So far, I have made the configuration value sets and HTTP header sets
case-insensitive:
- HTTP Headers: According to the RFC, the names of header fields are
case-insensitive. Now, these headers are stored in lowercase, ensuring
consistent behavior.
- Configuration Sets: Most operations were already case-insensitive, but a
few were not, which I regard as bugs.
The Ns_Sets for both cases are created from the C level. The only behavioral
change is that when obtaining all keys from such a set, they are now returned
in lowercase. For case-insensitive sets, the operations "ns_set get" and
"ns_set iget" will return the same results.
In future versions, we might consider other options for ns_sets, such as
enforcing uniqueness or sorting. However, I do not regard these as important as
the "-nocase" option for now.
Some preliminary data: For a single HTTP request in OpenACS, I count 1,672
strcasecmp() operations. By utilizing the "-nocase" option on HTTP headers and
ns_config data, this number is reduced for the same request to just 195
operations per request. On some of our busier systems that process up to 1,000
requests per second, the nocase option on sets saves nearly 1.5 million
strcasecmp() operations per second.
I am also considering adding a "-nocase" flag to the base operations (i.e.,
allowing "ns_set get -nocase ..." in addition to "ns_set iget ..."). The
"-nocase" flag is common in the Tcl world and appears more familiar than using
strange names like "icput", "idelkey", "ifind", "iget", "imerge", "iunique",
and "iupdate".
Finally, the man page for "ns_set" is not very user-friendly, as the base case
and the "-nocase" variant should be presented together and not require
excessive scrolling. The i* variants will continue to work and might be marked
as deprecated in the future.
As always, your input is welcome. If you have suggestions concerning other
annoying legacy operations, please let me know; the release change is a chance
to address such things.
All the best
-g
_______________________________________________
naviserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/naviserver-devel