This change sounds very welcome. I think we've all experienced the frustration of case-insensitive searching.
One suggestion I have for the man page would be a beginner-friendly example of iterating through the contents of an ns_set, e.g.

    set headers [ns_conn headers]
    set output ""
    foreach {key value} [ns_set array $headers] {
        append output "$key = $value \n"
    }
    set output

I know we have ns_set print, but the first time I encountered sets, an example like the above would have been helpful.

All the best
Brian

________________________________
From: Gustaf Neumann (sslmail) <neum...@wu.ac.at>
Sent: Thursday 7 November 2024 10:13 am
To: naviserver-devel@lists.sourceforge.net <naviserver-devel@lists.sourceforge.net>
Subject: [naviserver-devel] ns_set reform

Dear all,

The upcoming release of NaviServer 5 includes several enhancements to "ns_set", such as more efficient storage management and the addition of the "-all" option for "ns_set get" and "ns_set iget". I also plan to implement further improvements.

One common problem area is the handling of "ns_set" collections with case-insensitive keys (e.g., HTTP request and reply headers, or configuration values). Currently, developers must use the i* variants of commands whenever accessing or updating elements with case-insensitive keys.

This current approach has two main issues:

a) Semantic Issues: When a developer forgets to use the i* variant for such sets, errors will occur that are difficult to debug. These errors may happen at both the C level and the Tcl level, affecting the main NaviServer code as well as the NaviServer modules. In the past few days, I have fixed more than ten such cases, which is why I am now addressing this issue.

b) Performance Issues: Using the i* variants (e.g., "ns_set iget") leads to keys being searched sequentially using strcasecmp(). For a set with N elements, an average of N/2 strcasecmp() operations are performed per lookup or update operation. In other words, during each get/update/delete operation, approximately half of the dictionary is converted to lowercase.
Additionally, the input key is repeatedly converted to lowercase. Benchmark comparisons indicate that strcasecmp() is 20 to 30 times slower than strcmp(). The heavy reliance on strcasecmp() in the implementation of Ns_Set certainly impacts performance.

To address these problems, I have added a property to the Ns_Set structure that allows it to be declared case-insensitive. With this property enabled, keys are stored in lowercase, enabling the use of strcmp() instead of strcasecmp() for comparisons. This change not only improves performance but also reduces the likelihood of semantic errors, as developers no longer need to remember to use the i* variants.

So far, I have made the configuration value sets and HTTP header sets case-insensitive:

- HTTP Headers: According to the RFC, the names of header fields are case-insensitive. Now, these headers are stored in lowercase, ensuring consistent behavior.
- Configuration Sets: Most operations were already case-insensitive, but a few were not, which I regard as bugs.

The Ns_Sets for both cases are created from the C level. The only behavioral change is that when obtaining all keys from such a set, they are now returned in lowercase. For case-insensitive sets, the operations "ns_set get" and "ns_set iget" will return the same results.

In future versions, we might consider other options for ns_sets, such as enforcing uniqueness or sorting. However, I do not regard these as being as important as the "-nocase" option for now.

Some preliminary data: For a single HTTP request in OpenACS, I count 1,672 strcasecmp() operations. By utilizing the "-nocase" option on HTTP headers and ns_config data, this number is reduced to just 195 operations for the same request. On some of our busier systems that process up to 1,000 requests per second, the nocase option on sets saves nearly 1.5 million strcasecmp() operations per second.

I am also considering adding a "-nocase" flag to the base operations (i.e.
allowing "ns_set get -nocase ..." in addition to "ns_set iget ..."). The "-nocase" flag is common in the Tcl world and appears more familiar than using strange names like "icput", "idelkey", "ifind", "iget", "imerge", "iunique", and "iupdate".

Finally, the man page for "ns_set" is not very user-friendly, as the base case and the "-nocase" variant should be presented together and not require excessive scrolling. The i* variants will continue to work and might be marked as deprecated in the future.

As always, your input is welcome. If you have suggestions about other annoying legacy operations, please let me know; the release change is a chance to address such things.

All the best
-g

_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
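
For readers who want to see the idea from the message above in concrete terms, here is a minimal sketch in C of a set that normalizes keys to lowercase on insert, so that lookups can lowercase the search key once and then compare with strcmp(). This is not the NaviServer implementation: DemoSet, DemoField, demo_copy_key, demo_set_put and demo_set_get are hypothetical names, and the fixed-size arrays are an assumption made only to keep the example short.

```c
/* Hypothetical sketch only: DemoSet and its functions are illustrative
 * names and do NOT correspond to the real Ns_Set API in NaviServer. */
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_MAX_FIELDS 16   /* assumed capacity, for illustration only */
#define DEMO_MAX_KEY    64   /* assumed maximum key length incl. NUL    */

typedef struct {
    char        key[DEMO_MAX_KEY];   /* lowercase when the set is nocase */
    const char *value;               /* not copied; caller owns it       */
} DemoField;

typedef struct {
    DemoField fields[DEMO_MAX_FIELDS];
    size_t    size;
    int       nocase;                /* 1: keys are normalized on insert */
} DemoSet;

/* Copy src into dst (truncating to DEMO_MAX_KEY-1 chars),
 * converting to lowercase when 'lower' is nonzero. */
static void demo_copy_key(char *dst, const char *src, int lower)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < DEMO_MAX_KEY - 1 && src[i] != '\0'; i++) {
        dst[i] = lower ? (char)tolower((unsigned char)src[i]) : src[i];
    }
    dst[i] = '\0';
}

/* Append a field; returns 0 on success, -1 when the set is full. */
static int demo_set_put(DemoSet *set, const char *key, const char *value)
{
    if (set->size >= DEMO_MAX_FIELDS) {
        return -1;
    }
    demo_copy_key(set->fields[set->size].key, key, set->nocase);
    set->fields[set->size].value = value;
    set->size++;
    return 0;
}

/* Sequential lookup: the search key is normalized once, after which
 * every comparison is a plain strcmp() instead of strcasecmp(). */
static const char *demo_set_get(const DemoSet *set, const char *key)
{
    char   needle[DEMO_MAX_KEY];
    size_t i;

    demo_copy_key(needle, key, set->nocase);
    for (i = 0; i < set->size; i++) {
        if (strcmp(set->fields[i].key, needle) == 0) {
            return set->fields[i].value;
        }
    }
    return NULL;
}
```

With nocase set to 1, a field stored under "Content-Type" is kept as "content-type" and is found by a lookup for "CONTENT-TYPE"; with nocase set to 0 the same lookup only succeeds on an exact match. The search is still sequential, as in the message; only the per-element comparison becomes cheaper.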