>On 14 October 2010 16:28, Ben Kloosterman <[email protected]> wrote:
>>
>> >If I really want an indexed UTF-8 string, I say that, and if I
>> >want a rope, I also have to tell you that.
>>
>> I do favour pushing this to the user when its necessary but shouldnt the
>> default be reasonably fast and memory efficient ? Especially when most of
>> those strings are just storage and not being used /worked on.
>
>I'm not really sure how true that is. I can only speak from my own
>experiences, which are admittedly pretty limited. Most string data I
>see are:
>
>0. Attributes of domain objects for display, or the results of
>applying simple functions to them. Peoples names, for example. I
>wouldn't have snarfed these out of the database if I wasn't using
>them.
>1. XML messages (say, in a JMX environment).
>2. HTML / XML templates.
>3. Symbols.
>
>These things either seem to be processed in a stream-like manner
>(templates) or are manipulated and searched frequently.

Ints are small, and while maybe only 20-30% of fields are strings, the
strings are much longer, so they end up responsible for most of the
memory usage. GUI processing especially has lots of strings in the code
objects (though the data can be graphics heavy). Where I see strings is
in domain objects: name, address, company, notes, company name, serial
numbers, SKUs, settings, etc. Also, obviously, DSLs, XML, HTML and SOAP.
IMHO the question you should ask is: where don't you see strings?

While DSL, XML and SOAP strings are short lived, they do have a high
overhead, especially on embedded devices with slow memory. In addition,
HTML pages stay in memory for quite a long time on the client, either
directly or via the cache... in fact my Firefox browser seems to be my
most memory-hungry client app.

>So you are saying you (who may be the implementer of the string
>module) could convert to a specific representation where needed in
>string-heavy functions? I think we are starting to agree here.

Yes. The char array could be used for:
- Legacy work requiring mutability
- Legacy work requiring pointers, which is easier to port to an array
- Algorithms where you can't use a byte index (or it's too hard to
  convert)
- Cases where arrays of fixed-width chars give higher performance AND
  the performance of that algorithm is crucial.

>If it is a problem, when it is a problem, when the app happens to be
>written in BitC, it would be nice to change the representation as
>needed. I think the verdict is still out on whether UTF-8 is yet a
>nice default, though. It certainly makes life more difficult for
>people who want to implement VMs in BitC, most of which imply O(1)
>indexing of their native string type, and having to always convert to
>use native string functions would be awkward.

I would like most of the lib to be O(1) via byte indexing.
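
To make the byte-index versus char-array distinction concrete, here is a
minimal sketch in Python (not BitC). The function names are hypothetical
and are not part of any BitC library; the sketch only assumes that the
default string is stored as UTF-8 bytes, that search returns a byte
offset, and that a fixed-width array is built on request for the few
algorithms that want to index by character.

```python
from array import array

# Hypothetical helpers for illustration only; not a BitC API.

def find_bytes(haystack: bytes, needle: str) -> int:
    """Return the byte offset of needle in UTF-8 haystack, or -1.

    A match of valid UTF-8 inside valid UTF-8 always starts on a
    code point boundary, so the offset is safe to slice at.
    """
    return haystack.find(needle.encode("utf-8"))

def slice_bytes(text: bytes, start: int, end: int) -> str:
    """Slice by byte offsets without scanning from the start.

    Raises UnicodeDecodeError if an offset splits a code point.
    """
    return text[start:end].decode("utf-8")

def to_fixed_width(text: bytes) -> array:
    """Decode once into an array of code points (at least 32 bits each)
    for code that needs O(1) indexing by character."""
    return array("L", (ord(c) for c in text.decode("utf-8")))

name = "Zo\u00eb Smith".encode("utf-8")   # 9 characters, 10 bytes

off = find_bytes(name, "Smith")            # byte offset, not char index
surname = slice_bytes(name, off, len(name))

chars = to_fixed_width(name)               # opt-in conversion
third_char = chr(chars[2])
```

The byte offset returned by the search differs from the character index
as soon as a multi-byte character appears earlier in the string, which
is why offsets from the library are only meaningful when handed back to
the library, and why the array conversion is kept as an explicit step.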
The use of arrays, where needed, would only really apply to a few
developers. I mean, I rarely use indexes these days, and when I do the
code is fragile; higher-level language constructs and functions have
mostly replaced them.

>
>Do consider that most business applications are already written in
>Java or C#, and that Hotspot, for example, already incurs extra
>overhead to store extents on the string and room for a lock for
>synchronisation. People seem not to care a whole lot about small order
>of magnitude size increases for extra functionality. It's far cheaper
>to buy another server than to write C.

That is an x86, desktop-centric view; try running on something with
32 MB of memory or less. I'm sure native UTF-8 storage with backward
compatibility will not incur any development overhead. In fact it will
be less, since we specifically have a system for all that old
array/pointer-based code that may even require mutability (personally I
would just wrap the existing C), and new programs will benefit.

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
