>On 14 October 2010 16:28, Ben Kloosterman <[email protected]> wrote:
 >>
 >>  >If I really want an indexed UTF-8 string, I say that, and if I
 >>  >want a rope, I also have to tell you that.
 >>
 >> I do favour pushing this to the user when its necessary but shouldn’t
 >> the default be reasonably fast and memory efficient? Especially when
 >> most of those strings are just storage and not being used / worked on.
 >
 >I'm not really sure how true that is. I can only speak from my own
 >experiences, which are admittedly pretty limited. Most string data I
 >see are:
 >
 >0. Attributes of domain objects for display, or the results of
 >applying simple functions to them. Peoples names, for example. I
 >wouldn't have snarfed these out of the database if I wasn't using
 >them.
 >1. XML messages (say, in a JMX environment).
 >2. HTML / XML templates.
 >3. Symbols.
 >
 >These things either seem to be processed in a stream-like manner
 >(templates) or are manipulated and searched frequently.

Ints are small. Strings may be only 20-30% of fields, but because they are
so much longer they end up responsible for most of the memory usage. GUI
processing especially has lots of strings in the code objects (though the
data can be graphics heavy).

Where I see strings is in domain objects: name, address, company name,
notes, serial numbers, SKUs, settings, etc. Also, obviously, DSLs, XML,
HTML and SOAP. IMHO the question you should ask is: where don’t you see
strings? While DSL, XML and SOAP data are short-lived, they do have a high
overhead, especially on embedded devices with slow memory. In addition,
HTML pages stay in memory quite long on the client, either directly or via
the cache... in fact my Firefox browser seems to be my most memory-hungry
client app.



 >So you are saying you (who may be the implementer of the string
 >module) could convert to a specific representation where needed in
 >string-heavy functions? I think we are starting to agree here.

Yes. The char array could be used for:
- Legacy work requiring mutability
- Legacy work requiring pointers, which is easier to port to an array
- Algorithms where you can't use a byte index (or it's too hard to convert)
- Cases where arrays of fixed-width chars give higher performance AND
performance of that algorithm is crucial.
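
A minimal sketch of that opt-in conversion, written in Python only for
illustration (it is not BitC, and the helper names to_fixed_width, to_utf8
and reverse_in_place are hypothetical). The string is stored as UTF-8
bytes by default and is expanded into a mutable fixed-width array only
around the one algorithm that wants per-character indexing:

```python
from array import array


def to_fixed_width(utf8: bytes) -> array:
    """Decode UTF-8 storage into a mutable array of code points.

    Type code "L" is an unsigned long (at least 32 bits), wide enough
    for any Unicode code point.
    """
    return array("L", (ord(ch) for ch in utf8.decode("utf-8")))


def to_utf8(chars: array) -> bytes:
    """Encode an array of code points back into compact UTF-8 storage."""
    return "".join(map(chr, chars)).encode("utf-8")


def reverse_in_place(chars: array) -> None:
    """Index-heavy, mutating algorithm: needs O(1) per-character access."""
    i, j = 0, len(chars) - 1
    while i < j:
        chars[i], chars[j] = chars[j], chars[i]
        i += 1
        j -= 1


stored = "naïve café".encode("utf-8")   # default representation: UTF-8 bytes
chars = to_fixed_width(stored)          # opt in to the array form
reverse_in_place(chars)
result = to_utf8(chars)                 # back to UTF-8 for storage
```

The conversion costs O(n) each way, so it only pays off when the algorithm
in the middle really is the hot spot.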

 >If it is a problem, when it is a problem, when the app happens to be
 >written in BitC, it would be nice to change the representation as
 >needed. I think the verdict is still out on whether UTF-8 is yet a
 >nice default, though. It certainly makes life more difficult for
 >people who want to implement VMs in BitC, most of which imply O(1)
 >indexing of their native string type, and having to always convert to
 >use native string functions would be awkward.

I would like most of the lib to be O(1) via byte indexing. The use of
arrays, where needed, would only really apply to a few developers. I rarely
use indexes these days, and when I do the code is fragile; higher-level
language constructs and functions have mostly replaced them.
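
As a rough illustration of byte indexing (Python again, with hypothetical
helper names is_char_boundary and split_once), a search over UTF-8 storage
returns a byte offset, and that offset can be used directly without ever
counting characters. Because UTF-8 is self-synchronising, a valid needle
can only match on a character boundary:

```python
def is_char_boundary(buf: bytes, i: int) -> bool:
    """True if byte offset i does not fall inside a multi-byte sequence."""
    if i == 0 or i == len(buf):
        return True
    if i < 0 or i > len(buf):
        return False
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (buf[i] & 0xC0) != 0x80


def split_once(buf: bytes, sep: bytes):
    """Split at the first occurrence of sep, using byte offsets only."""
    at = buf.find(sep)          # byte offset, not a character index
    if at < 0:
        return buf, b""
    return buf[:at], buf[at + len(sep):]


line = "naïve=café".encode("utf-8")
key, value = split_once(line, b"=")
```

Python copies on slicing, so the slices here are not themselves O(1); the
point is only that the index handed around is a byte offset, which a
language with string views could use in constant time.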

 >
 >Do consider that most business applications are already written in
 >Java or C#, and that Hotspot, for example, already incurs extra
 >overhead to store extents on the string and room for a lock for
 >synchronisation. People seem not to care a whole lot about small order
 >of magnitude size increases for extra functionality. It's far cheaper
 >to buy another server than to write C.

That is an x86 desktop-centric view; try running on something with 32 MB of
memory or less. I'm sure UTF-8 native storage with backward compatibility
will not incur any development overhead. In fact it will be less, since we
specifically have a mechanism for all that old array/pointer-based code
that may even require mutability (personally I would just wrap the existing
C), and new programs will benefit.
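
To put rough numbers on the memory argument, here is a small Python sketch
(the storage_cost helper and the sample strings are made up for
illustration) comparing the payload size of the same text under UTF-8 and
under fixed-width 16-bit and 32-bit encodings. It ignores per-object
headers, which vary by runtime:

```python
def storage_cost(text: str) -> dict:
    """Payload bytes needed to store text under three encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


for sample in ("SKU-12345", "naïve café"):
    print(sample, storage_cost(sample))
```

For mostly-ASCII data such as names, SKUs and markup, UTF-8 needs about
half the bytes of a 16-bit representation and a quarter of a 32-bit one.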




Ben 


_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
