Re: Strings Proposal

Kyle Brady Mon, 02 Mar 2015 16:29:31 -0800

Hi John,

> [...] will u get value of 9 or 57 (ASCII value of 9)?


57, or more generally whatever byte value is actually stored in memory; the
cast does not parse the digit.
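As a rough illustration of the two interpretations, here is a sketch in Python, whose bytes type behaves similarly. This is Python, not the proposed Chapel API: reading the stored byte gives 57, while parsing the text gives 9.

```python
# Python's bytes type used as a stand-in for the proposed Chapel bytes type.
b = b"9"

# Reading the stored byte as an unsigned 8-bit value gives its code,
# which is what a cast like b:uint(8) is described as returning.
raw = b[0]

# Getting the numeric value 9 requires parsing the text instead.
parsed = int(b)

print(raw, parsed)
```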

> Will there be a byte type to accompany bytes type?

I think uint(8) will be fine for this role.

> This might be a stupid question, but why would isDigit, isNumeric and
>isDecimal only work for unicode?

isDigit was incorrectly bundled into that group; it will work on bytes.
These functions will query the corresponding Unicode property of the same
name for each character; see

http://en.wikipedia.org/wiki/Numerals_in_Unicode#Numerals_by_numeric_property

for an explanation of the differences. One slightly confusing point is
that isDigit on bytes is closer to isDecimal for Unicode. However, isDigit
is the function name used in C, so I've kept it for familiarity.
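Python's str methods isdecimal, isdigit and isnumeric happen to map onto the same Unicode Numeric_Type distinctions, so they are a convenient way to see the differences. This is a Python stand-in, not the proposed Chapel functions. Note that bytes.isdigit only accepts ASCII '0'-'9', which is why a byte-level isDigit lines up with Unicode's decimal digits rather than with the broader Unicode isDigit.

```python
import unicodedata

# ASCII 7, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO,
# VULGAR FRACTION ONE HALF, ROMAN NUMERAL TWELVE
samples = ["7", "\u0663", "\u00b2", "\u00bd", "\u216b"]


def classify(ch):
    """Return the (isdecimal, isdigit, isnumeric) flags for one character."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())


for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):30} {classify(ch)}")

# bytes.isdigit() only recognizes ASCII '0'-'9', so a UTF-8 encoded
# superscript two is not a digit at the byte level.
print(b"7".isdigit())
print("\u00b2".encode("utf-8").isdigit())
```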

> Could it be useful to parametrize the find functions [...]?

It might be, but I'm not a huge fan of exposing different implementations
of core routines to users. My current plan is to use Boyer-Moore-Horspool,
falling back to brute force for short strings.
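A minimal sketch of that plan in Python, assuming the fallback is chosen by needle length. The SHORT_NEEDLE cutoff and the function names are hypothetical, and indices are 0-based with -1 meaning "not found" (Python's convention, not necessarily Chapel's).

```python
SHORT_NEEDLE = 4  # hypothetical cutoff below which brute force is used


def brute_force_find(haystack: bytes, needle: bytes) -> int:
    """Try every alignment; return the lowest match index or -1."""
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1


def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Boyer-Moore-Horspool search; return the lowest match index or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # Bad-character table: for each byte value, the distance from its last
    # occurrence in needle[:-1] to the end of the needle (m if absent).
    shift = [m] * 256
    for k in range(m - 1):
        shift[needle[k]] = m - 1 - k
    i = 0
    while i <= n - m:
        # Compare the current alignment right to left.
        j = m - 1
        while j >= 0 and haystack[i + j] == needle[j]:
            j -= 1
        if j < 0:
            return i
        # Skip ahead based on the haystack byte under the needle's last slot.
        i += shift[haystack[i + m - 1]]
    return -1


def find(haystack: bytes, needle: bytes) -> int:
    """Use brute force for short needles, Horspool otherwise."""
    if len(needle) < SHORT_NEEDLE:
        return brute_force_find(haystack, needle)
    return horspool_find(haystack, needle)
```

For example, find(b"hello world", b"world") returns 6. Keeping the dispatch internal, rather than exposing an algorithm parameter, leaves room to tune the cutoff or swap the algorithm later without changing user code.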

> Would tighter integration of regular expressions be doable?

The current Regexp module does define some functions on string, but it
doesn't overload find, replace, etc. I'm in favor of more integration, but
I've heard from others that they like having a distinction between
string.find and string.search. I think it would be worth giving the Regexp
interface another look after some of the strings work is completed. Nothing
stops us from adding (or not adding) those overloads later.

> Is the string/bytes internally stored as a dynamic array, ie. does every
>append operation result in new memory allocation or not?

The plan is for operations such as append to use available space in the
existing buffer when possible, rather than allocating on every call.
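One common way to get that behavior is a buffer that keeps spare capacity and grows geometrically, so most appends just copy into the unused tail. The Python class below is a hypothetical sketch of that idea, not the planned Chapel implementation; the doubling policy and the names are assumptions.

```python
class ByteBuffer:
    """Hypothetical growable buffer: appends reuse spare capacity and only
    reallocate (at least doubling capacity) when the data no longer fits."""

    def __init__(self, capacity: int = 8):
        self._data = bytearray(capacity)  # backing storage, may have slack
        self._len = 0                     # number of bytes in use
        self.reallocations = 0            # how often storage was replaced

    def append(self, chunk: bytes) -> None:
        needed = self._len + len(chunk)
        if needed > len(self._data):
            # Grow geometrically so a long run of appends costs
            # amortized O(1) per byte.
            new_cap = max(needed, 2 * len(self._data))
            new_data = bytearray(new_cap)
            new_data[:self._len] = self._data[:self._len]
            self._data = new_data
            self.reallocations += 1
        # Copy the chunk into the unused tail of the current buffer.
        self._data[self._len:needed] = chunk
        self._len = needed

    def value(self) -> bytes:
        return bytes(self._data[:self._len])
```

Appending b"ab" 100 times to ByteBuffer(8) only reallocates when the capacity is exceeded, not on every call.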

> If the backing storage is dynamic, would it be feasible to also
>facilitate prepend operations which wouldn't necessitate a new allocation?

This should be doable for some operations; trim, for example, would be
easy. It probably won't be something I worry about on the first pass of
getting things working, though.
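To illustrate why trim is easy, here is a hypothetical Python sketch in which a string is a buffer plus start/end offsets; trimming only moves the offsets and never copies or allocates. The ByteView name and the whitespace set are assumptions for the example.

```python
class ByteView:
    """Hypothetical string representation as (buffer, start, end), so
    trimming adjusts the offsets instead of copying the data."""

    WHITESPACE = b" \t\n\r"

    def __init__(self, data: bytes):
        self._buf = data       # shared backing storage, never copied here
        self._start = 0
        self._end = len(data)

    def trim(self) -> None:
        # Advance the start offset past leading whitespace...
        while (self._start < self._end
               and self._buf[self._start] in self.WHITESPACE):
            self._start += 1
        # ...and pull the end offset back past trailing whitespace.
        while (self._end > self._start
               and self._buf[self._end - 1] in self.WHITESPACE):
            self._end -= 1

    def value(self) -> bytes:
        # Materializes a copy only when the caller asks for the contents.
        return self._buf[self._start:self._end]
```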

Thanks for the comments,

-Kyle

On 3/2/15, 12:33 PM, "John MacFrenz" wrote:

>Hi,
>
>Here are some comments I have on the proposal, in no particular order.
>
>
> - Casting bytes to integer types. For example, in the following code:
>
>const b: bytes = b"9";
>const u = b:uint(8);
>
>will u get the value 9 or 57 (the ASCII value of '9')? If the former, how
>could the user choose the latter cast? Will there be a byte type to
>accompany the bytes type?
>
>
> - This might be a stupid question, but why would isDigit, isNumeric and
>isDecimal only work for unicode?
>
>
> - Could it be useful to parametrize the find functions so that multiple
>find algorithms could be implemented and the user could choose which one
>to use? For example:
>
>enum FindAlgorithm { BruteForce, Boyer_Moore };
>
>proc string.find(needle: string, start: int = 1, end: int = 0,
>                 param algorithm: FindAlgorithm = FindAlgorithm.Boyer_Moore): int {
>    if boundaryChecks then {
>        ...
>    }
>    // algorithm must be a param so the where clause on _find can
>    // select an implementation at compile time
>    return _find(needle, start, end, algorithm);
>}
>
>proc string._find(needle: string, start: int, end: int,
>                  param algorithm: FindAlgorithm): int
>    where algorithm == FindAlgorithm.Boyer_Moore {
>    ...
>}
>
>
> - Would tighter integration of regular expressions be doable? For example,
>overloading replace, find, etc. with variants that take regular
>expressions.
>
>
> - Is the string/bytes data internally stored as a dynamic array, i.e. does
>every append operation result in a new memory allocation or not?
> --> If the backing storage is dynamic, would it be feasible to also
>facilitate prepend operations that wouldn't necessitate a new
>allocation? And maybe in-place trim, replace, etc.?
>
>
>
>
>26.02.2015, 01:23, "Kyle Brady":
>>   Hi Chapel Users,
>>
>>   This is our plan for the future of strings in Chapel from a semantic and
>>   library point of view:
>>
>>   https://gist.github.com/Kyle-B/44138a37907a2a53c4b2
>>
>>   I still have a few things marked as TODO or not filled in, but the
>>   important details are there.
>>
>>   This document was originally written a few months ago and the team met to
>>   talk about it. I think most people are reasonably happy with where it
>>   stands now.
>>
>>   A few of these changes may make it into 1.11, but the vast majority will
>>   be in a later release. If you have any comments or concerns I'd be glad to
>>   hear them - especially if you use non-ASCII character sets routinely.
>>
>>   Thanks,
>>
>>   -Kyle
>>
>>   
>>-------------------------------------------------------------------------
>>-----
>>   Dive into the World of Parallel Programming The Go Parallel Website,
>>sponsored
>>   by Intel and developed in partnership with Slashdot Media, is your
>>hub for all
>>   things parallel software development, from weekly thought leadership
>>blogs to
>>   news, videos, case studies, tutorials and more. Take a look and join
>>the
>>   conversation now. http://goparallel.sourceforge.net/
>>   _______________________________________________
>>   Chapel-users mailing list
>>   [email protected]
>>   https://lists.sourceforge.net/lists/listinfo/chapel-users


