Hi,

Thanks very much for your comments. I looked at the suggested code, but it was not easy for me to modify it... XD (so no success yet). Anyway, to get more information, I first simplified the test code to
proc test()
{
    var a = "apple   orange grape";   // OK
    // var a = "apple   orange gr";   // OK
    // var a = "apple   orange g";    // NG
    var spl = a.split();
    writeln( "a.split() = ", spl );
}
test();
and added print statements to the second split() routine as follows:
-------------------------------------------------------------------------
# The original code is here:
https://github.com/chapel-lang/chapel/blob/master/modules/internal/String.chpl#L715
# A simplified version (only print statements added, marked with //>>> and //<<<):
iter split ( maxsplit: int = -1 ) /* : string */ {
  if !this.isEmptyString() {
    const localThis: string = this.localize();
    var done : bool = false;
    var yieldChunk : bool = false;
    var chunk : string;
    var inChunk : bool = false;
    var chunkStart : int;
    var iEnd = localThis.len - 1;
    //>>>
    writeln( ">>> Entering the 2nd split()" );
    writeln( "localThis.len = ", localThis.len );
    writeln( "for i in 0..#localThis.len = ", 0..#localThis.len );
    //<<<
    // Loop over chars in the string.
    for i in 0..#localThis.len {
      var b = localThis.buff[ i ];
      var bSpace = byte_isWhitespace( b );
      //>>>
      writeln( " b= ", b, " bSpace= ", bSpace, " i+1= ", i+1,
               " localThis[i+1..]= <", localThis[ i+1.. ], ">" );
      //<<<
      // first char of a chunk
      if ! ( inChunk || bSpace ) {
        chunkStart = i + 1; // 0-based buff -> 1-based range
        inChunk = true;
        //>>>
        writeln( " ---> new chunk begins at chunkStart = ",
                 chunkStart );
        //<<<
      } else if inChunk {
        // first char out of a chunk
        if bSpace {
          chunk = localThis[ chunkStart..i ];
          yieldChunk = true;
          inChunk = false;
          //>>>
          writeln( "yield: chunk = <", chunk, ">" );
          //<<<
        // out of chars
        } else if i == iEnd {
          // --- This block seems not reached when the last word
          // --- is a single character.
          chunk = localThis[ chunkStart.. ];
          yieldChunk = true;
          done = true;
          //>>>
          writeln( "yield: chunk(last) = <", chunk, ">" );
          //<<<
        } // bSpace
      } // chunk
      if yieldChunk {
        yield chunk;
        yieldChunk = false;
      }
      if done then
        break;
    } // for i
  } // if not empty
} // proc
-------------------------------------------------------------------
Then, a = "apple   orange gr" gives
>>> Entering the 2nd split()
localThis.len = 17
for i in 0..#localThis.len = 0..16
b= 97 bSpace= false i+1= 1 localThis[i+1..]= <apple   orange gr>
---> new chunk begins at chunkStart = 1
b= 112 bSpace= false i+1= 2 localThis[i+1..]= <pple   orange gr>
b= 112 bSpace= false i+1= 3 localThis[i+1..]= <ple   orange gr>
b= 108 bSpace= false i+1= 4 localThis[i+1..]= <le   orange gr>
b= 101 bSpace= false i+1= 5 localThis[i+1..]= <e   orange gr>
b= 32 bSpace= true i+1= 6 localThis[i+1..]= <   orange gr>
yield: chunk = <apple>
b= 32 bSpace= true i+1= 7 localThis[i+1..]= <  orange gr>
b= 32 bSpace= true i+1= 8 localThis[i+1..]= < orange gr>
b= 111 bSpace= false i+1= 9 localThis[i+1..]= <orange gr>
---> new chunk begins at chunkStart = 9
b= 114 bSpace= false i+1= 10 localThis[i+1..]= <range gr>
b= 97 bSpace= false i+1= 11 localThis[i+1..]= <ange gr>
b= 110 bSpace= false i+1= 12 localThis[i+1..]= <nge gr>
b= 103 bSpace= false i+1= 13 localThis[i+1..]= <ge gr>
b= 101 bSpace= false i+1= 14 localThis[i+1..]= <e gr>
b= 32 bSpace= true i+1= 15 localThis[i+1..]= < gr>
yield: chunk = <orange>
b= 103 bSpace= false i+1= 16 localThis[i+1..]= <gr>
---> new chunk begins at chunkStart = 16
b= 114 bSpace= false i+1= 17 localThis[i+1..]= <r>
yield: chunk(last) = <gr>
a.split() = apple orange gr
while a = "apple   orange g" gives
>>> Entering the 2nd split()
localThis.len = 16
for i in 0..#localThis.len = 0..15
b= 97 bSpace= false i+1= 1 localThis[i+1..]= <apple   orange g>
---> new chunk begins at chunkStart = 1
b= 112 bSpace= false i+1= 2 localThis[i+1..]= <pple   orange g>
b= 112 bSpace= false i+1= 3 localThis[i+1..]= <ple   orange g>
b= 108 bSpace= false i+1= 4 localThis[i+1..]= <le   orange g>
b= 101 bSpace= false i+1= 5 localThis[i+1..]= <e   orange g>
b= 32 bSpace= true i+1= 6 localThis[i+1..]= <   orange g>
yield: chunk = <apple>
b= 32 bSpace= true i+1= 7 localThis[i+1..]= <  orange g>
b= 32 bSpace= true i+1= 8 localThis[i+1..]= < orange g>
b= 111 bSpace= false i+1= 9 localThis[i+1..]= <orange g>
---> new chunk begins at chunkStart = 9
b= 114 bSpace= false i+1= 10 localThis[i+1..]= <range g>
b= 97 bSpace= false i+1= 11 localThis[i+1..]= <ange g>
b= 110 bSpace= false i+1= 12 localThis[i+1..]= <nge g>
b= 103 bSpace= false i+1= 13 localThis[i+1..]= <ge g>
b= 101 bSpace= false i+1= 14 localThis[i+1..]= <e g>
b= 32 bSpace= true i+1= 15 localThis[i+1..]= < g>
yield: chunk = <orange>
b= 103 bSpace= false i+1= 16 localThis[i+1..]= <g>
---> new chunk begins at chunkStart = 16
a.split() = apple orange
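So it seems this hits the case that Nikhil suggested: when the last word
is a single character, the final byte starts a new chunk (chunkStart = 16)
in the first branch, so the "else if inChunk" branch that contains the
"i == iEnd" check is never reached for that byte, and the loop ends
without yielding the last chunk.
Because I could not get any further, I would appreciate it
if the experts could address this (if necessary) :)
(I'm still not sure about the syntax itself...)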
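
To check my understanding of the control flow, here is a minimal sketch
of the same loop in Python (not Chapel). It is only a model under some
assumptions: the names split_original and split_fixed are hypothetical,
str.isspace() stands in for byte_isWhitespace(), maxsplit is ignored, and
the "fixed" variant is just one possible change in the spirit of Nikhil's
suggestion, not something I have tested against String.chpl.

```python
def split_original(s):
    """Model of the loop as posted: 0-based index i, 1-based chunk_start."""
    chunks = []
    in_chunk = False
    chunk_start = 0
    i_end = len(s) - 1
    for i, ch in enumerate(s):
        b_space = ch.isspace()
        if not (in_chunk or b_space):
            # first char of a chunk
            chunk_start = i + 1
            in_chunk = True
        elif in_chunk:
            if b_space:
                # first char out of a chunk; 1-based chunk_start..i
                chunks.append(s[chunk_start - 1:i])
                in_chunk = False
            elif i == i_end:
                # out of chars; skipped when the last char starts a chunk
                chunks.append(s[chunk_start - 1:])
                break
    return chunks


def split_fixed(s):
    """Same loop, but the end-of-string check also runs for a char
    that has just started a chunk (assumed fix, see the text above)."""
    chunks = []
    in_chunk = False
    chunk_start = 0
    i_end = len(s) - 1
    for i, ch in enumerate(s):
        b_space = ch.isspace()
        if not in_chunk and not b_space:
            chunk_start = i + 1
            in_chunk = True
        if in_chunk:
            if b_space:
                chunks.append(s[chunk_start - 1:i])
                in_chunk = False
            elif i == i_end:
                chunks.append(s[chunk_start - 1:])
    return chunks


if __name__ == "__main__":
    for text in ("apple   orange gr", "apple   orange g"):
        print(repr(text), split_original(text), split_fixed(text))
```

In this model, the only difference is that the second "if" is no longer
an "else if", so a chunk that starts on the very last character is still
seen by the "i == i_end" check in the same iteration.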
PS. I think Chapel is a really nice language, and I hope
it will become more widely used in the future!
My best regards,
--Takeshi
2017-03-17 6:08 GMT+09:00 Nikhil Padmanabhan <[email protected]>:
> Just staring at the code, I'm guessing the issue is on line 752-753 of
>
> https://github.com/chapel-lang/chapel/blob/master/modules/internal/String.chpl
>
> If the last character is the start of a chunk, then yieldChunk is not set,
> and the loop ends, without returning the character.
>
> I *think* putting in a check to see if chunkStart==localThis.len and setting
> chunk and yieldChunk if true should work.
>
> -- Nikhil
>
> ---------------------------------
> Nikhil Padmanabhan
> [email protected]
>
> On Thu, Mar 16, 2017 at 4:18 PM, Brad Chamberlain <[email protected]> wrote:
>>
>>
>> Hi --
>>
>> I strongly suspect that you aren't running into anything more subtle than
>> a bug in split()'s implementation -- probably an off-by-one issue, though
>> someone with more knowledge of the routines may correct me if I'm wrong.
>>
>> The routines are defined in modules/internal/String.chpl (search on 'iter
>> split()') if you're interested in seeing if you can find+fix the bug
>> yourself. If not, and you'd be willing to file a bug report as a GitHub
>> issue, that would be terrific.
>>
>> -Brad
>>
>>
>> On Thu, 16 Mar 2017, Takeshi Yamamoto wrote:
>>
>> > Hello,
>> >
>> > While I'm learning the basics of Chapel, I have come across
>> > the following case where split() seems to give an unexpected result:
>> >
>> > proc test()
>> > {
>> > var a = "apple orange grape"; // OK
>> > // var a = "apple orange g"; // NG
>> >
>> > writeln( "a = ", a );
>> > writeln( "a.split() = ", a.split() );
>> >
>> > var b = a.split();
>> >
>> > for i in 1 .. b.size {
>> > writeln( "b[", i, "]:", b[ i ] );
>> > }
>> > }
>> >
>> > test();
>> >
>> > The above code gives
>> >
>> > a = apple orange grape
>> > a.split() = apple orange grape
>> > b[1]:apple
>> > b[2]:orange
>> > b[3]:grape
>> >
>> > which is my expected result. On the other hand,
>> > if I comment the line with "OK" and uncomment the line
>> > with "NG", then I get the following result:
>> >
>> > a = apple orange g
>> > a.split() = apple orange
>> > b[1]:apple
>> > b[2]:orange
>> >
>> > That is, split() neglects the last "g" in the string a.
>> >
>> > I tried other patterns as well, and it seems that if the last word
>> > in a string is only a single character, then split() neglects
>> > it for some reason.
>> >
>> > # Also, if the string has \n at the end (e.g., a = "apple orange g\n"),
>> > # I get the expected result again.
>> >
>> > Is this an expected behavior of split(), or possibly a compiler
>> > issue...?
>> >
>> > PS. Also, I'm sorry if I'm making some big (or basic) mistake about
>> > the usage of strings.
>> >
>> > My best regards,
>> > Takeshi Yamamoto
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > Check out the vibrant tech community on one of the world's most
>> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> > _______________________________________________
>> > Chapel-users mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/chapel-users
>> >
>>
>>
>
>
String.chpl
Description: Binary data
