Hi,

Thanks very much for your comments. I have tried looking
at the suggested code, but it was not easy for me
to modify it... XD (so no success yet). Anyway,
to get more info, I first simplified the test code to

proc test()
{
       var a = "apple   orange grape";  // OK
    // var a = "apple   orange gr";        // OK
    // var a = "apple   orange g";         // NG

    var spl = a.split();
    writeln( "a.split() = ", spl );
}

test();

and simplified the second split() routine as follows:

-------------------------------------------------------------------------

# The original code is here:
https://github.com/chapel-lang/chapel/blob/master/modules/internal/String.chpl#L715

# A simplified version (just to print variables):

iter split ( maxsplit: int = -1 ) /* : string */ {

    if !this.isEmptyString() {
        const localThis: string = this.localize();

        var done : bool = false;
        var yieldChunk : bool = false;
        var chunk : string;
        var inChunk : bool = false;
        var chunkStart : int;

        var iEnd = localThis.len - 1;

        //>>>
        writeln( ">>> Entering the 2nd split()" );
        writeln( "localThis.len = ", localThis.len );
        writeln( "for i in 0..#localThis.len = ", 0..#localThis.len );
        //<<<

        // Loop over chars in the string.
        for i in 0..#localThis.len {

            var b = localThis.buff[ i ];
            var bSpace = byte_isWhitespace( b );

            //>>>
            writeln( "    b= ", b, "  bSpace= ", bSpace, "  i+1= ", i+1,
                     "  localThis[i+1..]= <", localThis[ i+1.. ], ">" );
            //<<<

            // first char of a chunk
            if ! ( inChunk || bSpace ) {

                chunkStart = i + 1; // 0-based buff -> 1-based range
                inChunk = true;

                //>>>
                writeln( "    ---> new chunk begins at chunkStart = ",
                         chunkStart );
                //<<<

            } else if inChunk {

                // first char out of a chunk
                if bSpace {

                    chunk = localThis[ chunkStart..i ];
                    yieldChunk = true;
                    inChunk = false;

                    //>>>
                    writeln( "yield: chunk = <", chunk, ">" );
                    //<<<

                // out of chars
                } else if i == iEnd {

                    // --- This block seems not reached when the last word
                    // --- is a single character.

                    chunk = localThis[ chunkStart.. ];
                    yieldChunk = true;
                    done = true;

                    //>>>
                    writeln( "yield: chunk(last) = <", chunk, ">" );
                    //<<<

                } // bSpace

            } // chunk

            if yieldChunk {
                yield chunk;
                yieldChunk = false;
            }

            if done then
                break;

        } // for i
    } // if not empty
} // iter

-------------------------------------------------------------------

Then, a = "apple   orange gr" gives

>>> Entering the 2nd split()
localThis.len = 17
for i in 0..#localThis.len = 0..16
    b= 97  bSpace= false  i+1= 1  localThis[i+1..]= <apple   orange gr>
    ---> new chunk begins at chunkStart = 1
    b= 112  bSpace= false  i+1= 2  localThis[i+1..]= <pple   orange gr>
    b= 112  bSpace= false  i+1= 3  localThis[i+1..]= <ple   orange gr>
    b= 108  bSpace= false  i+1= 4  localThis[i+1..]= <le   orange gr>
    b= 101  bSpace= false  i+1= 5  localThis[i+1..]= <e   orange gr>
    b= 32  bSpace= true  i+1= 6  localThis[i+1..]= <   orange gr>
yield: chunk = <apple>
    b= 32  bSpace= true  i+1= 7  localThis[i+1..]= <  orange gr>
    b= 32  bSpace= true  i+1= 8  localThis[i+1..]= < orange gr>
    b= 111  bSpace= false  i+1= 9  localThis[i+1..]= <orange gr>
    ---> new chunk begins at chunkStart = 9
    b= 114  bSpace= false  i+1= 10  localThis[i+1..]= <range gr>
    b= 97  bSpace= false  i+1= 11  localThis[i+1..]= <ange gr>
    b= 110  bSpace= false  i+1= 12  localThis[i+1..]= <nge gr>
    b= 103  bSpace= false  i+1= 13  localThis[i+1..]= <ge gr>
    b= 101  bSpace= false  i+1= 14  localThis[i+1..]= <e gr>
    b= 32  bSpace= true  i+1= 15  localThis[i+1..]= < gr>
yield: chunk = <orange>
    b= 103  bSpace= false  i+1= 16  localThis[i+1..]= <gr>
    ---> new chunk begins at chunkStart = 16
    b= 114  bSpace= false  i+1= 17  localThis[i+1..]= <r>
yield: chunk(last) = <gr>
a.split() = apple orange gr


while a = "apple   orange g" gives


>>> Entering the 2nd split()
localThis.len = 16
for i in 0..#localThis.len = 0..15
    b= 97  bSpace= false  i+1= 1  localThis[i+1..]= <apple   orange g>
    ---> new chunk begins at chunkStart = 1
    b= 112  bSpace= false  i+1= 2  localThis[i+1..]= <pple   orange g>
    b= 112  bSpace= false  i+1= 3  localThis[i+1..]= <ple   orange g>
    b= 108  bSpace= false  i+1= 4  localThis[i+1..]= <le   orange g>
    b= 101  bSpace= false  i+1= 5  localThis[i+1..]= <e   orange g>
    b= 32  bSpace= true  i+1= 6  localThis[i+1..]= <   orange g>
yield: chunk = <apple>
    b= 32  bSpace= true  i+1= 7  localThis[i+1..]= <  orange g>
    b= 32  bSpace= true  i+1= 8  localThis[i+1..]= < orange g>
    b= 111  bSpace= false  i+1= 9  localThis[i+1..]= <orange g>
    ---> new chunk begins at chunkStart = 9
    b= 114  bSpace= false  i+1= 10  localThis[i+1..]= <range g>
    b= 97  bSpace= false  i+1= 11  localThis[i+1..]= <ange g>
    b= 110  bSpace= false  i+1= 12  localThis[i+1..]= <nge g>
    b= 103  bSpace= false  i+1= 13  localThis[i+1..]= <ge g>
    b= 101  bSpace= false  i+1= 14  localThis[i+1..]= <e g>
    b= 32  bSpace= true  i+1= 15  localThis[i+1..]= < g>
yield: chunk = <orange>
    b= 103  bSpace= false  i+1= 16  localThis[i+1..]= <g>
    ---> new chunk begins at chunkStart = 16
a.split() = apple orange


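So, it seems that I am hitting the case that Nikhil suggests:
when the last word is a single character, the loop ends right
after chunkStart is set, so the final chunk is never yielded.
Because I could not go further (I'm still not sure about the
syntax itself...), I would appreciate it if this could be
addressed by experts (if necessary) :)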
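
For reference, the control flow of the loop above can be modeled outside
Chapel. The following is a minimal Python sketch, not the actual library
code: the function name split_model and the fix flag are made up for
illustration, and the fix branch is only one possible reading of Nikhil's
suggestion (check chunkStart == localThis.len when a chunk starts), which
I have not tested in String.chpl itself.

```python
def split_model(s, fix=False):
    """Model of the whitespace-splitting loop traced above.

    The buffer index i is 0-based and chunk_start is 1-based, as in the
    Chapel code. `fix` (hypothetical) enables the extra check for a
    chunk that starts on the very last character.
    """
    chunks = []
    in_chunk = False
    chunk_start = 0
    i_end = len(s) - 1

    for i in range(len(s)):
        is_space = s[i].isspace()

        if not (in_chunk or is_space):
            # First char of a chunk.
            chunk_start = i + 1  # 0-based buffer -> 1-based position
            in_chunk = True
            if fix and chunk_start == len(s):
                # Assumed fix: the chunk starts on the last char, so
                # there is no later iteration that could yield it.
                chunks.append(s[chunk_start - 1:])
                break
        elif in_chunk:
            if is_space:
                # First char out of a chunk: 1-based chunkStart..i.
                chunks.append(s[chunk_start - 1:i])
                in_chunk = False
            elif i == i_end:
                # Out of chars while inside a chunk.
                chunks.append(s[chunk_start - 1:])
                break

    return chunks


if __name__ == "__main__":
    for text in ("apple   orange grape", "apple   orange gr", "apple   orange g"):
        print(repr(text), split_model(text), split_model(text, fix=True))
```

Without the fix, the model drops the trailing "g" just as in the second
trace; with fix=True it is returned as the last chunk.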


PS. I think Chapel is a really nice language and hope
it will become more widely used in the future!

My best regards,
--Takeshi

2017-03-17 6:08 GMT+09:00 Nikhil Padmanabhan <[email protected]>:
> Just staring at the code, I'm guessing the issue is on line 752-753 of
>
> https://github.com/chapel-lang/chapel/blob/master/modules/internal/String.chpl
>
> If the last character is the start of a chunk, then yieldChunk is not set,
> and the loop ends, without returning the character.
>
> I *think* putting in a check to see if chunkStart==localThis.len and setting
> chunk and yieldChunk if true should work.
>
> -- Nikhil
>
> ---------------------------------
> Nikhil Padmanabhan
> [email protected]
>
> On Thu, Mar 16, 2017 at 4:18 PM, Brad Chamberlain <[email protected]> wrote:
>>
>>
>> Hi --
>>
>> I strongly suspect that you aren't running into anything more subtle than
>> a bug in split()'s implementation -- probably an off-by-one issue, though
>> someone with more knowledge of the routines may correct me if I'm wrong.
>>
>> The routines are defined in modules/internal/String.chpl (search on 'iter
>> split()') if you're interested in seeing if you can find+fix the bug
>> yourself.  If not, and you'd be willing to file a bug report as a GitHub
>> issue, that would be terrific.
>>
>> -Brad
>>
>>
>> On Thu, 16 Mar 2017, Takeshi Yamamoto wrote:
>>
>> > Hello,
>> >
>> > While I'm learning the basics of Chapel, I have come across
>> > the following case where split() seems to give an unexpected result:
>> >
>> > proc test()
>> > {
>> >    var a = "apple orange grape";   // OK
>> >    // var a = "apple orange g";   // NG
>> >
>> >    writeln( "a = ", a );
>> >    writeln( "a.split() = ", a.split() );
>> >
>> >    var b = a.split();
>> >
>> >    for i in 1 .. b.size {
>> >        writeln( "b[", i, "]:", b[ i ] );
>> >    }
>> > }
>> >
>> > test();
>> >
>> > The above code gives
>> >
>> > a = apple orange grape
>> > a.split() = apple orange grape
>> > b[1]:apple
>> > b[2]:orange
>> > b[3]:grape
>> >
>> > which is my expected result. On the other hand,
>> > if I comment out the line with "OK" and uncomment the line
>> > with "NG", then I get the following result:
>> >
>> > a = apple orange g
>> > a.split() = apple orange
>> > b[1]:apple
>> > b[2]:orange
>> >
>> > That is, split() neglects the last "g" in the string a.
>> >
>> > I tried other patterns also, and it seems that if the last word
>> > in a string is only a single character, then split() neglects
>> > it for some reason.
>> >
>> > # Also, if the string has \n at the end (e.g., a = "apple orange g\n"),
>> > # I get the expected result again.
>> >
>> > Is this an expected behavior of split(), or possibly a compiler
>> > issue...?
>> >
>> > PS. Also, I'm sorry if I'm making some big (or basic) mistake about
>> > the usage of strings.
>> >
>> > My best regards,
>> > Takeshi Yamamoto
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > Check out the vibrant tech community on one of the world's most
>> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> > _______________________________________________
>> > Chapel-users mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/chapel-users
>> >
>>
>>
>
>

Attachment: String.chpl
Description: Binary data

