Brad Roberts wrote:
Without looking at the docs, code, or compiling and running a test, what will
this do:

    foreach (x; splitter(",a,b,", ","))
        writefln("x = %s", x);

I'll make it multiple choice:

choice 1)
  x = a
  x = b

choice 2)
  x =
  x = a
  x = b

choice 3)
  x = a
  x = b
  x =

choice 4)
  x =
  x = a
  x = b
  x =

Later,
Brad

Thanks for bringing this up, Brad. Splitter does what Perl's split does: choice 2. This means the comma acts as an item terminator rather than an item separator. Why did I think this was a good idea? Because in most cases I was thankful that Perl's split does exactly the right thing.

Whenever I read text from linguistic corpora, I see that words (or other word properties) are separated by spaces. There is never a space before the first word on a line, but there is often a trailing space at the end of the line. Why? Because the text was processed by a program that output "word, ' '" or "tag, ' '" for each word or tag. If I then split the text by whitespace, I'd be annoyed to find that the trailing space produces an extra empty item.

For the same reason, C accepts enum X { a, b, } but not enum X { , a, b }. Mechanically generating enum values is easier if each value has a trailing comma.

Similarly, when you split a text by '\n', a leading empty line is important, whereas you wouldn't expect a final '\n' to introduce an empty line.

Now clearly there are cases in which both leading and trailing empty items are important. I'm just saying they are rarer. We could add an enumerated parameter to Splitter:

    enum PleaseFindAGoodName { terminator, separator }

    foreach (line; splitter(",a,b,", ","))
        ... terminator is implicit ...
    foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
        ... separator ...

We might just go with the terminator semantics and ask people who need separator semantics to use a stripl() or a munch() prior to splitting. I'd personally prefer having an enum there.
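
To make the two behaviors concrete, here is a minimal hand-rolled sketch. It is not Phobos's Splitter: the names SplitMode and splitItems are made up for illustration, and the result is an eager array rather than a lazy range. It drops only a single trailing empty item in terminator mode, which is an assumption about what "terminator" should mean and may not match Perl's split in every corner case.

```d
import std.stdio : writefln;
import std.string : indexOf;

// Hypothetical stand-in for the enum that still needs a good name.
enum SplitMode { terminator, separator }

// Illustrative helper, not the library's splitter: splits eagerly into an array.
string[] splitItems(string text, string delim,
                    SplitMode mode = SplitMode.terminator)
{
    assert(delim.length > 0, "delimiter must not be empty");
    string[] items;
    size_t start = 0;
    while (true)
    {
        // Position of the next delimiter relative to start, or -1 if none.
        auto pos = text[start .. $].indexOf(delim);
        if (pos < 0)
            break;
        items ~= text[start .. start + pos];
        start += pos + delim.length;
    }
    // Whatever follows the last delimiter. In terminator mode an empty
    // remainder means the input ended with a delimiter, so no item is added.
    auto rest = text[start .. $];
    if (rest.length > 0 || mode == SplitMode.separator)
        items ~= rest;
    return items;
}

void main()
{
    // Terminator semantics (the default in this sketch): choice 2.
    foreach (x; splitItems(",a,b,", ","))
        writefln("x = %s", x);

    // Separator semantics: choice 4.
    foreach (x; splitItems(",a,b,", ",", SplitMode.separator))
        writefln("x = %s", x);
}
```

With this sketch, splitting "\nfoo\n" by "\n" in terminator mode keeps the leading empty line and drops the trailing one, and splitting "a b " by " " gives just the two words, which is the corpus case described above.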


Andrei
