On 09/10/2012 07:47 AM, Junio C Hamano wrote:
> Michael Haggerty <[email protected]> writes:
>
>> ... Consider something like
>>
>> struct string_list *split_file_into_words(FILE *f)
>> {
>> char buf[1024];
>> struct string_list *list = new string list;
>> list->strdup_strings = 1;
>> while (not EOF) {
>> read_line_into_buf();
>> string_list_split_in_place(list, buf, ' ', -1);
>> }
>> return list;
>> }
>
> That is a prime example to argue that string_list_split() would make
> more sense, no? The caller does _not_ mind if the function mucks
> with buf, but the resulting list is not allowed to point into buf.
>
> In such a case, the caller shouldn't have to _care_ if it wants to
> allow buf to be mucked with; it is already asking that the resulting
> list _not_ point into buf by setting strdup_strings (by the way,
> that is part of the function input, so think of it like various *opt
> variables passed into functions to tweak their behaviour). If the
> implementation can do so without sacrificing performance (and in
> this case, as you said, it can), it should take "const char *buf".
You're right, I was thinking that a caller of
string_list_split_in_place() could choose to remain ignorant of whether
strdup_strings is set, but this is incorrect because it needs to know
whether to do its own memory management of the strings that it passes
into string_list_append().
> So it appears to me that sl_split_in_place(), if implemented, should
> be kept as a special case for performance-minded callers that have
> full control of the lifetime rules of the variables they use, can
> set strdup_strings to false, and can let buf modified in place, and
> can accept list that point into buf.
OK, so the bottom line would be to have two versions of the function.
One takes a (const char *) and *requires* strdup_strings to be set on
its input list:
int string_list_split(struct string_list *list, const char *string,
int delim, int maxsplit)
{
assert(list->strdup_strings);
...
}
The other takes a (char *) and modifies it in-place, and maybe even
requires strdup_strings to be false on its input list:
int string_list_split_in_place(struct string_list *list, char *string,
int delim, int maxsplit)
{
/* not an error per se but a strong suggestion of one: */
assert(!list->strdup_strings);
...
}
(The latter (modulo assert) is the one that I have implemented, but it
might not be needed immediately.) Do you agree?
>>>> + * Examples:
>>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo",
>>>> "bar", "baz"]
>>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo",
>>>> "bar:baz"]
>>>> + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar",
>>>> ""]
>>>
>>> I would find it more natural to see a sentinel value against
>>> "positive" to be 0, not -1. "-1" gives an impression as if "-2"
>>> might do something different from "-1", but Zero is a lot more
>>> special.
>>
>> You have raised a good point and I think there is a flaw in the API, but
>> I'm not sure I agree with you what the flaw is...
>>
>> The "maxsplit" argument limits the number of times the string should be
>> split. I.e., if maxsplit is set, then the output will have at most
>> (maxsplit + 1) strings.
>
> So "do not split, just give me the whole thing" would be maxsplit == 0
> to split into (maxsplit+1) == 1 string. I think we are in agreement
> that your "-1" does not make any sense, and your documentation that
> said "positive" is the saner thing to do, no?
No. I think that my handling of maxsplit=0 was incorrect but that we
should continue using -1 as the special value.
I see maxsplit=0 as a legitimate degenerate case meaning "split into 1
string". Granted, it would only be useful in specialized situations
[1]. Moreover, "-1" makes a much more obvious special value than "0";
somebody reading code with maxsplit=-1 knows immediately that this is a
special value, whereas the handling of maxsplit=0 isn't quite so
blindingly obvious unless the reader knows the outcome of our current
discussion :-)
Therefore I still prefer treating only negative values of maxsplit to
mean "unlimited" and fixing maxsplit=0 as described. But if you insist
on the other convention, let me know and I will change it.
Michael
[1] A case I can think of would be parsing a format like
NUMPARENTS [PARENT...] SUMMARY
where "string_list_split(list, rest_of_line, ' ', numparents)" does the
right thing even if numparents==0.
--
Michael Haggerty
[email protected]
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html