empiredan commented on PR #1240:
URL: 
https://github.com/apache/incubator-pegasus/pull/1240#issuecomment-1314721615

   > I believe that we shouldn't invest too much effort in writing 
commonly-used utilities on our own. Instead, we should find if there's a 
trustworthy library providing this functionality, such as 
https://abseil.io/docs/cpp/guides/strings#abslstrsplit-for-splitting-strings or 
https://www.boost.org/doc/libs/1_80_0/doc/html/string_algo/usage.html.
   > 
   > I feel that C++ coders rarely consider relying on the community. 
Programmers using other languages would never think of writing their string 
split unless there are some special requirements. Therefore, if possible, 
please try out the above two libraries and see if they can address your problem.
   
   Good advice ! I strongly agree that we should use mature libraries for basic 
tools. And initially I was going to use [boost 
split](https://www.boost.org/doc/libs/1_69_0/doc/html/boost/algorithm/split.html)
 since we'd introduced boost 1.69.0 as our third-party package. However, 
according to the doc boost only supports outputting tokens to sequence 
containers such as `std::vector` or `std::list` even in newest version 1.80.0. 
Actually tokens are also extracted to associative containers such as 
`unordered_set` which has already been supported by old `split_args`.
   
   According to 
[str_split_test.cc](https://github.com/abseil/abseil-cpp/blob/master/absl/strings/str_split_test.cc),
 it seems that abseil has supported all types of containers. However, there is 
some special logic for `split_args`: for example, after split for each time, 
old `split_args` will strip the leading and trailing spaces; moreover, the 
characters belonging to leading spaces are not the same with those belonging to 
trailing spaces though I wonder why this happened and which code has depended 
on this.
   
   I think by 
[StripAsciiWhitespace](https://github.com/abseil/abseil-cpp/blob/master/absl/strings/ascii.h)
 each split can be stripped. However, this will erase the string while new 
`split_args` will just skip leading and trailing spaces of tokens after split 
without extra stripping operations. Also, the reason why inequality of 
characters of leading and trailing spaces should be checked. Abseil uses 
`absl::ascii_isspace` to check both leading and trailing spaces. Even if 
striping can be implemented by ourselves, it will have to do extra operations 
and another copy.
   
   To sum up, I think we can just `split_args` firstly. I believe the common 
library will function nevertheless. However, it will cost some time to verify. 
I'll create other new issues to track. This is an important job that will be 
supported for long term.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to