Re: UTF-8 Support for TextParser

Tianqi Chen Mon, 26 Feb 2018 15:53:55 -0800

Since the LibSVM format only involves numbers and possibly ASCII
characters, is there any reason to add UTF-8 support? Note that
generalization always comes at a cost in efficiency, and some effort
has already been spent on making the parser fast.


Tianqi

On Mon, Feb 26, 2018 at 3:38 PM, Anirudh <[email protected]> wrote:

> Hi all,
>
> Currently there is no UTF-8 support in the LibSVM, LibFM or CSV text
> parsers, and I am working on adding it. Since C++ doesn't have great
> built-in support for UTF-8, I am looking at third-party libraries that
> provide Unicode support. I am considering ICU currently. Any comments,
> suggestions, past experience or gotchas about third-party Unicode
> libraries, or about adding Unicode support in general, are highly
> appreciated.
>
> I have created an issue about the same:
> https://github.com/dmlc/dmlc-core/issues/372
> Please feel free to reply to this email or comment on the github issue if
> you have any inputs.
>
> Anirudh
>
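
As an illustration of the trade-off discussed in this thread, here is a
minimal sketch, not code from dmlc-core. It relies on one property of
UTF-8: every byte of a multi-byte sequence is 0x80 or higher, so an ASCII
delimiter byte never appears inside an encoded non-ASCII character. A
byte-oriented splitter can therefore pass UTF-8 text through intact
without decoding it. The names SplitOnAsciiByte and IsAscii are
hypothetical helpers introduced only for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of dmlc-core): split a line on a single
// ASCII delimiter byte, treating the input purely as raw bytes.
// Assumes delim is an ASCII character (value below 0x80).
std::vector<std::string> SplitOnAsciiByte(const std::string& line,
                                          char delim) {
  std::vector<std::string> fields;
  std::string current;
  for (char c : line) {
    if (c == delim) {
      // Delimiter found: close the current field and start a new one.
      fields.push_back(current);
      current.clear();
    } else {
      // Any other byte, including UTF-8 lead and continuation bytes,
      // is copied unchanged.
      current.push_back(c);
    }
  }
  fields.push_back(current);  // The last field has no trailing delimiter.
  return fields;
}

// Hypothetical helper: true if every byte is below 0x80, i.e. the text
// is pure ASCII and needs no Unicode handling at all.
bool IsAscii(const std::string& text) {
  for (unsigned char c : text) {
    if (c >= 0x80) return false;
  }
  return true;
}
```

Calling SplitOnAsciiByte on a CSV-style line whose first field contains
UTF-8 text and whose second field is a number returns both fields
byte-for-byte unchanged. This only covers splitting on ASCII delimiters;
anything that needs to interpret characters (case folding, normalization,
non-ASCII delimiters) would still require a Unicode-aware library such as
ICU.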
