Since the LibSVM format only involves numbers and possibly ASCII characters, is there any reason to add UTF-8 support? Note that generalization always comes at a cost in efficiency, and some effort has gone into making the parser fast.
Tianqi

On Mon, Feb 26, 2018 at 3:38 PM, Anirudh <[email protected]> wrote:
> Hi all,
>
> Currently there is no UTF-8 Support for LibSVM, LibFM or CSV Text parsers.
> I am currently working on adding UTF-8 support for Text parsers. Since C++
> doesn't have a great built-in support for UTF-8, I am looking at
> third-party libraries which provide Unicode support. I am considering ICU
> currently. Any comments, suggestions, past experience, gotchas about
> unicode third party libraries or adding unicode support in general is
> highly appreciated.
>
> I have created an issue about the same:
> https://github.com/dmlc/dmlc-core/issues/372
> Please feel free to reply to this email or comment on the github issue if
> you have any inputs.
>
> Anirudh
