hokein added inline comments.
================ Comment at: clangd/index/Index.h:32 + // Character offset on a line in a document (zero-based). + int Character = 0; + }; ---------------- MaskRay wrote: > hokein wrote: > > sammccall wrote: > > > sammccall wrote: > > > > Column? > > > > > > > > LSP calls this "character" but this is nonstandard and I find it very > > > > confusing with offset. > > > We should document what this is an offset into: bytes, utf-16 code units, > > > or unicode codepoints. (Or even grid offsets - glyphs and doublewidth are > > > a thing) > > > > > > Given that we intend to send it over LSP without reading the source, only > > > utf-16 code units is really correct. Unicode codepoints is "nicer" and > > > will give correct results in the BMP, while bytes will be correct for > > > ASCII only. > > > > > > I'd vote for making this utf-16 code units. > > > > > > It's OK if the code populating it doesn't get this right (confuses bytes > > > and code units) but add a fixme. > > Done. Added FIXME. > I'd vote for Unicode code points. > > I haven't looked into this closely. But UTF-8 vs UTF-16 vs Unicode code > points should not be a big issue here. Unless you use emojis or some uncommon > characters, the usage of UTF-16 code units in LSP does not have a lot of harm. > > // 😹😹👻 anything weird hidden in line comments can be ignored because > they don't affect offset calculation > > And Microsoft might change the spec one day > https://github.com/Microsoft/language-server-protocol/issues/376 Yeah, It'd be nicer if LSP spec is changed to use UTF-8. The `Column` is intended to align with `Character` of LSP's `Position`, let's keep it as it is at the moment. Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D45513 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits