hokein added inline comments.

================
Comment at: clangd/index/Index.h:32
+    // Character offset on a line in a document (zero-based).
+    int Character = 0;
+  };
----------------
MaskRay wrote:
> hokein wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > Column?
> > > > 
> > > > LSP calls this "character" but this is nonstandard and I find it very 
> > > > confusing with offset. 
> > > We should document what this is an offset into: bytes, utf-16 code units, 
> > > or unicode codepoints. (Or even grid offsets - glyphs and doublewidth are 
> > > a thing)
> > > 
> > > Given that we intend to send it over LSP without reading the source, only 
> > > utf-16 code units is really correct. Unicode codepoints is "nicer" and 
> > > will give correct results in the BMP, while bytes will be correct for 
> > > ASCII only.
> > > 
> > > I'd vote for making this utf-16 code units.
> > > 
> > > It's OK if the code populating it doesn't get this right (confuses bytes 
> > > and code units) but add a fixme.
> > Done. Added FIXME.
> I'd vote for Unicode code points.
> 
> I haven't looked into this closely. But UTF-8 vs UTF-16 vs Unicode code 
> points should not be a big issue here. Unless you use emojis or some uncommon 
> characters, the usage of UTF-16 code units in LSP does not have a lot of harm.
> 
>     //  😹😹👻 anything weird hidden in line comments can be ignored because 
> they don't affect offset calculation
> 
> And Microsoft might change the spec one day 
> https://github.com/Microsoft/language-server-protocol/issues/376
Yeah, It'd be nicer if LSP spec is changed to use UTF-8. The `Column` is 
intended to align with `Character` of LSP's `Position`, let's keep it as it is 
at the moment.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45513



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to