rsmith accepted this revision.
rsmith added a comment.
This revision is now accepted and ready to land.

Looks good as far as it goes (though I've not checked your classification 
functions are correct).

We should have some detection for RTL characters in string literals and 
comments too, at least when there is no subsequent character with strong LTR 
directionality. Neither `"` nor `*/` has strong directionality, so both `א"` 
and `א*/` can potentially leave us in an RTL state. For example:

  // This is "א" then + then "ג".
  "א" + "ג"
  
  // This is `a` then `/*א*/` then `*` then `-` then `/*ג*/` then `b`, aka "a * 
- b" not "a - * b".
  a /*א*/ * - /*ג*/ b



================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:2
+//===--- MisleadingIdentifier.cpp - clang-tidy
+//--------------------------------===//
+//
----------------
Please fix


================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:19
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedAL(llvm::UTF32 CP) {
----------------
(throughout).


================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:134-135
+        llvm::strictConversion);
+    if (Result != llvm::conversionOK)
+      break;
+    if (isUnassignedAL(CodePoint) || isUnassignedR(CodePoint) || 
isR(CodePoint))
----------------
As in the other patch, we could recover here by skipping until we find a valid 
UTF-8 lead byte. It's not important here since invalid UTF-8 won't be accepted 
in an identifier anyway, but might be important if we start checking strings 
and comments for RTL characters too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to