From Ian Maxon <[email protected]>:

Attention is currently required from: Peeyush Gupta, Ritik Raj.
Ian Maxon has posted comments on this change by Ritik Raj. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984?usp=email )

Change subject: [NO ISSUE][RT] Upgrade UTF8StringPointable string search to KMP
......................................................................

Patch Set 1:

(4 comments)

File hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/dfc1c269_6b2a697c?usp=email :
PS1, Line 309: computeLPS
not sure where this LPS terminology comes from. CLRS calls it the prefix function or pi

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/8b593cda_8b766463?usp=email :
PS1, Line 315: if (patternChars[i] == patternChars[len]) {
should we consider whether the pattern is valid w.r.t surrogate pairs when computing the prefix function?

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/bcf0bb38_741ac1ea?usp=email :
PS1, Line 382: if (Character.isHighSurrogate(ch1)) {
             :     prevHigh = true;
             : } else if (Character.isLowSurrogate(ch1)) {
             :     if (prevHigh) {
             :         codePointCount++;
             :         prevHigh = false;
             :     } else {
             :         throw HyracksDataException.create(INVALID_STRING_UNICODE,
             :                 LOW_SURROGATE_WITHOUT_HIGH_SURROGATE);
             :     }
             : } else {
             :     codePointCount++;
             : }
this whole little snippet seems identical to 349-361

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/d3a218ff_260b4b18?usp=email :
PS1, Line 409: c1 += size;
             : if (!resultInByte) {
             :     if (Character.isHighSurrogate(ch1)) {
             :         prevHigh = true;
             :     } else if (Character.isLowSurrogate(ch1)) {
             :         if (prevHigh) {
             :             codePointCount++;
             :             prevHigh = false;
             :         } else {
             :             throw HyracksDataException.create(INVALID_STRING_UNICODE,
             :                     LOW_SURROGATE_WITHOUT_HIGH_SURROGATE);
             :         }
             :     } else {
             :         codePointCount++;
             :     }
             : }
here as well

--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984?usp=email
To unsubscribe, or for help writing mail filters, visit
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: Ia00fbce6499a5258127c91d3ce62270722b89112
Gerrit-Change-Number: 20984
Gerrit-PatchSet: 1
Gerrit-Owner: Ritik Raj <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Peeyush Gupta <[email protected]>
Gerrit-Reviewer: Ritik Raj <[email protected]>
Gerrit-CC: Anon. E. Moose #1000171
Gerrit-CC: Ian Maxon <[email protected]>
Gerrit-Attention: Peeyush Gupta <[email protected]>
Gerrit-Attention: Ritik Raj <[email protected]>
Gerrit-Comment-Date: Wed, 11 Mar 2026 20:45:40 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
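
Illustrative note on the comments at lines 309, 382 and 409: the sketch below is not the code from the patch. It shows (a) the KMP prefix function as CLRS defines it (the array usually called pi, elsewhere called LPS) and (b) one way the repeated surrogate-counting branch could be pulled into a single helper. The names `PrefixFunctionSketch`, `computePrefixFunction` and `CodePointCounter` are hypothetical, and `IllegalArgumentException` stands in for the `HyracksDataException` used in the quoted code so that the example compiles on its own.

```java
// Hypothetical, standalone sketch; none of these names come from the patch.
class PrefixFunctionSketch {

    // Prefix function (pi in CLRS, also known as the LPS array):
    // pi[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] computePrefixFunction(char[] pattern) {
        int[] pi = new int[pattern.length];
        int len = 0; // length of the currently matched prefix
        for (int i = 1; i < pattern.length; i++) {
            // On mismatch, fall back to the next shorter candidate border.
            while (len > 0 && pattern[i] != pattern[len]) {
                len = pi[len - 1];
            }
            if (pattern[i] == pattern[len]) {
                len++;
            }
            pi[i] = len;
        }
        return pi;
    }

    // Mirrors the branch logic quoted in the review, kept in one place so
    // each call site only needs to call accept(ch) per UTF-16 char.
    static final class CodePointCounter {
        private boolean prevHigh;
        private int count;

        void accept(char ch) {
            if (Character.isHighSurrogate(ch)) {
                prevHigh = true;
            } else if (Character.isLowSurrogate(ch)) {
                if (!prevHigh) {
                    // Stand-in for HyracksDataException in the quoted code.
                    throw new IllegalArgumentException("low surrogate without high surrogate");
                }
                count++;
                prevHigh = false;
            } else {
                count++;
            }
        }

        int count() {
            return count;
        }
    }
}
```

With a helper of this shape, each of the duplicated blocks would reduce to a single `accept(ch1)` call; whether that fits the surrounding loop structure in `UTF8StringPointable` is something only the patch author can judge.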
