From Ritik Raj <[email protected]>:

Attention is currently required from: Ian Maxon, Peeyush Gupta.
Ritik Raj has posted comments on this change by Ritik Raj. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984?usp=email )

Change subject: [ASTERIXDB-3715][RT] Upgrade UTF8StringPointable string search to KMP
......................................................................


Patch Set 2: Code-Review+1

(5 comments)

Commit Message:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/4b70c3ad_a2ec8ea2?usp=email :
PS1, Line 7: NO ISSUE
> would also be great to have an issue referenced for this on the *db side,
> considering there's appare […]
Done


File hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/3ba71d8d_b7f21e0c?usp=email :
PS1, Line 309: computeLPS
> not sure where this LPS terminology comes from. […]
Ah, true, pi is very common terminology; I didn’t know that earlier.
But LPS (longest proper prefix that is also a suffix) is also widely used in the context of the KMP algorithm. At least that’s what we used to call it back in college 😄
I am open to updating it.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/3b04f4c1_abfe9912?usp=email :
PS1, Line 315: if (patternChars[i] == patternChars[len]) {
> should we consider whether the pattern is valid w.r. […]
True, but I guess we would never find a match anyway if the pattern does not contain a valid surrogate pair...


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/229b243e_4abab48d?usp=email :
PS1, Line 382: if (Character.isHighSurrogate(ch1)) {
:     prevHigh = true;
: } else if (Character.isLowSurrogate(ch1)) {
:     if (prevHigh) {
:         codePointCount++;
:         prevHigh = false;
:     } else {
:         throw HyracksDataException.create(INVALID_STRING_UNICODE,
:                 LOW_SURROGATE_WITHOUT_HIGH_SURROGATE);
:     }
: } else {
:     codePointCount++;
: }
> this whole little snippet seems identical to 349-361
Yeah, I thought about extracting it, but the result did not feel any more readable.

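For readers following the thread, here is a minimal sketch of the two points discussed above: the LPS (also called pi) table behind `computeLPS`, and one way the repeated surrogate check could be pulled into a single place. It is not the code from the patch. The class `KmpReviewSketch`, the methods `computeLps` and `indexOf`, and the `CodePointCounter` helper are hypothetical names, and `IllegalArgumentException` stands in for the `HyracksDataException` used in the quoted snippet.

```java
/** Hypothetical sketch for discussion; not taken from UTF8StringPointable. */
final class KmpReviewSketch {

    private KmpReviewSketch() {
    }

    /**
     * Builds the KMP failure table. lps[i] is the length of the longest
     * proper prefix of pattern[0..i] that is also a suffix of it
     * (the same table is often called "pi").
     */
    static int[] computeLps(char[] pattern) {
        int[] lps = new int[pattern.length];
        int len = 0; // length of the current matched prefix
        int i = 1;
        while (i < pattern.length) {
            if (pattern[i] == pattern[len]) {
                lps[i++] = ++len;
            } else if (len > 0) {
                len = lps[len - 1]; // fall back to a shorter border, keep i
            } else {
                lps[i++] = 0;
            }
        }
        return lps;
    }

    /** Returns the first index of pattern in text (in chars), or -1. */
    static int indexOf(char[] text, char[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        int[] lps = computeLps(pattern);
        int matched = 0;
        for (int i = 0; i < text.length; i++) {
            while (matched > 0 && text[i] != pattern[matched]) {
                matched = lps[matched - 1];
            }
            if (text[i] == pattern[matched]) {
                matched++;
            }
            if (matched == pattern.length) {
                return i - pattern.length + 1;
            }
        }
        return -1;
    }

    /**
     * Holds the prevHigh/codePointCount state so the if/else chain from the
     * review exists once. The branches mirror the quoted snippet as is.
     */
    static final class CodePointCounter {
        private boolean prevHigh;
        private int count;

        void accept(char ch) {
            if (Character.isHighSurrogate(ch)) {
                prevHigh = true;
            } else if (Character.isLowSurrogate(ch)) {
                if (!prevHigh) {
                    // Assumption: stands in for HyracksDataException.create(...)
                    throw new IllegalArgumentException("low surrogate without high surrogate");
                }
                count++;
                prevHigh = false;
            } else {
                count++;
            }
        }

        int count() {
            return count;
        }
    }
}
```

Each call site would then reduce to a single `counter.accept(ch1)` inside its loop. Whether that reads better than the inline chain is the judgment call raised in the comment above.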

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/0bff1b95_a31817f1?usp=email :
PS1, Line 409: c1 += size;
: if (!resultInByte) {
:     if (Character.isHighSurrogate(ch1)) {
:         prevHigh = true;
:     } else if (Character.isLowSurrogate(ch1)) {
:         if (prevHigh) {
:             codePointCount++;
:             prevHigh = false;
:         } else {
:             throw HyracksDataException.create(INVALID_STRING_UNICODE,
:                     LOW_SURROGATE_WITHOUT_HIGH_SURROGATE);
:         }
:     } else {
:         codePointCount++;
:     }
: }
> here as well
Acknowledged



-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: Ia00fbce6499a5258127c91d3ce62270722b89112
Gerrit-Change-Number: 20984
Gerrit-PatchSet: 2
Gerrit-Owner: Ritik Raj <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Peeyush Gupta <[email protected]>
Gerrit-Reviewer: Ritik Raj <[email protected]>
Gerrit-CC: Anon. E. Moose #1000171
Gerrit-CC: Ian Maxon <[email protected]>
Gerrit-Attention: Peeyush Gupta <[email protected]>
Gerrit-Attention: Ian Maxon <[email protected]>
Gerrit-Comment-Date: Thu, 12 Mar 2026 18:02:39 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
Comment-In-Reply-To: Ian Maxon <[email protected]>
