From Ritik Raj <[email protected]>:

Attention is currently required from: Ian Maxon, Peeyush Gupta.

Ritik Raj has posted comments on this change by Ritik Raj (https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984?usp=email).

Change subject: [ASTERIXDB-3715][RT] Upgrade UTF8StringPointable string search to KMP
......................................................................


Patch Set 2: Code-Review+1

(5 comments)

Commit Message:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/4b70c3ad_a2ec8ea2?usp=email
 :
PS1, Line 7: NO ISSUE
> would also be great to have an issue referenced for this on the *db side, 
> considering there's appare […]
Done


File hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/3ba71d8d_b7f21e0c?usp=email
 :
PS1, Line 309: computeLPS
> not sure where this LPS terminology comes from. […]
Ah, true, pi is very common terminology; I didn't know that earlier. But LPS
(longest proper prefix that is also a suffix) is also widely used in the
context of the KMP algorithm. At least, that's what we used to call it back
in college 😄

I am open to updating it.
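For comparison, the LPS array and the prefix function pi are the same table. The following is a minimal sketch over a Java char[]. The class and method names are illustrative only and are not the ones in UTF8StringPointable, which reads UTF-8 bytes rather than a char array.

```java
import java.util.Arrays;

public class KmpPrefixFunction {
    // Hypothetical helper, not the UTF8StringPointable method.
    // lps[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i]; the same table is often called pi.
    public static int[] computeLps(char[] pattern) {
        int[] lps = new int[pattern.length];
        int len = 0; // length of the prefix-suffix matched so far
        int i = 1;
        while (i < pattern.length) {
            if (pattern[i] == pattern[len]) {
                len++;
                lps[i] = len;
                i++;
            } else if (len > 0) {
                // Fall back to the next shorter prefix-suffix; i does not advance.
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
        return lps;
    }

    public static void main(String[] args) {
        // prints [0, 0, 1, 2, 0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(computeLps("ABABCABAB".toCharArray())));
    }
}
```

Whichever name is kept, the table is built in O(m) for a pattern of length m. The fallback `len = lps[len - 1]` is what lets the search avoid re-scanning text.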


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/3b04f4c1_abfe9912?usp=email
 :
PS1, Line 315:  if (patternChars[i] == patternChars[len]) {
> should we consider whether the pattern is valid w.r. […]
True, but I guess we would never match if the pattern does not contain a
valid surrogate pair.
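On the reviewer's question: the comparison is done per UTF-16 code unit, so a pattern holding a lone high surrogate would still compare equal to the first half of a well-formed pair in the text. A match could therefore start or end inside a code point. One option is to reject such patterns before searching. The following is a sketch under these assumptions: the pattern is available as a char[], `isWellFormedUtf16` and `indexOf` are hypothetical names, and IllegalArgumentException stands in for HyracksDataException.

```java
public class KmpSearch {
    // Hypothetical check: every high surrogate must be followed by a low one,
    // and no low surrogate may appear on its own.
    public static boolean isWellFormedUtf16(char[] s) {
        for (int i = 0; i < s.length; i++) {
            if (Character.isHighSurrogate(s[i])) {
                if (i + 1 >= s.length || !Character.isLowSurrogate(s[i + 1])) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(s[i])) {
                return false;
            }
        }
        return true;
    }

    // Standard KMP failure table (LPS / pi).
    static int[] computeLps(char[] p) {
        int[] lps = new int[p.length];
        for (int i = 1, len = 0; i < p.length; ) {
            if (p[i] == p[len]) {
                lps[i++] = ++len;
            } else if (len > 0) {
                len = lps[len - 1];
            } else {
                lps[i++] = 0;
            }
        }
        return lps;
    }

    // Returns the UTF-16 index of the first occurrence of pattern in text, or -1.
    public static int indexOf(char[] text, char[] pattern) {
        if (!isWellFormedUtf16(pattern)) {
            // Stand-in for HyracksDataException in this sketch.
            throw new IllegalArgumentException("pattern contains an unpaired surrogate");
        }
        if (pattern.length == 0) {
            return 0;
        }
        int[] lps = computeLps(pattern);
        int j = 0; // number of pattern chars currently matched
        for (int i = 0; i < text.length; i++) {
            while (j > 0 && text[i] != pattern[j]) {
                j = lps[j - 1]; // reuse the longest prefix that still matches
            }
            if (text[i] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i - j + 1;
            }
        }
        return -1;
    }
}
```

If the text itself is well formed, a validated pattern can only match on code point boundaries. The up-front check costs a single pass over the pattern.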


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/229b243e_4abab48d?usp=email
 :
PS1, Line 382:  if (Character.isHighSurrogate(ch1)) {
             :                         prevHigh = true;
             :                     } else if (Character.isLowSurrogate(ch1)) {
             :                         if (prevHigh) {
             :                             codePointCount++;
             :                             prevHigh = false;
             :                         } else {
             :                             throw HyracksDataException.create(INVALID_STRING_UNICODE,
             :                                     LOW_SURROGATE_WITHOUT_HIGH_SURROGATE);
             :                         }
             :                     } else {
             :                         codePointCount++;
             :                     }
> this whole little snippet seems identical to 349-361
Yeah, I considered extracting it into a helper, but it felt less readable.
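If the duplication is worth removing after all, one option is to move the `prevHigh` state into a small object so both loops call one method per character. `CodePointCounter` is a hypothetical name, and IllegalStateException stands in for `HyracksDataException.create(INVALID_STRING_UNICODE, LOW_SURROGATE_WITHOUT_HIGH_SURROGATE)`. The sketch mirrors the quoted snippet exactly. Like the original, it therefore does not reject a high surrogate that is never followed by a low one.

```java
public class CodePointCounter {
    private int count;
    private boolean prevHigh;

    // Hypothetical helper: feed one UTF-16 code unit at a time.
    // Counts one code point per non-surrogate char and one per high/low pair.
    public void accept(char ch) {
        if (Character.isHighSurrogate(ch)) {
            prevHigh = true;
        } else if (Character.isLowSurrogate(ch)) {
            if (!prevHigh) {
                // Stand-in for HyracksDataException in this sketch.
                throw new IllegalStateException("low surrogate without high surrogate");
            }
            count++;
            prevHigh = false;
        } else {
            count++;
        }
    }

    public int count() {
        return count;
    }
}
```

Each loop would then replace its inline if/else chain with a single `counter.accept(ch1)` call.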


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984/comment/0bff1b95_a31817f1?usp=email
 :
PS1, Line 409:  c1 += size;
             :                     if (!resultInByte) {
             :                         if (Character.isHighSurrogate(ch1)) {
             :                             prevHigh = true;
             :                         } else if (Character.isLowSurrogate(ch1)) {
             :                             if (prevHigh) {
             :                                 codePointCount++;
             :                                 prevHigh = false;
             :                             } else {
             :                                 throw HyracksDataException.create(INVALID_STRING_UNICODE,
             :                                         LOW_SURROGATE_WITHOUT_HIGH_SURROGATE);
             :                             }
             :                         } else {
             :                             codePointCount++;
             :                         }
             :                     }
> here as well
Acknowledged
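The second snippet always advances the byte cursor `c1` and tracks code points only when `resultInByte` is false. Assume that flag selects whether a match position is reported as a byte offset or as a code point offset. Under that assumption, the conversion can be sketched as follows. `toResultOffset` and `utf8Size` are hypothetical names. The sketch walks a Java String and uses standard UTF-8 sizes (4 bytes per surrogate pair), which may differ from how UTF8StringPointable sizes supplementary characters.

```java
public class MatchOffset {
    // Assumption: standard UTF-8 sizes per UTF-16 code unit.
    // Each half of a surrogate pair counts 2 bytes, so a full pair totals 4.
    static int utf8Size(char ch) {
        if (ch < 0x80) return 1;
        if (ch < 0x800) return 2;
        if (Character.isSurrogate(ch)) return 2;
        return 3;
    }

    // Hypothetical helper: converts a UTF-16 match index into either a
    // UTF-8 byte offset (resultInByte) or a code point offset.
    public static int toResultOffset(String text, int charIndex, boolean resultInByte) {
        int byteOffset = 0;
        int codePointCount = 0;
        boolean prevHigh = false;
        for (int i = 0; i < charIndex; i++) {
            char ch = text.charAt(i);
            byteOffset += utf8Size(ch); // byte cursor always advances
            if (!resultInByte) {
                if (Character.isHighSurrogate(ch)) {
                    prevHigh = true;
                } else if (Character.isLowSurrogate(ch)) {
                    if (!prevHigh) {
                        throw new IllegalArgumentException("low surrogate without high surrogate");
                    }
                    codePointCount++;
                    prevHigh = false;
                } else {
                    codePointCount++;
                }
            }
        }
        return resultInByte ? byteOffset : codePointCount;
    }
}
```

Skipping the surrogate bookkeeping when only a byte offset is needed keeps that path to a single addition per character.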



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20984?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: Ia00fbce6499a5258127c91d3ce62270722b89112
Gerrit-Change-Number: 20984
Gerrit-PatchSet: 2
Gerrit-Owner: Ritik Raj <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Peeyush Gupta <[email protected]>
Gerrit-Reviewer: Ritik Raj <[email protected]>
Gerrit-CC: Anon. E. Moose #1000171
Gerrit-CC: Ian Maxon <[email protected]>
Gerrit-Attention: Peeyush Gupta <[email protected]>
Gerrit-Attention: Ian Maxon <[email protected]>
Gerrit-Comment-Date: Thu, 12 Mar 2026 18:02:39 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
Comment-In-Reply-To: Ian Maxon <[email protected]>
