[ 
https://issues.apache.org/jira/browse/HADOOP-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576335#comment-17576335
 ] 

ASF GitHub Bot commented on HADOOP-18395:
-----------------------------------------

huxinqiu opened a new pull request, #4714:
URL: https://github.com/apache/hadoop/pull/4714

   ### Description of PR
   The current implementation resets src and tgt to their marks and continues 
searching when src is exhausted while tgt still has bytes remaining, which is 
probably unnecessary.
   For example, in the test below, when the search reaches q, src has no bytes 
remaining, so src is reset to the mark (the d after the matched c) and the search 
continues. But from that point on src always has fewer bytes remaining than tgt 
needs, so no later match is possible and we can return -1 directly.
   ```java
   @Test
   public void testFind() throws Exception {
     Text text = new Text("abcd\u20acbdcd\u20ac");
     assertThat(text.find("cd\u20acq")).isEqualTo(-1);
   }
   ```
   
   ### How was this patch tested?
   unit test in org.apache.hadoop.io.TestText#testFind




> Performance improvement in org.apache.hadoop.io.Text#find
> ---------------------------------------------------------
>
>                 Key: HADOOP-18395
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18395
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: xinqiu.hu
>            Priority: Trivial
>         Attachments: 
> 0001-retrun-1-when-tgt-has-remaining-and-src-expired-firs.patch
>
>
> The current implementation resets src and tgt to their marks and continues 
> searching when src is exhausted while tgt still has bytes remaining, which is 
> probably unnecessary.
> {code:java}
> public int find(String what, int start) {
>   try {
>     ByteBuffer src = ByteBuffer.wrap(this.bytes, 0, this.length);
>     ByteBuffer tgt = encode(what);
>     byte b = tgt.get();
>     src.position(start);
>     while (src.hasRemaining()) {
>       if (b == src.get()) { // matching first byte
>         src.mark(); // save position in loop
>         tgt.mark(); // save position in target
>         boolean found = true;
>         int pos = src.position()-1;
>         while (tgt.hasRemaining()) {
>           if (!src.hasRemaining()) { // src expired first
>             tgt.reset();
>             src.reset();
>             found = false;
>             break;
>           }
>           if (!(tgt.get() == src.get())) {
>             tgt.reset();
>             src.reset();
>             found = false;
>             break; // no match
>           }
>         }
>         if (found) return pos;
>       }
>     }
>     return -1; // not found
>   } catch (CharacterCodingException e) {
>     throw new RuntimeException("Should not have happened", e);
>   }
> } {code}
> For example, in the test below, when the search reaches q, src has no bytes 
> remaining, so src is reset to the mark (the d after the matched c) and the 
> search continues. But from that point on src always has fewer bytes remaining 
> than tgt needs, so no later match is possible and we can return -1 directly.
> {code:java}
> @Test
> public void testFind() throws Exception {
>   Text text = new Text("abcd\u20acbdcd\u20ac");
>   assertThat(text.find("cd\u20acq")).isEqualTo(-1);
> } {code}
> Perhaps it could be:
> {code:java}
> public int find(String what, int start) {
>   try {
>     ByteBuffer src = ByteBuffer.wrap(this.bytes, 0, this.length);
>     ByteBuffer tgt = encode(what);
>     byte b = tgt.get();
>     src.position(start);
>     while (src.hasRemaining()) {
>       if (b == src.get()) { // matching first byte
>         src.mark(); // save position in loop
>         tgt.mark(); // save position in target
>         boolean found = true;
>         int pos = src.position()-1;
>         while (tgt.hasRemaining()) {
>           if (!src.hasRemaining()) { // src expired first
>             return -1;
>           }
>           if (!(tgt.get() == src.get())) {
>             tgt.reset();
>             src.reset();
>             found = false;
>             break; // no match
>           }
>         }
>         if (found) return pos;
>       }
>     }
>     return -1; // not found
>   } catch (CharacterCodingException e) {
>     throw new RuntimeException("Should not have happened", e);
>   }
> }{code}
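
One way to sanity-check the early return proposed above is to compare it against a brute-force search at every start offset. The following is only a sketch under stated assumptions: `FindSketch`, `naiveFind`, and `agree` are hypothetical names, the code works on plain byte arrays rather than on `org.apache.hadoop.io.Text`, and the target is assumed to be non-empty.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone harness; it does not use org.apache.hadoop.io.Text.
class FindSketch {

  // Early-return search mirroring the proposal. Assumes 'what' is non-empty.
  static int find(byte[] bytes, byte[] what, int start) {
    ByteBuffer src = ByteBuffer.wrap(bytes);
    ByteBuffer tgt = ByteBuffer.wrap(what);
    byte b = tgt.get();
    src.position(start);
    while (src.hasRemaining()) {
      if (b == src.get()) { // matching first byte
        src.mark();
        tgt.mark();
        boolean found = true;
        int pos = src.position() - 1;
        while (tgt.hasRemaining()) {
          if (!src.hasRemaining()) {
            // src expired first: every later start leaves even fewer bytes
            return -1;
          }
          if (tgt.get() != src.get()) {
            tgt.reset();
            src.reset();
            found = false;
            break; // no match at this candidate
          }
        }
        if (found) return pos;
      }
    }
    return -1;
  }

  // Brute-force reference: try every start offset at which tgt still fits.
  static int naiveFind(byte[] src, byte[] tgt, int start) {
    for (int pos = start; pos + tgt.length <= src.length; pos++) {
      int i = 0;
      while (i < tgt.length && src[pos + i] == tgt[i]) i++;
      if (i == tgt.length) return pos;
    }
    return -1;
  }

  // True when both searches agree for every start offset in the text.
  static boolean agree(String text, String what) {
    byte[] src = text.getBytes(StandardCharsets.UTF_8);
    byte[] tgt = what.getBytes(StandardCharsets.UTF_8);
    for (int start = 0; start <= src.length; start++) {
      if (find(src, tgt, start) != naiveFind(src, tgt, start)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    String text = "abcd\u20acbdcd\u20ac";
    for (String what : new String[] {"cd\u20acq", "cd\u20ac", "d\u20ac", "q"}) {
      System.out.println(what.length() + " chars, agree: " + agree(text, what));
    }
  }
}
```

In the example from the issue, the text encodes to 14 UTF-8 bytes and the target to 6. The second candidate starts at byte 9, which leaves only 5 bytes, so src runs out before q is compared, and any later start leaves fewer still.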



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

