[
https://issues.apache.org/jira/browse/HADOOP-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576335#comment-17576335
]
ASF GitHub Bot commented on HADOOP-18395:
-----------------------------------------
huxinqiu opened a new pull request, #4714:
URL: https://github.com/apache/hadoop/pull/4714
### Description of PR
The current implementation resets src and tgt to their marks and continues
searching when tgt still has bytes remaining but src runs out first, which is
probably unnecessary.
For example, in the test below, when the search reaches q, src has no bytes
remaining, so src is reset to the d after the mark and the search continues.
But from that point on the remaining length of src is always smaller than the
length of tgt, so no match is possible and we can return -1 directly.
`@Test
public void testFind() throws Exception {
  Text text = new Text("abcd\u20acbdcd\u20ac");
  assertThat(text.find("cd\u20acq")).isEqualTo(-1);
}`
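The length argument above can be illustrated with a small standalone sketch. It is not the code in this PR and does not use `Text`; `EarlyExitSketch` and `indexOf` are hypothetical names, and the search runs over plain UTF-8 byte arrays. A match starting at offset `i` needs `i + tgt.length <= src.length`, and once that fails for one offset it fails for every later one, which is why giving up at that point is safe.

```java
import java.nio.charset.StandardCharsets;

public class EarlyExitSketch {
    /** Hypothetical helper, not part of Hadoop: naive byte search with an explicit length bound. */
    static int indexOf(byte[] src, byte[] tgt, int start) {
        // A match starting at i needs i + tgt.length <= src.length.
        // Once that fails for some i it fails for every larger i, so the loop stops there.
        for (int i = start; i + tgt.length <= src.length; i++) {
            int j = 0;
            while (j < tgt.length && src[i + j] == tgt[j]) {
                j++;
            }
            if (j == tgt.length) {
                return i; // all target bytes matched at offset i
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        byte[] src = "abcd\u20acbdcd\u20ac".getBytes(StandardCharsets.UTF_8);
        byte[] miss = "cd\u20acq".getBytes(StandardCharsets.UTF_8);
        byte[] hit = "cd\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOf(src, miss, 0)); // prints -1
        System.out.println(indexOf(src, hit, 0));  // prints 2
    }
}
```

Here the source is 14 bytes and the missing target is 6 bytes, so the second candidate at byte offset 9 is never even tried.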
### How was this patch tested?
unit test in org.apache.hadoop.io.TestText#testFind
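Beyond that single case, one possible way to gain extra confidence is a randomized comparison of the two behaviours. The sketch below is an assumption-laden illustration, not part of the patch: `FindEquivalenceSketch`, its `earlyExit` flag, and the helper names are hypothetical, and the search runs on raw UTF-8 bytes instead of a `Text` instance so that it needs no Hadoop dependency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class FindEquivalenceSketch {
    // Stand-in for Text#find on raw UTF-8 bytes (assumes a non-empty search string).
    // earlyExit=false mirrors reset-and-continue; earlyExit=true mirrors the proposed return -1.
    static int find(byte[] bytes, String what, int start, boolean earlyExit) {
        ByteBuffer src = ByteBuffer.wrap(bytes);
        ByteBuffer tgt = ByteBuffer.wrap(what.getBytes(StandardCharsets.UTF_8));
        byte b = tgt.get();
        src.position(start);
        while (src.hasRemaining()) {
            if (b == src.get()) { // matching first byte
                src.mark();
                tgt.mark();
                boolean found = true;
                int pos = src.position() - 1;
                while (tgt.hasRemaining()) {
                    if (!src.hasRemaining()) { // src expired first
                        if (earlyExit) {
                            return -1;
                        }
                        tgt.reset();
                        src.reset();
                        found = false;
                        break;
                    }
                    if (tgt.get() != src.get()) { // no match at this offset
                        tgt.reset();
                        src.reset();
                        found = false;
                        break;
                    }
                }
                if (found) {
                    return pos;
                }
            }
        }
        return -1;
    }

    // Builds a non-empty random string over a small alphabet that includes a multi-byte character.
    static String randomString(Random rnd, int maxLen) {
        char[] alphabet = {'a', 'b', 'c', 'd', '\u20ac'};
        int len = 1 + rnd.nextInt(maxLen);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append(alphabet[rnd.nextInt(alphabet.length)]);
        }
        return sb.toString();
    }

    // Counts inputs on which the two variants disagree.
    static int countMismatches(long seed, int rounds) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] text = randomString(rnd, 12).getBytes(StandardCharsets.UTF_8);
            String what = randomString(rnd, 4);
            if (find(text, what, 0, false) != find(text, what, 0, true)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 10000));
    }
}
```

If the reasoning in the description holds, the two variants should agree on every input, so the reported mismatch count should be zero.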
> Performance improvement in org.apache.hadoop.io.Text#find
> ---------------------------------------------------------
>
> Key: HADOOP-18395
> URL: https://issues.apache.org/jira/browse/HADOOP-18395
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Reporter: xinqiu.hu
> Priority: Trivial
> Attachments:
> 0001-retrun-1-when-tgt-has-remaining-and-src-expired-firs.patch
>
>
> The current implementation resets src and tgt to their marks and continues
> searching when tgt still has bytes remaining but src runs out first, which is
> probably unnecessary.
> {code:java}
> public int find(String what, int start) {
>   try {
>     ByteBuffer src = ByteBuffer.wrap(this.bytes, 0, this.length);
>     ByteBuffer tgt = encode(what);
>     byte b = tgt.get();
>     src.position(start);
>     while (src.hasRemaining()) {
>       if (b == src.get()) { // matching first byte
>         src.mark(); // save position in loop
>         tgt.mark(); // save position in target
>         boolean found = true;
>         int pos = src.position()-1;
>         while (tgt.hasRemaining()) {
>           if (!src.hasRemaining()) { // src expired first
>             tgt.reset();
>             src.reset();
>             found = false;
>             break;
>           }
>           if (!(tgt.get() == src.get())) {
>             tgt.reset();
>             src.reset();
>             found = false;
>             break; // no match
>           }
>         }
>         if (found) return pos;
>       }
>     }
>     return -1; // not found
>   } catch (CharacterCodingException e) {
>     throw new RuntimeException("Should not have happened", e);
>   }
> }
> {code}
> For example, in the test below, when the search reaches q, src has no bytes
> remaining, so src is reset to the d after the mark and the search continues.
> But from that point on the remaining length of src is always smaller than the
> length of tgt, so no match is possible and we can return -1 directly.
> {code:java}
> @Test
> public void testFind() throws Exception {
>   Text text = new Text("abcd\u20acbdcd\u20ac");
>   assertThat(text.find("cd\u20acq")).isEqualTo(-1);
> }
> {code}
> Perhaps it could be:
> {code:java}
> public int find(String what, int start) {
>   try {
>     ByteBuffer src = ByteBuffer.wrap(this.bytes, 0, this.length);
>     ByteBuffer tgt = encode(what);
>     byte b = tgt.get();
>     src.position(start);
>     while (src.hasRemaining()) {
>       if (b == src.get()) { // matching first byte
>         src.mark(); // save position in loop
>         tgt.mark(); // save position in target
>         boolean found = true;
>         int pos = src.position()-1;
>         while (tgt.hasRemaining()) {
>           if (!src.hasRemaining()) { // src expired first
>             return -1;
>           }
>           if (!(tgt.get() == src.get())) {
>             tgt.reset();
>             src.reset();
>             found = false;
>             break; // no match
>           }
>         }
>         if (found) return pos;
>       }
>     }
>     return -1; // not found
>   } catch (CharacterCodingException e) {
>     throw new RuntimeException("Should not have happened", e);
>   }
> }
> {code}