On 2016-03-25 16:03, Paul Sandoz wrote:
On 25 Mar 2016, at 13:56, Claes Redestad <[email protected]> wrote:

You've replaced the precomputed good suffix shift which i think makes the Math.max in math a 
bit harder to read. A comment or just move a local for "j < len - 1 ? len : 1" 
would make it a bit easier to read.

It would be good to move the value of MULTIRELEASE_CHARS to its own line as the 
current line is requires scrolling when looking at the side-by-side diffs.
Sure: http://cr.openjdk.java.net/~redestad/8152733/webrev.02/

+1

Thanks!



We could marginally improve the bad shift too:

   int badShift = (c < 128) ? lastOcc[c] : 0;

rather than masking (which might lop off the top bit of bytes of value encoded 
characters), that’s clearer code-wise to me. In fact i bet we could reduce the 
last occurrence array by half if we canonicalize to upper-case characters and 
bound the range to [‘ ‘, ‘Z’].

byte c = b[i + j];
int goodShift = 0;
int badShift = 0;
if (c >= ‘ ‘ && c <= ‘z') {
   if (c >= ‘a’) c -= 32; // Canonicalize

   if (c != src[j]) {
     // no match
     goodShift = (j < len - 1) ? len : 1;
     badShift = lastOcc[c - 32];
   }
} else {
     // no match, character not valid for name
     goodShift = (j < len - 1) ? len : 1;
}
if (goodShift > 0) {
     i += Math.max(j + 1 - badShift, goodShift);
     continue next;
}

Just idle musings on my part, feel free to ignore ‘em.

Interesting, my only concern is we'd add more code/complexity, but it seems the shift will always be == len when not matching a character in src (the proof is left out as an exercise), so this can be simplified further to:

byte c = b[i + j];
if (c >= ' ' && c <= 'z') {
    if (c >= 'a') c -= 32; // Canonicalize

    if (c != src[j]) {
        // no match
        int goodShift = (j < len - 1) ? len : 1;
        int badShift = lastOcc[c - 32];
        i += Math.max(j + 1 - badShift, goodShift);
        continue next;
    }
} else {
    // no match, character not valid for name
    i += len;
    continue next;
}

http://cr.openjdk.java.net/~redestad/8152733/webrev.03/

(included Steve's suggestion)

Since this avoids reading from lastOcc for anything but the chars actually matching, this actually seems to improve things a bit locally, so I'm willing to accept the pile-on. I'll run some numbers in the lab...

/Claes

Paul.


Difference from previous for convenience:

diff -r fc748fa355e0 src/java.base/share/classes/java/util/jar/JarFile.java
--- a/src/java.base/share/classes/java/util/jar/JarFile.java    Fri Mar 25 
13:38:37 2016 +0100
+++ b/src/java.base/share/classes/java/util/jar/JarFile.java    Fri Mar 25 
13:47:31 2016 +0100
@@ -887,11 +887,15 @@
     }

     // Statics for hand-coded Boyer-Moore search
-    private static final byte[] CLASSPATH_CHARS = 
{'c','l','a','s','s','-','p','a','t','h', ':', ' '};
+    private static final byte[] CLASSPATH_CHARS =
+            {'c','l','a','s','s','-','p','a','t','h', ':', ' '};
+
     // The bad character shift for "class-path:"
     private static final byte[] CLASSPATH_LASTOCC;

-    private static final byte[] MULTIRELEASE_CHARS = 
{'m','u','l','t','i','-','r','e','l','e', 'a', 's', 'e', ':', ' '};
+    private static final byte[] MULTIRELEASE_CHARS =
+            {'m','u','l','t','i','-','r','e','l','e', 'a', 's', 'e', ':', ' '};
+
     // The bad character shift for "multi-release: "
     private static final byte[] MULTIRELEASE_LASTOCC;

@@ -959,6 +963,8 @@
     /**
      * Returns true if the pattern {@code src} is found in {@code b}.
      * The {@code lastOcc} array is the precomputed bad character shifts.
+     * Since there are no repeated substrings in our search strings,
+     * the good character shifts can be replaced with a comparison.
      */
     private boolean match(byte[] src, byte[] b, byte[] lastOcc) {
         int len = src.length;
@@ -972,7 +978,8 @@
                     c += 32;
                 }
                 if (c != src[j]) {
-                    i += Math.max(j + 1 - lastOcc[c&0x7F], j < len - 1 ? len : 
1);
+                    int goodShift = (j < len - 1) ? len : 1;
+                    i += Math.max(j + 1 - lastOcc[c&0x7F], goodShift);
                     continue next;
                 }
             }

Thanks!

/Claes

Reply via email to