Hi all,

I found a behavior difference in Scanner between Harmony and the RI. Here is a simple testcase [1]. The RI returns a successful match, " *", while Harmony fails to find a match and returns null.

I looked into the code and found the root cause: Harmony's Scanner ignores the next line terminator completely while trying to find a match. According to the spec for the findInLine(Pattern) method, it "Attempts to find the next occurrence of the specified pattern ignoring delimiters." So our behavior of ignoring the delimiter seems to comply with the spec.

However, the pattern in this case contains the special construct '(?=X)', a zero-width positive lookahead, and the RI's behavior indicates that it does not ignore the delimiter completely. Judging from the testcase result, the RI takes the delimiter into consideration when it tries to find a match, but excludes it from the match result.

So the spec seems ambiguous about what "ignore" means. Does it mean ignoring the delimiter entirely, even while scanning, as Harmony does? Or does it mean only leaving it out of the match result? The RI's behavior indicates the latter. So do we need to follow the RI's behavior?
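To make the two readings of "ignore" concrete, here is a minimal sketch using plain java.util.regex, without Scanner. It is only an illustration of the distinction, not a description of how either Scanner implementation is written. The class name LookaheadBounds and the helper findInRegion are made up for this example. The search region stops just before the line terminator; with opaque bounds the lookahead cannot see the terminator, while with transparent bounds it can, although the terminator is still not part of the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Harmony or the RI.
class LookaheadBounds {
    // Same pattern as in the testcase [1].
    static final Pattern PATTERN = Pattern.compile("^\\s*(?:\\*(?=[^/]))");

    // Searches input[0, lineEnd) only. When 'transparent' is true, lookahead
    // may inspect characters beyond lineEnd; when false, it may not.
    static String findInRegion(String input, int lineEnd, boolean transparent) {
        Matcher m = PATTERN.matcher(input);
        m.region(0, lineEnd);
        m.useTransparentBounds(transparent);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = " *\n";
        int lineEnd = input.indexOf('\n'); // region excludes the terminator

        // Terminator invisible to the lookahead: no match.
        System.out.println(findInRegion(input, lineEnd, false)); // prints null

        // Terminator visible to the lookahead but outside the region:
        // the match succeeds and does not contain the terminator.
        System.out.println("\"" + findInRegion(input, lineEnd, true) + "\""); // prints " *"
    }
}
```

The first call corresponds to ignoring the terminator even while scanning, which gives the result Harmony currently produces. The second corresponds to ignoring it only in the match result, which gives the result the RI produces.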
I've raised a JIRA for this issue at https://issues.apache.org/jira/browse/HARMONY-6087
and I've also attached a patch to follow the RI's behavior.

[1]
    import java.util.Scanner;
    import java.util.regex.Pattern;

    public class SpecialPattern {
        private static final Pattern pattern =
                Pattern.compile("^\\s*(?:\\*(?=[^/]))");

        public static void main(String[] args) {
            Scanner scn = new Scanner(" *\n");
            String found = scn.findInLine(pattern);
            System.out.print(found);
        }
    }

Result of RI: " *"
Result of Harmony: null

--
Best Regards,
Jim, Jun Jie Yu
China Software Development Lab, IBM
