REGEX_EXTRACT not returning correct group with non greedy regex
---------------------------------------------------------------
Key: PIG-2514
URL: https://issues.apache.org/jira/browse/PIG-2514
Project: Pig
Issue Type: Bug
Components: internal-udfs
Affects Versions: 0.11
Reporter: Romain Rigaux
Assignee: Romain Rigaux
Priority: Minor
Fix For: 0.11
Hello,
REGEX_EXTRACT uses Matcher.find() instead of Matcher.matches(), so it does
not return the expected group for some non-greedy regular expressions.
Is this the intended behavior?
Thanks,
Romain
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html
The matches method attempts to match the entire input sequence against the
pattern.
The find method scans the input sequence looking for the next subsequence that
matches the pattern.
// Case 1: find(), which is what REGEX_EXTRACT uses. It stops at the first
// subsequence that satisfies the non-greedy pattern.
System.out.println("Pig's way with m.find()");
String a = "hdfs://mygrid.com/projects/";
Matcher m = Pattern.compile("(.+?)/?").matcher(a);
System.out.println(m.find());   // true
System.out.println(m.group(1)); // h
System.out.println(m.start());  // 0
System.out.println(m.end());    // 1

// Case 2: matches(), which must consume the entire input.
System.out.println("\nm.matches()");
a = "hdfs://mygrid.com/projects/";
m = Pattern.compile("(.+?)/?").matcher(a);
System.out.println(m.matches()); // true
System.out.println(m.group(1));  // hdfs://mygrid.com/projects
System.out.println(m.start());   // 0
System.out.println(m.end());     // 27

// Case 3: the same input and pattern passed through REGEX_EXTRACT.
System.out.println("\nREGEX_EXTRACT m.find()");
Tuple t = TupleFactory.getInstance().newTuple();
t.append(a);
t.append("(.+?)/?");
t.append(1);
System.out.println(new TestPigExtractAll().new REGEX_EXTRACT().exec(t));
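
The difference can be reproduced without Pig. The following is a minimal sketch, not Pig's implementation; the class name FindVsMatches and its two helper methods are hypothetical. It also tries an anchored pattern, ^(.+?)/?$, which is only an assumption about a possible workaround: with explicit anchors, find() has to consume the whole input and should yield the same group as matches().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {

    // Returns group 1 of the first subsequence that matches, or null if none does.
    static String groupWithFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Returns group 1 only if the entire input matches the pattern, else null.
    static String groupWithMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "hdfs://mygrid.com/projects/";

        // Unanchored, non-greedy: find() is satisfied by the first character.
        System.out.println(groupWithFind("(.+?)/?", input));     // h

        // matches() forces the lazy group to grow until the input is consumed.
        System.out.println(groupWithMatches("(.+?)/?", input));  // hdfs://mygrid.com/projects

        // Anchored pattern (assumed workaround): find() now behaves like matches().
        System.out.println(groupWithFind("^(.+?)/?$", input));   // hdfs://mygrid.com/projects
    }
}
```

If REGEX_EXTRACT keeps using find(), anchoring the pattern in the Pig script may be enough to get the full group; whether that is acceptable depends on the behavior the project decides on.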