[ https://issues.apache.org/jira/browse/PIG-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214270#comment-13214270 ]
Daniel Dai commented on PIG-2514:
---------------------------------
Looks good. It would be better to change REGEX_EXTRACT_ALL as well, so the two UDFs stay symmetric.
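One reading of the suggestion is that REGEX_EXTRACT_ALL should offer the same choice between Matcher.matches() and Matcher.find() as REGEX_EXTRACT. The sketch below is hypothetical. The class RegexExtractAllSketch, the method extractAll and the useMatches flag are illustrative names. They are not Pig's API and not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: RegexExtractAllSketch, extractAll and useMatches are
// illustrative names, not part of Pig. Both "extract" helpers would pick the
// matching method the same way, which is the symmetry being asked for.
public class RegexExtractAllSketch {

    // Returns every capture group (1..groupCount) of the match,
    // or null when the input, the regex, or the match is missing.
    public static List<String> extractAll(String input, String regex, boolean useMatches) {
        if (input == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        // matches() must consume the whole input; find() accepts the first
        // matching subsequence, which a reluctant quantifier keeps short.
        boolean found = useMatches ? m.matches() : m.find();
        if (!found) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        String a = "hdfs://mygrid.com/projects/";
        String regex = "(.+?)://(.+?)/?";
        System.out.println(extractAll(a, regex, true));  // [hdfs, mygrid.com/projects]
        System.out.println(extractAll(a, regex, false)); // [hdfs, m]
    }
}
```

With matches(), the second reluctant group has to stretch to the end of the input. With find(), it stops after a single character. This is the same asymmetry the issue reports for a single group.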
> REGEX_EXTRACT not returning correct group with non greedy regex
> ---------------------------------------------------------------
>
> Key: PIG-2514
> URL: https://issues.apache.org/jira/browse/PIG-2514
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.11
> Reporter: Romain Rigaux
> Assignee: Romain Rigaux
> Priority: Minor
> Fix For: 0.11
>
> Attachments: PIG-2514-doc.patch, PIG-2514.patch
>
>
> Hello,
> REGEX_EXTRACT uses Matcher.find() instead of Matcher.matches(), so it
> does not work with some non-greedy regular expressions.
> Is this the intended behavior?
> Thanks,
> Romain
> http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html
> The matches method attempts to match the entire input sequence against the
> pattern.
> The find method scans the input sequence looking for the next subsequence
> that matches the pattern.
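> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> import org.apache.pig.builtin.REGEX_EXTRACT;
> import org.apache.pig.data.Tuple;
> import org.apache.pig.data.TupleFactory;
>
> public class RegexExtractRepro {
>     public static void main(String[] args) throws Exception {
>         // Pig's way: find() accepts the first matching subsequence.
>         System.out.println("Pig's way with m.find()");
>         String a = "hdfs://mygrid.com/projects/";
>         Matcher m = Pattern.compile("(.+?)/?").matcher(a);
>         System.out.println(m.find());
>         System.out.println(m.group(1));
>         System.out.println(m.start());
>         System.out.println(m.end());
>
>         // matches() must consume the entire input sequence.
>         System.out.println("\nm.matches()");
>         m = Pattern.compile("(.+?)/?").matcher(a);
>         System.out.println(m.matches());
>         System.out.println(m.group(1));
>         System.out.println(m.start());
>         System.out.println(m.end());
>
>         // The UDF itself, with arguments (string, regex, group index).
>         // Assumes Pig's built-in org.apache.pig.builtin.REGEX_EXTRACT; the
>         // original called a copy nested in a test class (TestPigExtractAll).
>         System.out.println("\nREGEX_EXTRACT m.find()");
>         Tuple t = TupleFactory.getInstance().newTuple();
>         t.append(a);
>         t.append("(.+?)/?");
>         t.append(1);
>         System.out.println(new REGEX_EXTRACT().exec(t));
>     }
> }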
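The following is a minimal sketch of how a single-group extraction could let the caller pick the matching method, under stated assumptions. RegexExtractSketch, extract and useMatches are hypothetical names. They are not REGEX_EXTRACT's actual signature and not the attached PIG-2514.patch. The sketch also shows a workaround that needs no code change. Anchoring the pattern with ^ and $ makes find() cover the whole input for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: RegexExtractSketch, extract and the useMatches flag are
// illustrative names, not Pig's API or the attached patch.
public class RegexExtractSketch {

    // Returns capture group `index` of the match, or null if there is none.
    public static String extract(String input, String regex, int index, boolean useMatches) {
        if (input == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = useMatches ? m.matches() : m.find();
        if (!found || index < 0 || index > m.groupCount()) {
            return null;
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        String a = "hdfs://mygrid.com/projects/";
        // find(): the reluctant group stops after one character.
        System.out.println(extract(a, "(.+?)/?", 1, false));   // h
        // matches(): the group must stretch to the end of the input.
        System.out.println(extract(a, "(.+?)/?", 1, true));    // hdfs://mygrid.com/projects
        // Anchoring the pattern forces find() to span the whole input here too.
        System.out.println(extract(a, "^(.+?)/?$", 1, false)); // hdfs://mygrid.com/projects
    }
}
```

Without Pattern.MULTILINE, $ can also match just before a final line terminator. For input ending in a newline, the anchored find() is therefore not identical to matches(). \z is the stricter end-of-input anchor.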
--
This message is automatically generated by JIRA.