REGEX_EXTRACT not returning correct group with non greedy regex
---------------------------------------------------------------

                 Key: PIG-2514
                 URL: https://issues.apache.org/jira/browse/PIG-2514
             Project: Pig
          Issue Type: Bug
          Components: internal-udfs
    Affects Versions: 0.11
            Reporter: Romain Rigaux
            Assignee: Romain Rigaux
            Priority: Minor
             Fix For: 0.11


Hello,

REGEX_EXTRACT uses Matcher.find() instead of Matcher.matches(), so it does 
not return the expected group for some non-greedy regular expressions.

Is this the intended behavior?

Thanks,

Romain


http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html


The matches method attempts to match the entire input sequence against the 
pattern.

The find method scans the input sequence looking for the next subsequence that 
matches the pattern.



    // find() stops at the first (shortest) subsequence that matches.
    System.out.println("Pig's way with m.find()");
    String a = "hdfs://mygrid.com/projects/";
    Matcher m = Pattern.compile("(.+?)/?").matcher(a);
    System.out.println(m.find());    // true
    System.out.println(m.group(1));  // h
    System.out.println(m.start());   // 0
    System.out.println(m.end());     // 1

    // matches() must consume the whole input, so the lazy group expands.
    System.out.println("\nm.matches()");
    a = "hdfs://mygrid.com/projects/";
    m = Pattern.compile("(.+?)/?").matcher(a);
    System.out.println(m.matches()); // true
    System.out.println(m.group(1));  // hdfs://mygrid.com/projects
    System.out.println(m.start());   // 0
    System.out.println(m.end());     // 27

    System.out.println("\nREGEX_EXTRACT m.find()");
    Tuple t = TupleFactory.getInstance().newTuple();
    t.append(a);
    t.append("(.+?)/?");
    t.append(1);
    System.out.println(new TestPigExtractAll().new REGEX_EXTRACT().exec(t));
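
Until this is settled, a possible workaround is to anchor the pattern with ^ and $ 
so that find() is forced to cover the whole input, as matches() would. The sketch 
below uses only java.util.regex. The class and method names are illustrative and 
not part of Pig, and it does not call REGEX_EXTRACT itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of Pig: compares find() and matches()
// on the same non-greedy pattern, plus an anchored variant for find().
class FindVsMatches {

    // Returns group 1 of the first subsequence that matches, or null.
    static String groupWithFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Returns group 1 only if the entire input matches, or null.
    static String groupWithMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String a = "hdfs://mygrid.com/projects/";
        System.out.println(groupWithFind("(.+?)/?", a));     // h
        System.out.println(groupWithMatches("(.+?)/?", a));  // hdfs://mygrid.com/projects
        System.out.println(groupWithFind("^(.+?)/?$", a));   // hdfs://mygrid.com/projects
    }
}
```

With the anchors in place, the lazy group has to keep expanding until the optional 
trailing slash and the end of input can both be matched.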
