Takuya Kitazawa created HIVEMALL-142:
----------------------------------------

             Summary: Implement SingularizeUDF for English singular-ization
                 Key: HIVEMALL-142
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-142
             Project: Hivemall
          Issue Type: New Feature
            Reporter: Takuya Kitazawa
            Assignee: Takuya Kitazawa


Something like `singularize('movies')` => `'movie'` could be very useful in 
combination with `tokenize()` for English NLP on Hivemall. 

Existing implementations mostly rely on regular expressions, for example:

* Java example: 
https://github.com/sundrio/sundrio/blob/master/codegen/src/main/java/io/sundr/codegen/functions/Singularize.java
* One of the most famous Python implementations: 
https://github.com/clips/pattern/blob/master/pattern/text/en/inflect.py#L445-L623
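
To illustrate the regexp-based approach, here is a minimal sketch of how such a function might work. It is not taken from either implementation linked above: the class name `SingularizeSketch`, the rule list, and the word lists are all hypothetical and deliberately tiny, and the Hive UDF wrapper is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hivemall.
final class SingularizeSketch {

    // Irregular plurals resolved by direct lookup (illustrative subset).
    private static final Map<String, String> IRREGULAR = Map.of(
        "children", "child",
        "men", "man",
        "people", "person",
        "mice", "mouse");

    // Words returned unchanged (illustrative subset).
    private static final Set<String> UNCHANGED =
        Set.of("news", "series", "species", "sheep");

    // Ordered suffix rules: the first matching pattern wins, so special
    // cases must come before the general ones.
    private static final List<Map.Entry<Pattern, String>> RULES = List.of(
        rule("(m)ovies$", "$1ovie"),        // movies  -> movie (not "movy")
        rule("([^aeiou])ies$", "$1y"),      // stories -> story
        rule("(ss|sh|ch|x|z)es$", "$1"),    // boxes   -> box
        rule("([^s])s$", "$1"));            // cats    -> cat

    private static Map.Entry<Pattern, String> rule(String regex, String replacement) {
        return Map.entry(Pattern.compile(regex, Pattern.CASE_INSENSITIVE), replacement);
    }

    static String singularize(String word) {
        String lower = word.toLowerCase();
        if (UNCHANGED.contains(lower)) {
            return word;
        }
        String irregular = IRREGULAR.get(lower);
        if (irregular != null) {
            return irregular;
        }
        for (Map.Entry<Pattern, String> r : RULES) {
            Matcher m = r.getKey().matcher(word);
            if (m.find()) {
                return m.replaceFirst(r.getValue());
            }
        }
        return word; // no rule matched; assume the word is already singular
    }
}
```

With these rules, `SingularizeSketch.singularize("movies")` returns `"movie"`. The sketch is far from complete: a word such as `bus` would be wrongly shortened to `bu`, which is why real implementations carry much longer rule and exception lists.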



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
