We would love to have this as part of Apache Tika. You can take a look at the existing NER/NLP integrations, such as GeoTopicParser, as an example. And yes, please file a JIRA issue:
http://issues.apache.org/jira/browse/TIKA

I would be happy to work with you to make it happen. See:
http://github.com/apache/tika/#contributing-via-github
for guidance.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 6/7/16, 9:36 AM, "Anthony Beylerian" <[email protected]> wrote:

>Hello,
>
>We are currently working on an experimental author profiler that we think
>could be added to the toolkit.
>
>The profiler aims to detect the gender and age range of an author.
>Later we hope to add personality aspects such as:
>[extroverted, stable, agreeable, conscientious]
>
>We would like the teams' opinion on the matter.
>An initial code drop can be found here[1] if someone is willing to
>contribute/collaborate on it with us please let us know.
>
>Thanks!
>
>[1] https://github.com/beylerian/profiler
