Hi Anthony,

The age prediction part of this enhancement looks very similar to https://issues.apache.org/jira/browse/TIKA-1988
Do you see any way we can collaborate on this feature? I was thinking of building a TextFeatureParser that can parse multiple text-based features such as age. In our project, we built a classifier for age prediction using linear regression, which is available through a REST API (more details in [0]). We could configure multiple such REST APIs in Tika through a properties file and then let the TextFeatureParser collate and present all the results. Let me know what you think. [1] has my code for TextFeatureParser; I will be submitting a PR soon.

CCing Indhu for any questions regarding [0].

[0] https://github.com/USCDataScience/Age-Predictor
[1] https://github.com/smadha/tika/tree/TIKA-1988

--
Madhav Sharan

On Thu, Jun 9, 2016 at 4:24 AM, Anthony Beylerian (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/TIKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Anthony Beylerian updated TIKA-2000:
> ------------------------------------
> Description:
> The profile parser aims to parse documents and return information about age and gender.
> It will integrate with an OpenNLP profiler.
>
> https://issues.apache.org/jira/browse/OPENNLP-853
>
> Later we hope to add personality aspects such as:
> [extroverted, stable, agreeable, conscientious, open]
>
> More description can be found here :
>
> https://github.com/beylerian/profiler
>
> was:
> The profile parser aims to parse documents and return information about age and gender.
> It will integrate with an OpenNLP profiler tool.
>
> Later we hope to add personality aspects such as:
> [extroverted, stable, agreeable, conscientious, open]
>
> More description can be found here :
>
> https://github.com/beylerian/profiler
>
>
> > Author profile parser
> > ---------------------
> >
> > Key: TIKA-2000
> > URL: https://issues.apache.org/jira/browse/TIKA-2000
> > Project: Tika
> > Issue Type: New Feature
> > Reporter: Anthony Beylerian
> >
> > The profile parser aims to parse documents and return information about age and gender.
> > It will integrate with an OpenNLP profiler.
> > https://issues.apache.org/jira/browse/OPENNLP-853
> > Later we hope to add personality aspects such as:
> > [extroverted, stable, agreeable, conscientious, open]
> > More description can be found here :
> > https://github.com/beylerian/profiler
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
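For discussion, here is a rough sketch in plain Java of the TextFeatureParser idea: read several REST endpoints from a properties file, send the same document text to each one, and collate the answers by feature name. This is not the code in [1] and does not implement Tika's Parser interface. The `feature.<name>.url` property keys, the `FeatureClient` interface, and sending the text as a plain-text POST body are assumptions for illustration only; the real Age-Predictor API in [0] may expect a different request format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class TextFeatureSketch {

    // Hypothetical abstraction over one remote feature service (e.g. an age predictor).
    public interface FeatureClient {
        String predict(URI endpoint, String text) throws IOException, InterruptedException;
    }

    // Assumption: each service accepts the raw text as a plain-text POST body
    // and answers with a plain-text prediction. The real service API may differ.
    public static final class HttpFeatureClient implements FeatureClient {
        private final HttpClient http = HttpClient.newHttpClient();

        @Override
        public String predict(URI endpoint, String text) throws IOException, InterruptedException {
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("Content-Type", "text/plain; charset=UTF-8")
                    .POST(HttpRequest.BodyPublishers.ofString(text))
                    .build();
            return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }
    }

    // Hypothetical property naming scheme: feature.<name>.url=<endpoint>
    private static final String PREFIX = "feature.";
    private static final String SUFFIX = ".url";

    // Collects every "feature.<name>.url" property into an ordered name -> endpoint map.
    public static Map<String, URI> loadEndpoints(Properties props) {
        Map<String, URI> endpoints = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length()) {
                String name = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
                endpoints.put(name, URI.create(props.getProperty(key).trim()));
            }
        }
        return endpoints;
    }

    // Sends the same text to every configured service and collates the answers by feature name.
    public static Map<String, String> extractFeatures(String text, Map<String, URI> endpoints,
                                                      FeatureClient client) {
        Map<String, String> results = new TreeMap<>();
        for (Map.Entry<String, URI> entry : endpoints.entrySet()) {
            try {
                results.put(entry.getKey(), client.predict(entry.getValue(), text));
            } catch (IOException ex) {
                // A single unreachable service should not hide the other features.
                results.put(entry.getKey(), "error: " + ex.getMessage());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                results.put(entry.getKey(), "error: interrupted");
                break;
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "feature.age.url=http://localhost:8080/age\n"
                + "feature.gender.url=http://localhost:8081/gender\n"));
        Map<String, URI> endpoints = loadEndpoints(props);
        // A stub client keeps the demo runnable without any service listening;
        // swap in new HttpFeatureClient() to call real endpoints.
        FeatureClient stub = (uri, text) -> uri.getPath().substring(1) + ":" + text.length();
        System.out.println(extractFeatures("some document text", endpoints, stub));
    }
}
```

With the stub client, `main` prints `{age=age:18, gender=gender:18}`. Keeping the HTTP call behind a small interface means new features (gender, personality aspects) only need another line in the properties file, and a failing service is reported per feature instead of failing the whole parse.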
