Hi Anthony, the age prediction part of this enhancement looks very similar to
https://issues.apache.org/jira/browse/TIKA-1988

Do you see any way we can collaborate on this feature? I was thinking of
building a TextFeatureParser that can parse multiple text-based features,
such as age.

In our project, we built an age-prediction model based on linear regression,
which is available through a REST API (more details in [0]). We can
configure multiple such REST APIs in Tika through a properties file and then
let the TextFeatureParser collate and present all the results.
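
A rough sketch of how the collation step could work, under several
assumptions that are not taken from the code in [1], from Tika's API, or
from the Age-Predictor REST contract in [0]: the class name
TextFeatureCollator, the FeatureClient interface, the property key scheme
feature.<name>.url, and the "POST plain text, read the response body"
protocol are all hypothetical. It only illustrates the idea of reading
several endpoints from a properties file and gathering one result per
feature into a single map.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

public class TextFeatureCollator {

    /** Hypothetical abstraction over one REST call, so collation can be tested offline. */
    @FunctionalInterface
    public interface FeatureClient {
        String predict(URI endpoint, String text) throws IOException, InterruptedException;
    }

    /** Hypothetical default client: POSTs the raw text and returns the response body. */
    public static FeatureClient httpClient() {
        HttpClient client = HttpClient.newHttpClient();
        return (endpoint, text) -> {
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("Content-Type", "text/plain; charset=UTF-8")
                    .POST(HttpRequest.BodyPublishers.ofString(text))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() / 100 != 2) {
                throw new IOException("HTTP " + response.statusCode() + " from " + endpoint);
            }
            return response.body();
        };
    }

    // Assumed key scheme: feature.<name>.url=<endpoint>
    private static final String PREFIX = "feature.";
    private static final String SUFFIX = ".url";

    /** Calls every configured endpoint and returns results keyed by feature name, sorted by name. */
    public static Map<String, String> collate(Properties config, String text, FeatureClient client)
            throws IOException, InterruptedException {
        Map<String, String> results = new LinkedHashMap<>();
        for (String key : new TreeSet<>(config.stringPropertyNames())) {
            boolean matches = key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length();
            if (matches) {
                String feature = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
                URI endpoint = URI.create(config.getProperty(key));
                results.put(feature, client.predict(endpoint, text));
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.load(new StringReader(
                "feature.age.url=http://localhost:9000/age\n"
              + "feature.gender.url=http://localhost:9001/gender\n"));
        // Stub client for demonstration; use httpClient() against real endpoints.
        FeatureClient stub = (endpoint, text) -> "stub response from " + endpoint.getPath();
        System.out.println(collate(config, "some document text", stub));
    }
}
```

Keeping the HTTP call behind a small interface means a new text feature
(age, gender, later personality traits) would only need a new line in the
properties file, not new parser code.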

Let me know what you think. [1] has my code for TextFeatureParser; I will
open a PR soon.

CCing Indhu for any questions regarding [0].

[0] https://github.com/USCDataScience/Age-Predictor
[1] https://github.com/smadha/tika/tree/TIKA-1988


--
Madhav Sharan


On Thu, Jun 9, 2016 at 4:24 AM, Anthony Beylerian (JIRA) <[email protected]>
wrote:

>
>      [
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_TIKA-2D2000-3Fpage-3Dcom.atlassian.jira.plugin.system.issuetabpanels-3Aall-2Dtabpanel&d=DQICaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=DhBa2eLkbd4gAFB01lkNgg&m=HIMU7NLZWA3Ih13GUtVU-jBNu7K1iTBNU3isGuC_03Q&s=IuFSS42FGE2N8iHyV-KI6_797FBQzQb54vCB0SKNwEI&e=
> ]
>
> Anthony Beylerian updated TIKA-2000:
> ------------------------------------
>     Description:
> The profile parser aims to parse documents and return information about
> age and gender.
> It will integrate with an OpenNLP profiler.
>
> https://issues.apache.org/jira/browse/OPENNLP-853
>
> Later we hope to add personality aspects such as:
> [extroverted, stable, agreeable, conscientious, open]
>
> More description can be found here :
>
> https://github.com/beylerian/profiler
>
>   was:
> The profile parser aims to parse documents and return information about
> age and gender.
> It will integrate with an OpenNLP profiler tool.
>
> Later we hope to add personality aspects such as:
> [extroverted, stable, agreeable, conscientious, open]
>
> More description can be found here :
>
> https://github.com/beylerian/profiler
>
>
> > Author profile parser
> > ---------------------
> >
> >                 Key: TIKA-2000
> >                 URL: https://issues.apache.org/jira/browse/TIKA-2000
> >             Project: Tika
> >          Issue Type: New Feature
> >            Reporter: Anthony Beylerian
> >
> > The profile parser aims to parse documents and return information about
> > age and gender.
> > It will integrate with an OpenNLP profiler.
> >
> > https://issues.apache.org/jira/browse/OPENNLP-853
> > Later we hope to add personality aspects such as:
> > [extroverted, stable, agreeable, conscientious, open]
> > More description can be found here :
> >
> > https://github.com/beylerian/profiler
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>