Yeah, agreed. I saw your project and liked the way you created binary and
quad age groups. *Indhu* can share more details on the linear regression
approach and its accuracy. As far as I know, it's a bigram model based on
the top 10k features.

This is what the Tika CLI response looks like:

Content-Length: 6954
Content-Type: application/xml
*Estimated-Author-Age: 23*
*Estimated-Author-Age-Range: 18-28*
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.nlp.classifier.TextFeatureParser
resourceName: pom.xml

I was thinking of adding more metadata fields from different approaches to
the same response. For example, we can add a new field,
"Estimated-Author-Age-Binary-Group", to this. We can run multiple REST API
calls in parallel and enable or disable each one through a property file.
Basically, let the user define which APIs to run, and we can combine all
the results through Tika.
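
To make the property-file idea a bit more concrete, here is a rough sketch
in Python (Tika itself is Java, so this only illustrates the flow). The
property keys, the stub functions standing in for the REST calls, and the
placeholder value for the binary group are all assumptions for
illustration; none of this is the actual TextFeatureParser code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical property-file contents; the key names are illustrative only.
PROPERTIES = """
# one line per classifier endpoint
classifier.age-regression.enabled=true
classifier.age-binary-group.enabled=true
classifier.gender.enabled=false
"""


def enabled_classifiers(properties_text):
    """Return classifier names whose '<name>.enabled' property is 'true'."""
    names = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        parts = key.strip().split(".")
        is_flag = len(parts) == 3 and parts[0] == "classifier" and parts[2] == "enabled"
        if is_flag and value.strip().lower() == "true":
            names.append(parts[1])
    return names


# Stand-ins for the REST calls. A real version would send the text to each
# configured endpoint and read the metadata fields from the response.
def age_regression(text):
    return {"Estimated-Author-Age": "23", "Estimated-Author-Age-Range": "18-28"}


def age_binary_group(text):
    return {"Estimated-Author-Age-Binary-Group": "placeholder"}


def gender(text):
    return {"Estimated-Author-Gender": "placeholder"}


REGISTRY = {
    "age-regression": age_regression,
    "age-binary-group": age_binary_group,
    "gender": gender,
}


def collate(text, properties_text, registry):
    """Run every enabled classifier in parallel and merge the returned fields."""
    names = [n for n in enabled_classifiers(properties_text) if n in registry]
    merged = {}
    if not names:
        return merged
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() yields results in the order of `names`, so merging is deterministic.
        for fields in pool.map(lambda name: registry[name](text), names):
            merged.update(fields)
    return merged


if __name__ == "__main__":
    for field, value in collate("some document text", PROPERTIES, REGISTRY).items():
        print(f"{field}: {value}")
```

With the properties above, only the two age classifiers run, and their
fields end up side by side in one response. The gender stub is skipped
because it is disabled.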

Thanks

--
Madhav Sharan


On Tue, Jun 14, 2016 at 12:51 AM, Anthony Beylerian <
[email protected]> wrote:

> Hi Madhav,
>
> Thank you for sharing, yes maybe it's possible.
>
> Although there is overlap, the two approaches are a bit different.
>
> Do you have some documentation on the performance of the linear regression
> approach?
>
> I'm not sure how well it would perform for gender (binary) and other
> attributes.
>
> Ideally it would be desirable to have a way to capture all traits with
> reasonable performance.
>
> Best,
>
> Anthony
>
>
> On Tue, Jun 14, 2016 at 8:46 AM, Madhav Sharan <[email protected]> wrote:
>
>> Hi Anthony, the age prediction part of this enhancement looks very similar to
>> https://issues.apache.org/jira/browse/TIKA-1988
>>
>> Do you see any way we can collaborate on this feature? I was thinking of
>> building a TextFeatureParser which can parse multiple text-based features
>> like age.
>>
>> In our project for age prediction we built a classifier using linear
>> regression, which is available through a REST API (more details in [0]).
>> We can configure multiple such REST APIs in Tika through a property file
>> and then let the TextFeatureParser collate and present all the results.
>>
>> Let me know what you think about it. [1] has my code for
>> TextFeatureParser; I will be submitting a PR soon.
>>
>> CCing Indhu for any questions regarding [0]
>>
>> [0] https://github.com/USCDataScience/Age-Predictor
>> [1] https://github.com/smadha/tika/tree/TIKA-1988
>>
>>
>> --
>> Madhav Sharan
>>
>
>
