Another option I was using for gathering my data for statement/question classification was to scrape answers.com:
http://www.answers.com/Q/FAQ/4571

- Carin

On Fri, Oct 16, 2015 at 1:00 PM, Carin Meier <[email protected]> wrote:

> There is an NPS Chat Corpus that already has tagged POS word tokens and
> classifications that could be used as part of a test set:
> http://faculty.nps.edu/cmartell/NPSChat.htm
>
> - Carin
>
> On Fri, Oct 16, 2015 at 12:50 PM, Matthew Taylor <[email protected]> wrote:
>
>> We don't have to use the fingerprints. Another way is to simply encode
>> the part of speech (POS) for each word. I'm sure that statements and
>> questions have different temporal POS patterns that should be recognizable.
>>
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>> On Fri, Oct 16, 2015 at 9:10 AM, Richard Crowder <[email protected]> wrote:
>>
>>> My 2 cents: this sounds similar to DeepQA, which helped IBM Watson win
>>> Jeopardy:
>>> http://researcher.watson.ibm.com/researcher/view_group.php?id=2099
>>>
>>> On Fri, Oct 16, 2015 at 4:39 PM, cogmission (David Ray) <[email protected]> wrote:
>>>
>>>> Awesome idea! I for one am in!
>>>>
>>>> I think there are some questions that arise concerning capability and
>>>> approach.
>>>>
>>>> My main question is:
>>>>
>>>> Considering that training a Cortical.io fingerprint will organize SDRs
>>>> according to subject applicability, I'm not sure whether it will
>>>> differentiate according to degree of interrogative-ness. I have the same
>>>> question about the HTM: whether predictions and anomalies can
>>>> differentiate according to degree of interrogative-ness...
>>>>
>>>> So my immediate suggestion for a solution to the above is to do it in
>>>> the "Encoder". That is, to spatially aggregate inputs (sentences)
>>>> according to their Part-Of-Speech question word order... For example:
>>>>
>>>> 1. Sentences beginning with Is, Are, Why, How, Do, What, Where,
>>>> Whether, etc. should be encoded closer to each other.
>>>> 2. Sentence fragments and clauses which accomplish the same as the
>>>> above should have the same encoding nature.
>>>>
>>>> That's all I have for now...
>>>>
>>>> On Fri, Oct 16, 2015 at 10:23 AM, Matthew Taylor <[email protected]> wrote:
>>>>
>>>>> Hello NuPIC,
>>>>>
>>>>> Here is a question for anyone interested in NLP, Cortical.io's API,
>>>>> and phrase classification...
>>>>>
>>>>> This tweet from Carin Meier got me thinking last night:
>>>>> https://twitter.com/gigasquid/status/654802085335068672
>>>>>
>>>>> Could we do this with text fingerprints from Cortical and HTM? What if
>>>>> we put together a collection of human-gathered "statements" and a list
>>>>> of "questions"? For each phrase, we turn each word into an SDR via
>>>>> Cortical's API, and train one model on the statement phrases (resetting
>>>>> sequences between phrases) and one on the questions. So we'll have one
>>>>> model that's only seen statements and one that's only seen questions.
>>>>>
>>>>> If there are typical word patterns that exist mostly in one type of
>>>>> phrase or the other, it may be possible to feed new phrases as SDRs
>>>>> into each model, and use the lowest anomaly score to identify whether a
>>>>> phrase is a statement or a question.
>>>>>
>>>>> Does this seem feasible? Is anyone interested in this project?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>
>>>> --
>>>> *With kind regards,*
>>>>
>>>> David Ray
>>>> Java Solutions Architect
>>>>
>>>> *Cortical.io <http://cortical.io/>*
>>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>>
>>>> [email protected]
>>>> http://cortical.io
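
[Editor's note] Matt's two-model proposal (train one sequence model on statements, one on questions, and label a new phrase by whichever model reports the lower anomaly) can be sketched without HTM at all. The toy below uses a set of seen POS-tag bigrams as a stand-in for each trained model, and "fraction of unseen bigrams" as a stand-in for the anomaly score; the tag sequences are made-up examples, not real data, and real NuPIC models would replace `train_bigrams`/`anomaly` entirely.

```python
def train_bigrams(sequences):
    """Collect the set of tag bigrams seen in the training sequences
    (a toy stand-in for training an HTM model on one phrase type)."""
    seen = set()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        seen.update(zip(padded, padded[1:]))
    return seen

def anomaly(model, seq):
    """Fraction of bigrams in seq the model has never seen
    (a toy stand-in for an HTM anomaly score)."""
    padded = ["<s>"] + seq + ["</s>"]
    bigrams = list(zip(padded, padded[1:]))
    unseen = sum(1 for bg in bigrams if bg not in model)
    return unseen / len(bigrams)

# Hypothetical POS-tag sequences for each phrase type (illustrative only).
statements = [["PRP", "VBP", "NN"], ["DT", "NN", "VBZ", "JJ"]]
questions = [["WP", "VBZ", "DT", "NN"], ["VBZ", "PRP", "JJ"]]

stmt_model = train_bigrams(statements)
q_model = train_bigrams(questions)

def classify(pos_seq):
    """Label a phrase by whichever model finds it less anomalous."""
    s, q = anomaly(stmt_model, pos_seq), anomaly(q_model, pos_seq)
    return "statement" if s <= q else "question"

print(classify(["WP", "VBZ", "DT", "NN"]))  # a question-like tag pattern
```

The point of the sketch is only the decision rule at the end: two models, each trained on one phrase type, compared by anomaly on unseen input.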
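
[Editor's note] David's encoder suggestion (make sentences that open with interrogative words land near each other in encoding space) can also be sketched in a few lines. The bit layout, opener list, and word-region mapping below are illustrative assumptions, not a real NuPIC encoder: every question-like opener activates a shared "interrogative" bit region, so such encodings overlap, while a deterministic per-word region keeps individual openers distinguishable.

```python
QUESTION_OPENERS = {"is", "are", "why", "how", "do", "what", "where", "whether"}
SHARED = range(0, 16)  # bits shared by all question-like openings

def encode_opening(sentence):
    """Return the set of active bits for a sentence's opening word."""
    first = sentence.split()[0].lower().strip("?,.!")
    bits = set()
    if first in QUESTION_OPENERS:
        bits.update(SHARED)  # common "interrogative" region -> overlap
    # a small deterministic per-word region distinguishes individual openers
    start = 16 + (sum(ord(c) for c in first) % 12) * 4
    bits.update(range(start, start + 4))
    return bits

def overlap(a, b):
    """Shared active bits, the usual SDR similarity measure."""
    return len(a & b)

q1 = encode_opening("What is your name?")
q2 = encode_opening("Why do birds sing?")
s1 = encode_opening("The cat sat down.")
print(overlap(q1, q2), overlap(q1, s1))  # questions overlap; statement doesn't
```

In a real encoder the shared region would presumably be graded by "degree of interrogative-ness" (per David's point about fragments and clauses), rather than the all-or-nothing membership test used here.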
