There is an NPS Chat Corpus that already has POS-tagged word tokens and dialogue-act classifications that could be used as part of a test set:
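That corpus ships with NLTK as `nltk.corpus.nps_chat`, and each post carries a dialogue-act label such as `whQuestion`, `ynQuestion`, or `Statement`. A minimal sketch of splitting those labels into the two training pools the thread proposes - the sample posts below are made up so the snippet runs without the corpus download; only the label names are the corpus's real ones:

```python
# Real dialogue-act labels from the NPS Chat Corpus; the sample posts
# themselves are invented here so the sketch runs without an NLTK download.
QUESTION_LABELS = {"whQuestion", "ynQuestion"}

SAMPLE_POSTS = [
    ("what are you doing", "whQuestion"),
    ("is anyone here from texas", "ynQuestion"),
    ("i just got back from work", "Statement"),
    ("lol", "Emotion"),
]

def split_posts(posts):
    """Partition labeled posts into question/statement pools for the
    two-model experiment; other dialogue acts are dropped."""
    questions = [text for text, label in posts if label in QUESTION_LABELS]
    statements = [text for text, label in posts if label == "Statement"]
    return questions, statements

# With the real corpus the same split would start from, e.g.:
#   posts = [(p.text, p.get("class"))
#            for p in nltk.corpus.nps_chat.xml_posts()]
```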
http://faculty.nps.edu/cmartell/NPSChat.htm

- Carin

On Fri, Oct 16, 2015 at 12:50 PM, Matthew Taylor <[email protected]> wrote:

> We don't have to use the fingerprints. Another way is to simply encode the
> part of speech (POS) for each word. I'm sure that statements and questions
> have different temporal POS patterns that should be recognizable.
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Fri, Oct 16, 2015 at 9:10 AM, Richard Crowder <[email protected]> wrote:
>
>> My 2 cents - this sounds similar to DeepQA, which helped IBM Watson win
>> Jeopardy:
>> http://researcher.watson.ibm.com/researcher/view_group.php?id=2099
>>
>> On Fri, Oct 16, 2015 at 4:39 PM, cogmission (David Ray) <[email protected]> wrote:
>>
>>> Awesome idea! I for one am in!
>>>
>>> I think there are some questions that arise concerning capability and
>>> approach.
>>>
>>> My main question is:
>>>
>>> Considering that training a Cortical.io fingerprint will organize SDRs
>>> according to subject applicability, I'm not sure whether it will
>>> differentiate according to degree of interrogative-ness. I have the same
>>> question about the HTM: whether predictions and anomalies can
>>> differentiate according to degree of interrogative-ness.
>>>
>>> So my immediate suggestion for a solution is to do it in the "Encoder" -
>>> that is, to spatially aggregate inputs (sentences) according to their
>>> part-of-speech question word order. For example:
>>>
>>> 1. Sentences beginning with Is, Are, Why, How, Do, What, Where, Whether,
>>> etc. should be encoded closer to each other.
>>> 2. Sentence fragments and clauses which accomplish the same as the above
>>> should have the same encoding nature.
>>>
>>> That's all I have for now...
>>>
>>> On Fri, Oct 16, 2015 at 10:23 AM, Matthew Taylor <[email protected]> wrote:
>>>
>>>> Hello NuPIC,
>>>>
>>>> Here is a question for anyone interested in NLP, Cortical.io's API, and
>>>> phrase classification...
>>>> This tweet from Carin Meier got me thinking last night:
>>>> https://twitter.com/gigasquid/status/654802085335068672
>>>>
>>>> Could we do this with text fingerprints from Cortical.io and HTM? What if
>>>> we put together a collection of human-gathered "statements" and a list of
>>>> "questions"? For each phrase, we turn each word into an SDR via
>>>> Cortical.io's API, then train one model on the statement phrases
>>>> (resetting sequences between phrases) and one on the questions. So we'll
>>>> have one model that has only seen statements and one that has only seen
>>>> questions.
>>>>
>>>> If there are typical word patterns that occur mostly in one type of
>>>> phrase or the other, it may be possible to feed new phrases as SDRs into
>>>> each model and use the lowest anomaly score to identify whether a phrase
>>>> is a statement or a question.
>>>>
>>>> Does this seem feasible? Is anyone interested in this project?
>>>>
>>>> Thanks,
>>>>
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>
>>> --
>>> *With kind regards,*
>>>
>>> David Ray
>>> Java Solutions Architect
>>>
>>> *Cortical.io <http://cortical.io/>*
>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>
>>> [email protected]
>>> http://cortical.io
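David's encoder suggestion above (questions sharing encoding space by their opening question word) can be sketched as a toy binary encoding - nothing like Cortical.io's real fingerprints. The opener list comes from his point 1 plus a few common question words; the bit sizes are arbitrary. Any two question-leading sentences share a fixed block of active bits, so they overlap regardless of topic:

```python
import hashlib

# Opener list from David's point 1, plus a few extra question words (assumed).
QUESTION_OPENERS = {"is", "are", "why", "how", "do", "what", "where",
                    "whether", "who", "when", "does", "can"}

def encode_sentence(sentence, size=128, block=16):
    """Toy binary encoding: sentences that open with a question word share
    a fixed block of active bits, so any two questions overlap heavily."""
    first = sentence.split()[0].lower().strip("?,.")
    active = set()
    if first in QUESTION_OPENERS:
        active.update(range(block))  # shared "interrogative" block
    # A deterministic hash of the opener fills one word-specific block.
    h = int(hashlib.md5(first.encode()).hexdigest(), 16) % (size - 2 * block)
    active.update(range(block + h, 2 * block + h))
    return active

def overlap(a, b):
    return len(a & b)
```

Here `overlap(encode_sentence("Is this working"), encode_sentence("Where did it go"))` is at least 16 (the shared block), while a statement opener like "The" never activates bits 0-15 at all.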

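Matt's two-model, lowest-anomaly proposal can be sketched with stand-in models - bigram "surprise" here plays the role of an HTM anomaly score, and the training/reset semantics are only loosely analogous to NuPIC's:

```python
from collections import Counter

class BigramModel:
    """Stand-in for an HTM sequence model: "anomaly" is the fraction of a
    phrase's word bigrams that were never seen during training."""

    def __init__(self):
        self.seen = Counter()

    def train(self, phrase):
        words = phrase.lower().split()
        # Resetting between phrases: only within-phrase transitions count.
        self.seen.update(zip(words, words[1:]))

    def anomaly(self, phrase):
        words = phrase.lower().split()
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return 1.0
        return sum(1 for p in pairs if p not in self.seen) / len(pairs)

def classify(phrase, statement_model, question_model):
    # Lowest anomaly wins, per the proposal in the thread.
    if statement_model.anomaly(phrase) <= question_model.anomaly(phrase):
        return "statement"
    return "question"

statements = BigramModel()
questions = BigramModel()
for p in ["the cat sat on the mat", "the dog ran home"]:
    statements.train(p)
for p in ["what is the time", "where is the dog"]:
    questions.train(p)
```

With this tiny (invented) training data, `classify("where is the cat", statements, questions)` returns `"question"`: the question model has seen two of that phrase's three bigrams, the statement model only one.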