There is an NPS Chat Corpus that already has POS-tagged word tokens and
dialogue-act classifications that could be used as part of a test set:

http://faculty.nps.edu/cmartell/NPSChat.htm
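
The corpus also ships with NLTK, so here's a minimal sketch of splitting
its posts into questions vs. statements by dialogue-act label (the helper
name and the exact label filtering are my own assumptions):

```python
# Sketch: load the NPS Chat Corpus via NLTK and split posts into
# questions vs. statements using their dialogue-act class labels.
# Assumes nltk is installed and the corpus data has been fetched
# with nltk.download('nps_chat').

def is_question_class(label):
    # NPS Chat act labels include 'whQuestion', 'ynQuestion',
    # 'Statement', 'Greet', etc. -- treat any *Question as a question.
    return label.endswith("Question")

def split_posts():
    from nltk.corpus import nps_chat
    posts = nps_chat.xml_posts()
    questions = [p.text for p in posts if is_question_class(p.get("class"))]
    statements = [p.text for p in posts if p.get("class") == "Statement"]
    return questions, statements
```

That would give us two labeled phrase lists to train the two models on.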

- Carin

On Fri, Oct 16, 2015 at 12:50 PM, Matthew Taylor <[email protected]> wrote:

> We don't have to use the fingerprints. Another way is to simply encode the
> part of speech (POS) for each word. I'm sure that statements and questions
> have different temporal POS patterns that should be recognizable.
>
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Fri, Oct 16, 2015 at 9:10 AM, Richard Crowder <[email protected]> wrote:
>
>> My 2 cents: this sounds similar to DeepQA, which helped IBM Watson win
>> Jeopardy.
>> http://researcher.watson.ibm.com/researcher/view_group.php?id=2099
>>
>> On Fri, Oct 16, 2015 at 4:39 PM, cogmission (David Ray) <
>> [email protected]> wrote:
>>
>>> Awesome Idea! I for one am in!
>>>
>>> I think there are some questions that arise concerning capability and
>>> approach.
>>>
>>> My main question is:
>>>
>>> Considering that training a Cortical.io fingerprint organizes SDRs
>>> according to subject matter, I'm not sure whether it will differentiate
>>> by degree of "interrogative-ness". I have the same question about the
>>> HTM: can its predictions and anomalies differentiate by degree of
>>> interrogative-ness?
>>>
>>> So my immediate suggestion for addressing the above is to do it in the
>>> "Encoder". That is, spatially aggregate inputs (sentences) according to
>>> their part-of-speech question word order... For example:
>>>
>>> 1. Sentences beginning with Is, Are, Why, How, Do, What, Where, Whether,
>>> etc. should be encoded closer to each other...
>>> 2. Sentence fragments and clauses that accomplish the same as the above
>>> should be encoded the same way.
>>>
>>> That's all I have for now...
>>>
>>> On Fri, Oct 16, 2015 at 10:23 AM, Matthew Taylor <[email protected]>
>>> wrote:
>>>
>>>> Hello NuPIC,
>>>>
>>>> Here is a question for anyone interested in NLP, Cortical.IO's API, and
>>>> phrase classification...
>>>>
>>>> This tweet from Carin Meier got me thinking last night:
>>>> https://twitter.com/gigasquid/status/654802085335068672
>>>>
>>>> Could we do this with text fingerprints from Cortical and HTM? What if
>>>> we put together a collection of human-gathered "statements" and a list of
>>>> "questions"? For each phrase, we'd turn each word into an SDR via
>>>> Cortical's API, then train one model on the statement phrases (resetting
>>>> sequences between phrases) and one on the questions. We'd then have one
>>>> model that's only seen statements and one that's only seen questions.
>>>>
>>>> If there are typical word patterns that exist mostly in one type of
>>>> phrase or the other, it may be possible to feed new phrases as SDRs into
>>>> each model and use the lower anomaly score to identify whether the phrase
>>>> is a statement or a question.
>>>>
>>>> Does this seem feasible? Is anyone interested in this project?
>>>>
>>>> Thanks,
>>>>
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>>
>>>
>>>
>>>
>>> --
>>> *With kind regards,*
>>>
>>> David Ray
>>> Java Solutions Architect
>>>
>>> *Cortical.io <http://cortical.io/>*
>>> Sponsor of:  HTM.java <https://github.com/numenta/htm.java>
>>>
>>> [email protected]
>>> http://cortical.io
>>>
>>
>>
>
