Another option I was using to gather data for statement/question
classification was to scrape answers.com: http://www.answers.com/Q/FAQ/4571


- Carin


On Fri, Oct 16, 2015 at 1:00 PM, Carin Meier <[email protected]> wrote:

> There is an NPS Chat Corpus that already has POS-tagged word tokens and
> classifications that could be used as part of a test set
>
> http://faculty.nps.edu/cmartell/NPSChat.htm
>
> - Carin
>
> On Fri, Oct 16, 2015 at 12:50 PM, Matthew Taylor <[email protected]> wrote:
>
>> We don't have to use the fingerprints. Another way is to simply encode
>> the part of speech (POS) for each word. I'm sure that statements and
>> questions have different temporal POS patterns that should be recognizable.
>>
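A minimal sketch of the POS-encoding idea above, under loud assumptions: `TOY_LEXICON` is a hypothetical stand-in for a real POS tagger, and the tag set and `BITS_PER_TAG` width are made up for illustration. Each tag gets its own block of active bits, so a sentence becomes a sequence of sparse bit arrays that could be fed to a sequence model one word at a time.

```python
# Hypothetical toy lexicon standing in for a real POS tagger.
TOY_LEXICON = {
    "is": "VERB", "are": "VERB", "do": "VERB", "like": "VERB",
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dogs": "NOUN", "fish": "NOUN",
    "what": "WH", "why": "WH", "where": "WH", "how": "WH",
    "you": "PRON", "it": "PRON",
    "hungry": "ADJ",
}
TAGS = ["ADJ", "DET", "NOUN", "PRON", "VERB", "WH", "UNK"]
BITS_PER_TAG = 4  # assumed width of each tag's block of active bits


def tag_words(sentence):
    """Return one POS tag per word; unknown words are tagged UNK."""
    words = sentence.lower().strip("?.! ").split()
    return [TOY_LEXICON.get(word, "UNK") for word in words]


def encode_tag(tag):
    """Encode a tag as a bit list with one contiguous block of ones."""
    bits = [0] * (len(TAGS) * BITS_PER_TAG)
    start = TAGS.index(tag) * BITS_PER_TAG
    for i in range(start, start + BITS_PER_TAG):
        bits[i] = 1
    return bits


def encode_sentence(sentence):
    """Turn a sentence into a sequence of per-word tag encodings."""
    return [encode_tag(tag) for tag in tag_words(sentence)]


print(tag_words("Is the cat hungry?"))
print(tag_words("The cat is hungry."))
```

The two example sentences use the same words but produce different tag orders, which is the kind of temporal pattern the suggestion relies on.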
>>
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>> On Fri, Oct 16, 2015 at 9:10 AM, Richard Crowder <[email protected]>
>> wrote:
>>
>>> My 2 cents: this sounds similar to DeepQA, which helped IBM Watson win
>>> Jeopardy?
>>> http://researcher.watson.ibm.com/researcher/view_group.php?id=2099
>>>
>>> On Fri, Oct 16, 2015 at 4:39 PM, cogmission (David Ray) <
>>> [email protected]> wrote:
>>>
>>>> Awesome Idea! I for one am in!
>>>>
>>>> I think some questions arise concerning capability and
>>>> approach.
>>>>
>>>> My main question is:
>>>>
>>>> Considering that training a Cortical.io Fingerprint will organize SDRs
>>>> according to subject applicability, I'm not sure whether it will
>>>> differentiate according to degree of interrogative-ness. I have the same
>>>> question about the HTM: whether predictions and anomalies can differentiate
>>>> according to degree of interrogative-ness...
>>>>
>>>> So my immediate suggestion for a solution to the above is to do it in
>>>> the "Encoder". That is, to spatially aggregate inputs (sentences) according
>>>> to their Part-Of-Speech question word order... For example:
>>>>
>>>> 1. Sentences beginning with Is, Are, Why, How, Do, What, Where, Whether
>>>> etc. should be encoded closer to each other...
>>>> 2. Sentence fragments and clauses that accomplish the same as the
>>>> above should have the same encoding nature.
>>>>
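One way the encoder suggestion above could be sketched, assuming a made-up 64-bit layout: sentences whose first word is in the list of question openers all activate a shared block of bits, so their encodings overlap heavily, while a few word-specific bits keep them distinguishable. The names `encode_opening`, `N_SHARED`, and `N_WORD_BITS` are hypothetical, and this covers only point 1 (sentence openers), not fragments or clauses.

```python
import zlib

# Openers taken from the list above; everything else here is assumed.
QUESTION_OPENERS = {"is", "are", "why", "how", "do", "what", "where", "whether"}
N_BITS = 64        # assumed total encoding width
N_SHARED = 16      # bits shared by every question-like opener
N_WORD_BITS = 8    # bits specific to the opening word


def encode_opening(sentence):
    """Return the set of active bit indices for a sentence's first word."""
    words = sentence.lower().split()
    first = words[0].strip("?.!,") if words else ""
    active = set()
    if first in QUESTION_OPENERS:
        active.update(range(N_SHARED))
    # Word-specific bits, chosen with a stable checksum so that runs
    # are reproducible (the built-in hash() is randomized for strings).
    seed = zlib.crc32(first.encode("utf-8"))
    span = N_BITS - N_SHARED
    for i in range(N_WORD_BITS):
        active.add(N_SHARED + (seed + i) % span)
    return active


def overlap(a, b):
    """Number of active bits two encodings have in common."""
    return len(a & b)


q1 = encode_opening("Why is it raining?")
q2 = encode_opening("Where are my keys?")
s1 = encode_opening("The keys are here.")
print(overlap(q1, q2), overlap(q1, s1))
```

Two questions always share at least the 16 common bits, while a question and a statement can share at most the 8 word-specific bits, so "closer to each other" shows up directly as bit overlap.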
>>>> That's all I have for now...
>>>>
>>>> On Fri, Oct 16, 2015 at 10:23 AM, Matthew Taylor <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello NuPIC,
>>>>>
>>>>> Here is a question for anyone interested in NLP, Cortical.IO's API,
>>>>> and phrase classification...
>>>>>
>>>>> This tweet from Carin Meier got me thinking last night:
>>>>> https://twitter.com/gigasquid/status/654802085335068672
>>>>>
>>>>> Could we do this with text fingerprints from Cortical and HTM? What if
>>>>> we put together a collection of human-gathered "statements" and a list of
>>>>> "questions"? For each phrase, we turn each word into an SDR via
>>>>> Cortical's API, and train one model on the statement phrases (resetting
>>>>> sequences between phrases) and one on the questions. So we'll have one model
>>>>> that's only seen statements and one that's only seen questions.
>>>>>
>>>>> If there are typical word patterns that exist mostly in one type of
>>>>> phrase or another, it may be possible to feed new phrases as SDRs into
>>>>> each model, and use the lowest anomaly to identify whether it is a
>>>>> statement or question?
>>>>>
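The two-model idea above can be illustrated with a deliberately simple stand-in. This sketch does not use HTM or Cortical.io fingerprints: `TransitionModel` is a hypothetical class that only remembers which word followed which, and its "anomaly" is the fraction of word transitions it has never seen. The training phrases are made-up toy data. It shows only the decision rule: train one model per class, then pick the class whose model is least surprised.

```python
from collections import defaultdict


def tokenize(phrase):
    """Lowercase, drop end punctuation, split on whitespace."""
    return phrase.lower().strip("?.! ").split()


class TransitionModel:
    """Stand-in for a sequence model: records observed word transitions."""

    def __init__(self):
        self.seen = defaultdict(set)

    def train(self, phrases):
        for phrase in phrases:
            # The <start> token plays the role of a sequence reset.
            tokens = ["<start>"] + tokenize(phrase)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.seen[prev].add(nxt)

    def anomaly(self, phrase):
        """Fraction of transitions in the phrase never seen in training."""
        tokens = ["<start>"] + tokenize(phrase)
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return 1.0
        misses = sum(1 for prev, nxt in pairs
                     if nxt not in self.seen.get(prev, ()))
        return misses / len(pairs)


def classify(phrase, statement_model, question_model):
    """Pick the label of the model with the lower anomaly score."""
    s = statement_model.anomaly(phrase)
    q = question_model.anomaly(phrase)
    if s == q:
        return "unknown"
    return "statement" if s < q else "question"


# Toy training data, invented for this sketch.
statement_model = TransitionModel()
statement_model.train(["The cat is hungry.", "Dogs like fish.", "It is raining."])
question_model = TransitionModel()
question_model.train(["Is the cat hungry?", "Do dogs like fish?", "Why is it raining?"])

print(classify("Is it raining?", statement_model, question_model))
print(classify("The cat is raining.", statement_model, question_model))
```

With this toy data, the first phrase comes out as a question and the second as a statement, even though neither appeared in training. Whether real HTM anomaly scores separate the two classes this cleanly is exactly the open question.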
>>>>> Does this seem feasible? Is anyone interested in this project?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *With kind regards,*
>>>>
>>>> David Ray
>>>> Java Solutions Architect
>>>>
>>>> *Cortical.io <http://cortical.io/>*
>>>> Sponsor of:  HTM.java <https://github.com/numenta/htm.java>
>>>>
>>>> [email protected]
>>>> http://cortical.io
>>>>
>>>
>>>
>>
>
