I have played around with Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml). It is supposed to be of higher quality than NLTK: https://github.com/gigasquid/stanford-talk
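Further down the thread, David links the Moby Part of Speech file as a lookup resource. Here is a minimal sketch of a word-to-POS lookup built from a file like that, assuming a "word×codes" line format in which × (0xD7) separates the word from single-letter POS codes — both the separator and the code table below are assumptions to verify against the actual file:

```python
# Sketch: build a word -> part-of-speech lookup from Moby-style lines.
# The "word×codes" format and the code letters are assumed, not verified.

POS_CODES = {  # assumed (partial) Moby code table
    "N": "noun",
    "V": "verb",
    "A": "adjective",
    "v": "adverb",
    "r": "pronoun",
    "C": "conjunction",
    "P": "preposition",
}

def load_moby_pos(lines):
    """Parse Moby-style 'word×codes' lines into a {word: [pos, ...]} dict."""
    lookup = {}
    for line in lines:
        word, sep, codes = line.strip().partition("\u00d7")
        if sep:  # skip lines without the separator
            lookup[word] = [POS_CODES.get(c, "?") for c in codes]
    return lookup

# A couple of made-up sample lines in the assumed format:
sample = ["where\u00d7v", "cat\u00d7N"]
lookup = load_moby_pos(sample)
print(lookup["where"])  # ['adverb']
print(lookup["cat"])    # ['noun']
```

In practice the real mobypos.txt would be read line by line and fed to `load_moby_pos`; unknown code letters map to `"?"` rather than raising.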
On Fri, Oct 16, 2015 at 1:25 PM, Matthew Taylor <[email protected]> wrote:

> I have used NLTK in Python before to do POS tagging, but honestly it is
> not very good.
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Fri, Oct 16, 2015 at 10:19 AM, cogmission (David Ray)
> <[email protected]> wrote:
>
>> @Carin, between those two resources we should be able to come up with an
>> adequate word "look up" mechanism, eh?
>>
>> On Fri, Oct 16, 2015 at 12:12 PM, cogmission (David Ray)
>> <[email protected]> wrote:
>>
>>> Here's a resource: the Moby Part of Speech file!
>>>
>>> Linked on my server: www.mindlab.ai/mobypos.txt
>>>
>>> That's one resource!
>>>
>>> On Fri, Oct 16, 2015 at 12:05 PM, cogmission (David Ray)
>>> <[email protected]> wrote:
>>>
>>>> Yep, precisely. Do it in the encoder! The encoder would take in a whole
>>>> sentence and encode each word according to its position within the
>>>> sentence and its POS. For instance, the word "Where" would be encoded
>>>> differently depending on its location in the sentence.
>>>>
>>>> On Fri, Oct 16, 2015 at 11:50 AM, Matthew Taylor <[email protected]>
>>>> wrote:
>>>>
>>>>> We don't have to use the fingerprints. Another way is to simply encode
>>>>> the part of speech (POS) of each word. I'm sure that statements and
>>>>> questions have different temporal POS patterns that should be
>>>>> recognizable.
>>>>>
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>> On Fri, Oct 16, 2015 at 9:10 AM, Richard Crowder <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> My 2 cents: this sounds similar to DeepQA, which helped IBM Watson
>>>>>> win Jeopardy.
>>>>>> http://researcher.watson.ibm.com/researcher/view_group.php?id=2099
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 4:39 PM, cogmission (David Ray)
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Awesome idea! I for one am in!
>>>>>>>
>>>>>>> I think there are some questions that arise concerning capability
>>>>>>> and approach.
>>>>>>>
>>>>>>> My main question is this: given that training a Cortical.io
>>>>>>> fingerprint will organize SDRs according to subject applicability,
>>>>>>> I'm not sure whether it will differentiate by degree of
>>>>>>> "interrogative-ness". I have the same question about the HTM:
>>>>>>> whether predictions and anomalies can differentiate by degree of
>>>>>>> interrogative-ness.
>>>>>>>
>>>>>>> So my immediate suggestion is to do it in the "Encoder"; that is, to
>>>>>>> spatially aggregate inputs (sentences) according to their
>>>>>>> part-of-speech question word order. For example:
>>>>>>>
>>>>>>> 1. Sentences beginning with Is, Are, Why, How, Do, What, Where,
>>>>>>> Whether, etc. should be encoded closer to each other.
>>>>>>> 2. Sentence fragments and clauses that accomplish the same as the
>>>>>>> above should have the same encoding nature.
>>>>>>>
>>>>>>> That's all I have for now.
>>>>>>>
>>>>>>> On Fri, Oct 16, 2015 at 10:23 AM, Matthew Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello NuPIC,
>>>>>>>>
>>>>>>>> Here is a question for anyone interested in NLP, Cortical.io's API,
>>>>>>>> and phrase classification.
>>>>>>>>
>>>>>>>> This tweet from Carin Meier got me thinking last night:
>>>>>>>> https://twitter.com/gigasquid/status/654802085335068672
>>>>>>>>
>>>>>>>> Could we do this with text fingerprints from Cortical.io and HTM?
>>>>>>>> What if we put together a collection of human-gathered "statements"
>>>>>>>> and a list of "questions"? For each phrase, we turn each word into
>>>>>>>> an SDR via Cortical.io's API, then train one model on the statement
>>>>>>>> phrases (resetting sequences between phrases) and one on the
>>>>>>>> questions. So we'll have one model that has only seen statements
>>>>>>>> and one that has only seen questions.
>>>>>>>>
>>>>>>>> If there are typical word patterns that exist mostly in one type of
>>>>>>>> phrase or the other, it may be possible to feed new phrases as SDRs
>>>>>>>> into each model and use the lowest anomaly score to identify
>>>>>>>> whether the input is a statement or a question.
>>>>>>>>
>>>>>>>> Does this seem feasible? Is anyone interested in this project?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> ---------
>>>>>>>> Matt Taylor
>>>>>>>> OS Community Flag-Bearer
>>>>>>>> Numenta
>>>>>>>
>>>>>>> --
>>>>>>> *With kind regards,*
>>>>>>>
>>>>>>> David Ray
>>>>>>> Java Solutions Architect
>>>>>>>
>>>>>>> *Cortical.io <http://cortical.io/>*
>>>>>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>>>>>
>>>>>>> [email protected]
>>>>>>> http://cortical.io
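Matt's "lowest anomaly wins" proposal above can be sketched end-to-end with toy stand-ins: word bigrams in place of Cortical.io SDRs, and a never-seen-transition rate in place of an HTM anomaly score. Every class and name here is illustrative, not part of NuPIC or the Cortical.io API:

```python
# Toy sketch: one model trained only on statements, one only on questions;
# classify a new phrase with whichever model finds it least anomalous.

class BigramModel:
    """Remembers which word-to-word transitions it has seen in training."""

    def __init__(self):
        self.bigrams = set()

    def train(self, phrases):
        # Each phrase is handled separately, mimicking a sequence reset
        # between phrases: no bigram spans two phrases.
        for phrase in phrases:
            words = phrase.lower().split()
            self.bigrams.update(zip(words, words[1:]))

    def anomaly(self, phrase):
        """Fraction of transitions in `phrase` the model has never seen."""
        words = phrase.lower().split()
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return 1.0
        unseen = sum(1 for p in pairs if p not in self.bigrams)
        return unseen / len(pairs)


def classify(phrase, statement_model, question_model):
    """Label the phrase with whichever model finds it least anomalous."""
    s = statement_model.anomaly(phrase)
    q = question_model.anomaly(phrase)
    return "statement" if s < q else "question"


statements = ["the cat sat on the mat", "it is raining today"]
questions = ["where is the cat", "is it raining today"]

stmt_model, q_model = BigramModel(), BigramModel()
stmt_model.train(statements)
q_model.train(questions)

print(classify("the cat is raining", stmt_model, q_model))   # statement
print(classify("where is it raining", stmt_model, q_model))  # question
```

The real experiment would replace `BigramModel` with two HTM models fed SDR sequences, but the decision rule — compare anomaly scores and take the minimum — is the same.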

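David's encoder suggestion — make sentences that open with question words land near each other in encoding space — can be sketched with a toy binary encoding in which all question openers share a block of bits. The bit layout and sizes are purely illustrative, not any real NuPIC encoder:

```python
# Toy encoder: question-opening sentences share a block of "interrogative"
# bits, so any two questions overlap more than a question and a statement.

QUESTION_STARTERS = {"is", "are", "why", "how", "do", "what", "where", "whether"}

def encode_sentence(sentence, width=16, shared=4):
    """Return a binary list of length `width`. The first `shared` bits are
    set for question openers; one extra bit identifies the opening word."""
    words = sentence.lower().split()
    bits = [0] * width
    if not words:
        return bits
    if words[0] in QUESTION_STARTERS:
        bits[:shared] = [1] * shared  # overlap shared by all questions
    # Deterministic bucket for the opening word in the remaining bits:
    bucket = shared + sum(map(ord, words[0])) % (width - shared)
    bits[bucket] = 1
    return bits

def overlap(a, b):
    """Count of positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))

q1 = encode_sentence("Where is the cat")
q2 = encode_sentence("Is it raining")
s1 = encode_sentence("The cat sat down")
print(overlap(q1, q2) > overlap(q1, s1))  # True: questions cluster together
```

A real encoder would do this per word and fold in sentence position and POS, but the clustering property David describes is already visible at this scale.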