I have played around with Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml). It is supposed to be of higher quality than NLTK: https://github.com/gigasquid/stanford-talk
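Further down the thread, David links the Moby Part of Speech file as a lookup resource. Here is a minimal sketch of a word-to-POS lookup built from a file like that, assuming a "word×codes" line format in which × (0xD7) separates the word from single-letter POS codes — both the separator and the code table below are assumptions to verify against the actual file:

```python
# Sketch: build a word -> part-of-speech lookup from Moby-style lines.
# The "word×codes" format and the code letters are assumed, not verified.

POS_CODES = {  # assumed (partial) Moby code table
    "N": "noun",
    "V": "verb",
    "A": "adjective",
    "v": "adverb",
    "r": "pronoun",
    "C": "conjunction",
    "P": "preposition",
}

def load_moby_pos(lines):
    """Parse Moby-style 'word×codes' lines into a {word: [pos, ...]} dict."""
    lookup = {}
    for line in lines:
        word, sep, codes = line.strip().partition("\u00d7")
        if sep:  # skip lines without the separator
            lookup[word] = [POS_CODES.get(c, "?") for c in codes]
    return lookup

# A couple of made-up sample lines in the assumed format:
sample = ["where\u00d7v", "cat\u00d7N"]
lookup = load_moby_pos(sample)
print(lookup["where"])  # ['adverb']
print(lookup["cat"])    # ['noun']
```

In practice the real mobypos.txt would be read line by line and fed to `load_moby_pos`; unknown code letters map to `"?"` rather than raising.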
On Fri, Oct 16, 2015 at 1:25 PM, Matthew Taylor <[email protected]> wrote:

> I have used NLTK in Python before to do POS tagging, but honestly it is
> not very good.
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Fri, Oct 16, 2015 at 10:19 AM, cogmission (David Ray)
> <[email protected]> wrote:
>
>> @Carin, between those two resources we should be able to come up with an
>> adequate word "look up" mechanism, eh?
>>
>> On Fri, Oct 16, 2015 at 12:12 PM, cogmission (David Ray)
>> <[email protected]> wrote:
>>
>>> Here's a resource: the Moby Part of Speech file!
>>>
>>> Linked on my server: www.mindlab.ai/mobypos.txt
>>>
>>> That's one resource!
>>>
>>> On Fri, Oct 16, 2015 at 12:05 PM, cogmission (David Ray)
>>> <[email protected]> wrote:
>>>
>>>> Yep, precisely. Do it in the encoder! The encoder would take in a whole
>>>> sentence and encode each word according to its position within the
>>>> sentence and its POS. For instance, the word "Where" would be encoded
>>>> differently depending on its location in the sentence.
>>>>
>>>> On Fri, Oct 16, 2015 at 11:50 AM, Matthew Taylor <[email protected]>
>>>> wrote:
>>>>
>>>>> We don't have to use the fingerprints. Another way is to simply encode
>>>>> the part of speech (POS) of each word. I'm sure that statements and
>>>>> questions have different temporal POS patterns that should be
>>>>> recognizable.
>>>>>
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>> On Fri, Oct 16, 2015 at 9:10 AM, Richard Crowder <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> My 2 cents: this sounds similar to DeepQA, which helped IBM Watson
>>>>>> win Jeopardy.
>>>>>> http://researcher.watson.ibm.com/researcher/view_group.php?id=2099
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 4:39 PM, cogmission (David Ray)
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Awesome idea! I for one am in!
>>>>>>>
>>>>>>> I think there are some questions that arise concerning capability
>>>>>>> and approach.
>>>>>>>
>>>>>>> My main question is this: given that training a Cortical.io
>>>>>>> fingerprint will organize SDRs according to subject applicability,
>>>>>>> I'm not sure whether it will differentiate by degree of
>>>>>>> "interrogative-ness". I have the same question about the HTM:
>>>>>>> whether predictions and anomalies can differentiate by degree of
>>>>>>> interrogative-ness.
>>>>>>>
>>>>>>> So my immediate suggestion is to do it in the "Encoder"; that is, to
>>>>>>> spatially aggregate inputs (sentences) according to their
>>>>>>> part-of-speech question word order. For example:
>>>>>>>
>>>>>>> 1. Sentences beginning with Is, Are, Why, How, Do, What, Where,
>>>>>>> Whether, etc. should be encoded closer to each other.
>>>>>>> 2. Sentence fragments and clauses that accomplish the same as the
>>>>>>> above should have the same encoding nature.
>>>>>>>
>>>>>>> That's all I have for now.
>>>>>>>
>>>>>>> On Fri, Oct 16, 2015 at 10:23 AM, Matthew Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello NuPIC,
>>>>>>>>
>>>>>>>> Here is a question for anyone interested in NLP, Cortical.io's API,
>>>>>>>> and phrase classification.
>>>>>>>>
>>>>>>>> This tweet from Carin Meier got me thinking last night:
>>>>>>>> https://twitter.com/gigasquid/status/654802085335068672
>>>>>>>>
>>>>>>>> Could we do this with text fingerprints from Cortical.io and HTM?
>>>>>>>> What if we put together a collection of human-gathered "statements"
>>>>>>>> and a list of "questions"? For each phrase, we turn each word into
>>>>>>>> an SDR via Cortical.io's API, then train one model on the statement
>>>>>>>> phrases (resetting sequences between phrases) and one on the
>>>>>>>> questions. So we'll have one model that has only seen statements
>>>>>>>> and one that has only seen questions.
>>>>>>>>
>>>>>>>> If there are typical word patterns that exist mostly in one type of
>>>>>>>> phrase or the other, it may be possible to feed new phrases as SDRs
>>>>>>>> into each model and use the lowest anomaly score to identify
>>>>>>>> whether the input is a statement or a question.
>>>>>>>>
>>>>>>>> Does this seem feasible? Is anyone interested in this project?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> ---------
>>>>>>>> Matt Taylor
>>>>>>>> OS Community Flag-Bearer
>>>>>>>> Numenta
>>>>>>>
>>>>>>> --
>>>>>>> *With kind regards,*
>>>>>>>
>>>>>>> David Ray
>>>>>>> Java Solutions Architect
>>>>>>>
>>>>>>> *Cortical.io <http://cortical.io/>*
>>>>>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>>>>>
>>>>>>> [email protected]
>>>>>>> http://cortical.io
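Matt's "lowest anomaly wins" proposal above can be sketched end-to-end with toy stand-ins: word bigrams in place of Cortical.io SDRs, and a never-seen-transition rate in place of an HTM anomaly score. Every class and name here is illustrative, not part of NuPIC or the Cortical.io API:

```python
# Toy sketch: one model trained only on statements, one only on questions;
# classify a new phrase with whichever model finds it least anomalous.

class BigramModel:
    """Remembers which word-to-word transitions it has seen in training."""

    def __init__(self):
        self.bigrams = set()

    def train(self, phrases):
        # Each phrase is handled separately, mimicking a sequence reset
        # between phrases: no bigram spans two phrases.
        for phrase in phrases:
            words = phrase.lower().split()
            self.bigrams.update(zip(words, words[1:]))

    def anomaly(self, phrase):
        """Fraction of transitions in `phrase` the model has never seen."""
        words = phrase.lower().split()
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return 1.0
        unseen = sum(1 for p in pairs if p not in self.bigrams)
        return unseen / len(pairs)


def classify(phrase, statement_model, question_model):
    """Label the phrase with whichever model finds it least anomalous."""
    s = statement_model.anomaly(phrase)
    q = question_model.anomaly(phrase)
    return "statement" if s < q else "question"


statements = ["the cat sat on the mat", "it is raining today"]
questions = ["where is the cat", "is it raining today"]

stmt_model, q_model = BigramModel(), BigramModel()
stmt_model.train(statements)
q_model.train(questions)

print(classify("the cat is raining", stmt_model, q_model))   # statement
print(classify("where is it raining", stmt_model, q_model))  # question
```

The real experiment would replace `BigramModel` with two HTM models fed SDR sequences, but the decision rule — compare anomaly scores and take the minimum — is the same.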

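David's encoder suggestion — make sentences that open with question words land near each other in encoding space — can be sketched with a toy binary encoding in which all question openers share a block of bits. The bit layout and sizes are purely illustrative, not any real NuPIC encoder:

```python
# Toy encoder: question-opening sentences share a block of "interrogative"
# bits, so any two questions overlap more than a question and a statement.

QUESTION_STARTERS = {"is", "are", "why", "how", "do", "what", "where", "whether"}

def encode_sentence(sentence, width=16, shared=4):
    """Return a binary list of length `width`. The first `shared` bits are
    set for question openers; one extra bit identifies the opening word."""
    words = sentence.lower().split()
    bits = [0] * width
    if not words:
        return bits
    if words[0] in QUESTION_STARTERS:
        bits[:shared] = [1] * shared  # overlap shared by all questions
    # Deterministic bucket for the opening word in the remaining bits:
    bucket = shared + sum(map(ord, words[0])) % (width - shared)
    bits[bucket] = 1
    return bits

def overlap(a, b):
    """Count of positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))

q1 = encode_sentence("Where is the cat")
q2 = encode_sentence("Is it raining")
s1 = encode_sentence("The cat sat down")
print(overlap(q1, q2) > overlap(q1, s1))  # True: questions cluster together
```

A real encoder would do this per word and fold in sentence position and POS, but the clustering property David describes is already visible at this scale.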