Dear list,

Thanks for the good read. I am happy to (hopefully) start the discussion.
The first issue that comes to my mind is syntax; this concerns every bag-of-words approach, including Cortical.io's. The most obvious source of "natural language misunderstanding" in this context is negation, as this example easily demonstrates:

- fox eat rodent
- sheep do not eat rodent

I suppose the presented algorithm would learn from this that both fox and sheep eat rodent, wouldn't it? This is probably more harmful when classifying sentences than during learning, because a corpus of reasonable size presumably contains a sufficient number of examples phrased in a more straightforward way. More complex cases, including subtler negation, relative clauses, etc., will pose much larger challenges.

I am quite sure that the syntax issue has been discussed in this context. However, I couldn't find any references, neither in the theoretical nor in the practical part of the whitepaper. I am very interested in Cortical.io's experiences with this problem and in what possible (future) solutions might look like. In statistical NLP, this issue has been tackled (more or less successfully) with methods such as recurrent neural networks or sliding windows across multiple words, amongst others. Neither of these approaches seems applicable here without sacrificing a fundamental and very handy property of the SDRs: that they can be efficiently aggregated with boolean operations.

Although the syntax issue might be almost irrelevant for many practical use cases such as document classification, I think it raises an interesting theoretical question: how does the human brain process syntax and, more interestingly, how can this be incorporated into the presented theory?

A slightly more technical issue I've stumbled across is word inflection. The whitepaper briefly mentions morphemes, which are, according to linguistic theory, "the smallest meaningful units" in language.
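To make the negation point concrete, here is a minimal sketch of how both a plain bag-of-words view and a boolean union of word SDRs treat the two example sentences as asserting the same relation. The vocabulary, vector width, and random bit assignments below are all invented for illustration; they are not Cortical.io's actual encoding.

```python
import random

VOCAB = ["fox", "sheep", "do", "not", "eat", "rodent"]

def bag_of_words(sentence):
    """Return the set of content words, ignoring order and function words."""
    stopwords = {"do", "not"}  # typical stopword lists drop "not"!
    return {w for w in sentence.lower().split() if w not in stopwords}

s1 = bag_of_words("fox eat rodent")
s2 = bag_of_words("sheep do not eat rodent")

# Both sentences now assert the same "eat rodent" relation;
# the intersection is {'eat', 'rodent'} and the negation is gone.
print(s1 & s2)

# The same thing happens with SDRs aggregated by boolean OR:
random.seed(0)
sdr = {w: frozenset(random.sample(range(128), 4)) for w in VOCAB}

def aggregate(words):
    """Union (boolean OR) of the word SDRs -- cheap, but order-blind."""
    bits = set()
    for w in words:
        bits |= sdr[w]
    return bits

a = aggregate("fox eat rodent".split())
b = aggregate("sheep do not eat rodent".split())
# b does contain every bit of "not", but nothing in the union marks
# WHICH relation it negates, so the shared "eat rodent" bits dominate.
print(len(a & b))
```

Even keeping "not" in the representation does not help here: a flat union has no way to bind the negation to a particular predicate.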
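The practical gap between stemming and lemmatization is easy to see in a toy sketch. The suffix list and mini-lexicon below are invented for this illustration; real systems would use something like a Porter stemmer or a WordNet-backed lemmatizer.

```python
def stem(word):
    """Naive stemmer: chop a known suffix, no dictionary involved."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A lemmatizer needs language-specific knowledge;
# here, a tiny hand-made lexicon of irregular forms stands in for it.
LEMMAS = {"ate": "eat", "eaten": "eat", "mice": "mouse", "better": "good"}

def lemmatize(word):
    return LEMMAS.get(word, stem(word))

for w in ["eating", "ate", "mice", "foxes"]:
    print(w, "-> stem:", stem(w), "| lemma:", lemmatize(w))
# Suffix stripping alone cannot map "ate" -> "eat" or "mice" -> "mouse";
# that is exactly what is lost for a language without a lemmatizer.
```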
I understand that working on the word level is sufficient in most cases and much easier for practical reasons (tokenization is relatively easy). I wonder how this is handled in practice, though, for instance when learning a new "language definition corpus": Are the words automatically lemmatized? What if a new language is learned for which no lemmatizers are available? Is mere stemming applied in that case? And what happens when different word forms do express different meanings?

Thanks for any input on these issues!

Carsten

On 25.11.2015 at 19:13, Fergal Byrne wrote:
> Nice, Francisco, thanks for letting us know. I've read the paper, very
> well put together. Looking forward to discussions and questions on the list.
>
> --
>
> Fergal Byrne, Brenter IT
>
> Author, Real Machine Intelligence with Clortex and NuPIC
> https://leanpub.com/realsmartmachines
>
> Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
> http://euroclojure.com/2014/
> and at LambdaJam Chicago, July 2014: http://www.lambdajam.com
>
> http://inbits.com - Better Living through Thoughtful Technology
> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>
> e:[email protected] t:+353 83 4214179
> Join the quest for Machine Intelligence at http://numenta.org
> Formerly of Adnet [email protected] http://www.adnet.ie
>
> On Wed, Nov 25, 2015 at 4:33 PM, Chandan Maruthi
> <[email protected]> wrote:
>
>     Francisco
>
>     This is great, looking forward to reading this today
>
>     On Wednesday, November 25, 2015, cogmission (David Ray)
>     <[email protected]> wrote:
>
>         Hi Francisco,
>
>         This will make for a very interesting and informative read!
>         Can't wait!
>
>         Cheers,
>         David
>
>         On Wed, Nov 25, 2015 at 8:38 AM, Pascal Weinberger
>         <[email protected]> wrote:
>
>             Great!
>             I was waiting for this a long time :D
>             Will make my day! :)
>
>             Thank you!
>             Best,
>             Pascal Weinberger
>
>             On 25 Nov 2015, at 14:49, Francisco Webber
>             <[email protected]> wrote:
>
>>             Hello all,
>>             For everyone interested in the theoretical background to
>>             Cortical.io <http://cortical.io>'s technology:
>>
>>             The Semantic Folding white paper is out in its first
>>             incarnation:
>>
>>             Download full White Paper
>>             <http://www.cortical.io/static/downloads/semantic-folding-theory-white-paper.pdf>
>>
>>             All the Best
>>
>>             Francisco
>
>         --
>         With kind regards,
>
>         David Ray
>         Java Solutions Architect
>
>         Cortical.io <http://cortical.io/>
>         Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>
>         [email protected]
>         http://cortical.io <http://cortical.io/>
>
> --
> Regards
> Chandan Maruthi

--
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources (AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL)
www.kdsl.tu-darmstadt.de
