Best thread in a decade! Cheers Matt and Linas! (Linas' paper sounds cool,
but I probably won't have the time to understand it either. But if it is
good, Linas, then, as a great scientist recently told me, you just need a
ton of patience...)

On Sat, Feb 9, 2019 at 9:03 AM Linas Vepstas <[email protected]> wrote:

> Hi Matt,
>
> On Fri, Feb 8, 2019 at 4:31 PM Matt Mahoney <[email protected]>
> wrote:
>
>>
>>
>> On Tue, Feb 5, 2019, 5:23 PM Linas Vepstas <[email protected]> wrote:
>>
>>>
>>> if there were an experimental results section that told us
>>>> which ones were worth pursuing.
>>>
>>>
>>> There's this:
>>>
>>>
>>> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/connector-sets-revised.pdf
>>>
>>>
>>> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/learn-lang-diary.pdf
>>>
>>
>> Yes, that is what I was looking for. I haven't read all of it but so far
>> I learned:
>>
>> 1. It is possible to learn parts of speech and a grammar from unlabeled
>> text.
>>
>
> Well, I think that is the major claim, as it is effectively something
> that's never been done before (and many people doubt that it's even
> possible without NNs).  It's been the cause of much teeth-gnashing.
>
>
>>
>> 2. It is possible to learn word boundaries in continuous speech by
>> finding boundaries with low mutual information. (I did similar experiments
>> to restore deleted spaces in text using only n-gram statistics. It is how
>> infants learn to segment speech at 7-10 months old, before learning any
>> words.)
>>
>
> This claim is ... well, I think it's well explored by academia, with
> various "well-known" workable solutions dating back a decade or two.
> I've not had a chance to explore it and try to wedge it into a grand
> unified theory.
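The idea Matt describes can be sketched in a few lines: estimate character
bigram statistics from a corpus, then cut wherever the pointwise mutual
information between adjacent characters dips below a threshold. The toy
corpus, the smoothing constant, and the threshold below are all invented
for illustration; they are not taken from either paper.

```python
import math
from collections import Counter

def train_stats(utterances):
    """Character unigram and bigram counts from training strings."""
    uni, bi = Counter(), Counter()
    for u in utterances:
        uni.update(u)
        bi.update(u[i:i + 2] for i in range(len(u) - 1))
    return uni, bi

def mi_segment(text, uni, bi, threshold=0.0):
    """Cut wherever adjacent-character pointwise mutual information
    drops below `threshold` (low MI suggests a word boundary)."""
    n_uni, n_bi = sum(uni.values()), sum(bi.values())

    def pmi(a, b):
        p_ab = bi.get(a + b, 0.5) / n_bi   # 0.5: crude smoothing for unseen bigrams
        return math.log2(p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni)))

    words, start = [], 0
    for i in range(len(text) - 1):
        if pmi(text[i], text[i + 1]) < threshold:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words

# Toy corpus: the "words" aba, cdc, efe concatenated in varying orders,
# so within-word bigrams are frequent and boundary bigrams are rare.
corpus = ["abacdc", "cdcefe", "efeaba", "abaefe", "cdcaba", "efecdc"]
uni, bi = train_stats(corpus)
print(mi_segment("abacdcefe", uni, bi))   # → ['aba', 'cdc', 'efe']
```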
>
>>
>> 3. Word pairs have a Zipf distribution just like single words.
>>
>
> Uhh, yes, but no. Depends on what you are graphing. Here's an unpublished
> 2009 draft:
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/word-pairs-2009/word-pairs.pdf
> When I recently repeated a variant of this, using different techniques
> on a different dataset, I got a very clean (logarithmic) Bell curve
> (Gaussian).  Who knew?  When I brought this up on a linguistics mailing
> list (back in 2009) a general consensus emerged that "if you combine
> two Zipfs together, you get a logarithmic Gaussian", and that "this is
> probably easy to prove", but neither I nor anyone else took the effort
> to actually prove it.  It's probably still a sufficiently novel result
> that a proof, together with data, is publishable. So much to do, so
> little time.
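One quick sanity check, under a strong assumed simplification: treat a
pair's frequency as the product of two independently drawn Zipf-distributed
unigram frequencies. In log space the pair frequency is then a sum of two
i.i.d. terms, which yields an interior-peaked, roughly symmetric bell.
Whether that bell is truly Gaussian is exactly the unproven step mentioned
above; this simulation only shows the bell shape, not its exact form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zipf unigram model: frequency of the word at rank r is proportional to 1/r.
n_words = 10_000
ranks = np.arange(1, n_words + 1)
probs = (1.0 / ranks) / (1.0 / ranks).sum()

# Draw the two halves of each "pair" independently (a deliberate
# simplification: real word pairs have co-occurrence structure).
w1 = rng.choice(ranks, size=100_000, p=probs)
w2 = rng.choice(ranks, size=100_000, p=probs)
log_pair_freq = -np.log(w1) - np.log(w2)   # log of the product frequency

# The histogram peaks in the interior, not at either edge.
hist, _ = np.histogram(log_pair_freq, bins=20)
print(hist.argmax(), len(hist))
```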
>
>
>> (I suspect it is also true of word triples representing grammar rules. It
>> suggests there are around 10^8 rules).
>>
>
> Well, that one also gets a "yes, but no."  Here:
>
> * De facto, link-grammar manages to be a pretty decent model of English,
> with approx 2.2K rules. (or 11K = 2.2K + 8.8K rules, depending on what is
> counted as a "rule")  (compare `grep -c ";" 4.0.dict` with
> `grep -o " or" 4.0.dict | wc -l`)
>
> * If you brute-force accumulate statistics for disjuncts, you get approx
> 10^8 of them. (I have such datasets and can share them, if you have the
> RAM)
>
> * But those datasets contain rules connecting words, and not word-classes.
> The grammar needed to parse "I see a dog, I see a rock, I see a tree" does
> not require 3 distinct rules for dog,rock,tree; it just needs one: "I see a
> <common-noun>". If you are picky, you can have several different kinds of
> common nouns, but however picky you are, you need fewer classes than there
> are words. A typical class will contain many words. The rules connect
> word-classes, not individual words.
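A toy version of that counting argument (the observations below are
invented for the example): group words by the set of contexts they appear
in, then count rules at the class level instead of the word level.

```python
from collections import defaultdict

# Hypothetical word-level observations: (context, word) pairs.
observations = [
    ("I see a", "dog"), ("I see a", "rock"), ("I see a", "tree"),
    ("the", "dog"), ("the", "rock"), ("the", "tree"),
    ("I", "see"), ("you", "see"),
]

# Words that occur in exactly the same set of contexts share a class.
contexts_of = defaultdict(set)
for ctx, w in observations:
    contexts_of[w].add(ctx)
class_of = {w: frozenset(ctxs) for w, ctxs in contexts_of.items()}

# One rule per (context, class) instead of one per (context, word).
word_rules = set(observations)
class_rules = {(ctx, class_of[w]) for ctx, w in observations}
print(len(word_rules), len(class_rules))   # → 8 4
```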
>
> * Automatically discovering word-classes is not terribly difficult, but is
> sufficiently confusing that it has tripped up others who have tried. That's
> why I spent a lot of time talking about "matrix factorization" in one of
> the papers. It turns out that "matrix factorization" is a form of
> "clustering", in disguise.
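A minimal sketch of the factorization-as-clustering point, on an invented
word-by-context count matrix: a low-rank factorization (plain SVD here,
standing in for whatever factorization the papers actually use) gives each
word a dominant latent dimension, and that assignment recovers the word
classes.

```python
import numpy as np

# Invented word-by-context count matrix (rows: words, cols: contexts).
words = ["dog", "rock", "tree", "run", "walk"]
counts = np.array([
    [5., 4., 0., 0.],   # noun-like contexts
    [3., 6., 0., 0.],
    [4., 5., 0., 0.],
    [0., 0., 7., 2.],   # verb-like contexts
    [0., 0., 3., 6.],
])

# Rank-2 factorization; each word's dominant latent dimension
# acts as its cluster (word-class) label.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
labels = np.abs(U[:, :2]).argmax(axis=1)
print(dict(zip(words, labels.tolist())))
```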
>
> * There are several styles of matrix factorization, one of which is
> more-or-less the same thing as  "deep learning" (!) Which is perhaps the
> most important thing I have to say on this topic.
>
> * I cannot quote, off the top of my head, how many non-zero weights are
> in the current "state-of-the-art" natural-language deep-learning
> networks. I think under a million(?); it depends on what exactly you are
> trying to do with the NN model.
>
> * So, for the syntax of English, together with a shallow amount of
> semantics (e.g. at the level of framenet or of wordnet), I think this can
> be done with 10K to 1M rules (with LG providing a hard-lower-limit for a
> "realistic" language model, and "state-of-the-art" natural-language
> deep-learning providing a soft upper-limit.)
>
> Clearly, syntax plus shallow semantics is not at all AGI. But the point is
> that *if* we can create an automated system that can reproducibly, easily
> and regularly obtain syntax plus typical ontology-type structure (again,
> wordnet/framenet/SUMO/whatever levels-of-sophistication) ***AND*** the
> resulting structure is NOT a black box of floating-point values, but is
> queriable in a natural way ("a bird is an animal" "an animal is a living
> thing", etc. can be posed as questions with affirmative/negative answers,
> or even probabilities) ***THEN*** you have a platform on which to deploy
> research about reasoning, inference, common-sense knowledge, etc.
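As a sketch of the kind of "natural query" interface meant here (the store
and its contents are hypothetical): rules held as symbols rather than
opaque floats can answer is-a questions by simply walking the ontology.

```python
# Hypothetical symbolic ontology store: child -> parent links.
isa = {
    "sparrow": "bird",
    "bird": "animal",
    "animal": "living thing",
}

def is_a(x, y):
    """Affirmative/negative answer to 'is an x a y?' by chasing links."""
    while x in isa:
        x = isa[x]
        if x == y:
            return True
    return False

print(is_a("bird", "living thing"), is_a("rock", "animal"))   # → True False
```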
>
> To repeat:
> * I believe that purely-automated extraction of syntax plus shallow
> semantics, for arbitrary human written language, is well within reach, and
> requires about 10K to 1M rules (depending on how deep you
> drill/simplify/abstract)
> * That the above ruleset can have a natural knowledge-query interface to
> it, making it accessible to inferencing and other high-level algos
> * That all of the above has already been demonstrated in various distinct
> prototypes and proofs-of-concept, (in various typical NAACL-HLT journal
> articles) (of which I've personally reproduced some selected handful)
> * So .. let's do it. Integrate & test.  It's maybe rocket science; but
> it's not science fiction.
>
>
>> I hope this work continues. It would be interesting if it advances the
>> state of the art on my large text benchmark or the Hutter prize.
>>
>
> Thanks!
>
> I think we're at the state of the art now, although I haven't been able to
> convince Ben. Who knew that he was a natural-born skeptic?  I'm totally
> ignoring benchmarks and competitions; I simply do not have the hours in the
> day.  (Actually, zero hours of the day, right now; as of about 8-9 months
> ago, the language-learning project has been handed over to a team. They are
> still getting up to speed; they don't have any background in linguistics or
> machine learning, and so have been trying to learn both at the same time,
> and getting tangled up and stalled, as a result. Currently they are
> stumbling at the "clustering" step; have not yet begun the "ontology" step.
> One step forward, two steps back.  I'd like to get back on it, as it's
> clear as a bell to me, but .. time constraints.)
>
> -- Linas
>
>
>>
>> -- Matt Mahoney, [email protected]
>>
>
>
> --
> cassette tapes - analog TV - film cameras - you
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta6fce6a7b640886a-M0b16dd9e293e0279520f8fb3
Delivery options: https://agi.topicbox.com/groups/agi/subscription
