Best thread in a decade! Cheers, Matt and Linas! (Linas's paper sounds cool, but I probably won't have the time to understand it either. Then again, if it is good Linas, then, as a great scientist recently told me, you just need a ton of patience...)
On Sat, Feb 9, 2019 at 9:03 AM Linas Vepstas <[email protected]> wrote:

> Hi Matt,
>
> On Fri, Feb 8, 2019 at 4:31 PM Matt Mahoney <[email protected]> wrote:
>
>> On Tue, Feb 5, 2019, 5:23 PM Linas Vepstas <[email protected]> wrote:
>>
>>>> if there were an experimental results section that told us
>>>> which ones were worth pursuing.
>>>
>>> There's this:
>>>
>>> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/connector-sets-revised.pdf
>>>
>>> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/learn-lang-diary.pdf
>>
>> Yes, that is what I was looking for. I haven't read all of it, but so far
>> I learned:
>>
>> 1. It is possible to learn parts of speech and a grammar from unlabeled
>> text.
>
> Well, I think that is the major claim, as it is effectively something
> that's never been done before (and many people have doubts that it's even
> possible, without NNs). It's been the cause of much teeth-gnashing.
>
>> 2. It is possible to learn word boundaries in continuous speech by
>> finding boundaries with low mutual information. (I did similar experiments
>> to restore deleted spaces in text using only n-gram statistics. It is how
>> infants learn to segment speech at 7-10 months old, before learning any
>> words.)
>
> This claim is .. well, I think it's well explored by academia, with
> various "well-known" workable solutions, dating back a decade or two. I've
> not had a chance to explore it and try to wedge it into a grand unified
> theory.
>
>> 3. Word pairs have a Zipf distribution just like single words.
>
> Uhh, yes, but no. Depends on what you are graphing. Here's an unpublished
> 2009 draft:
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/word-pairs-2009/word-pairs.pdf
> When I recently repeated a variant of this, with different techniques on a
> different dataset, I got a very clean (logarithmic) bell curve (Gaussian).
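The low-mutual-information segmentation idea in point 2 above is compact enough to sketch: estimate character unigram and bigram statistics from a training corpus, then insert a word boundary wherever the pointwise mutual information (PMI) between adjacent characters falls below a threshold. This is a minimal illustration of the general technique, not code from either paper; the zero threshold and the lack of smoothing are my own simplifying assumptions.

```python
from collections import Counter
import math

def train_stats(corpus):
    """Character unigram/bigram counts from text with spaces removed."""
    chars = corpus.replace(" ", "")
    uni = Counter(chars)
    bi = Counter(zip(chars, chars[1:]))
    return uni, sum(uni.values()), bi, sum(bi.values())

def segment(stream, stats, threshold=0.0):
    """Insert a boundary wherever adjacent-character PMI falls below threshold."""
    uni, n_uni, bi, n_bi = stats

    def pmi(a, b):
        p_ab = bi.get((a, b), 0) / n_bi
        if p_ab == 0:
            return float("-inf")  # unseen bigram: treat as a certain boundary
        return math.log2(p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni)))

    out = [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if pmi(a, b) < threshold:
            out.append(" ")
        out.append(b)
    return "".join(out)
```

With enough text, within-word bigrams score high PMI while cross-word transitions are diffuse and score low, so boundaries fall out of the statistics alone; the deleted-space restoration experiment mentioned above is the same idea with word n-grams in place of characters.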
> Who knew? When I brought this up on a linguistics mailing list (back in
> 2009), a general consensus emerged that "if you combine two Zipfs together,
> you get a logarithmic Gaussian", and that "this is probably easy to prove",
> but neither I nor anyone else took the effort to actually prove it. It's
> probably still a sufficiently novel result that a proof, together with
> data, is publishable. So much to do, so little time.
>
>> (I suspect it is also true of word triples representing grammar rules. It
>> suggests there are around 10^8 rules.)
>
> Well, that one also gets a "yes, but no." Here:
>
> * De facto, link-grammar manages to be a pretty decent model of English,
> with approx 2.2K rules (or 11K = 2.2K + 8.8K rules, depending on what is
> counted as a "rule"). (`cat 4.0.dict | grep ";" | wc` vs. `cat 4.0.dict |
> grep -o " or" | wc`)
>
> * If you brute-force accumulate statistics for disjuncts, you get approx
> 10^8 of them. (I have such datasets and can share them, if you have the
> RAM.)
>
> * But those datasets contain rules connecting words, and not word-classes.
> The grammar needed to parse "I see a dog, I see a rock, I see a tree" does
> not require 3 distinct rules for dog, rock, tree; it just needs one: "I see
> a <common-noun>". If you are picky, you can have several different kinds of
> common nouns, but however picky you are, you need fewer classes than there
> are words. A typical class will contain many words. The rules connect
> word-classes, not individual words.
>
> * Automatically discovering word-classes is not terribly difficult, but is
> sufficiently confusing that it has tripped up others who have tried. That's
> why I spent a lot of time talking about "matrix factorization" in one of
> the papers. It turns out that "matrix factorization" is a form of
> "clustering", in disguise.
>
> * There are several styles of matrix factorization, one of which is
> more-or-less the same thing as "deep learning" (!)
> Which is perhaps the most important thing I have to say on this topic.
>
> * I cannot quote off the top of my head how many non-zero weights are in
> the current "state-of-the-art" natural-language deep-learning networks. I
> think under a million(?); it depends on what exactly it is that you are
> trying to do with the NN model.
>
> * So, for the syntax of English, together with a shallow amount of
> semantics (e.g. at the level of FrameNet or of WordNet), I think this can
> be done with 10K to 1M rules (with LG providing a hard lower limit for a
> "realistic" language model, and "state-of-the-art" natural-language
> deep learning providing a soft upper limit).
>
> Clearly, syntax plus shallow semantics is not at all AGI. But the point is
> that *if* we can create an automated system that can reproducibly, easily
> and regularly obtain syntax plus typical ontology-type structure (again, at
> WordNet/FrameNet/SUMO/whatever levels of sophistication) ***AND*** the
> resulting structure is NOT a black box of floating-point values, but is
> queryable in a natural way ("a bird is an animal", "an animal is a living
> thing", etc. can be posed as questions with affirmative/negative answers,
> or even probabilities) ***THEN*** you have a platform on which to deploy
> research about reasoning, inference, common-sense knowledge, etc.
>
> To repeat:
> * I believe that purely-automated extraction of syntax plus shallow
> semantics, for arbitrary human written language, is well within reach, and
> requires about 10K to 1M rules (depending on how deep you
> drill/simplify/abstract).
> * That the above ruleset can have a natural knowledge-query interface to
> it, making it accessible to inferencing and other high-level algorithms.
> * That all of the above has already been demonstrated in various distinct
> prototypes and proofs-of-concept (in various typical NAACL-HLT journal
> articles), of which I've personally reproduced some selected handful.
> * So .. Let's do it. Integrate & test.
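The "matrix factorization is clustering in disguise" point can be made concrete with a toy example: factor a word-by-context count matrix with SVD, and words that share contexts (dog, cat, rock after a determiner) collapse onto the same low-rank direction, which is exactly a word-class. The matrix below is fabricated for illustration; the real pipeline works on disjunct statistics, not counts this clean.

```python
import numpy as np

# Toy word-by-context count matrix. Rows: words; columns: contexts
# ("a _", "the _", "will _", "to _") -- all numbers made up.
words = ["dog", "cat", "rock", "run", "jump"]
M = np.array([
    [10., 8., 0., 0.],   # dog:  occurs after determiners
    [ 9., 7., 0., 0.],   # cat
    [ 8., 9., 0., 0.],   # rock
    [ 0., 0., 9., 8.],   # run:  occurs after auxiliaries
    [ 0., 0., 7., 9.],   # jump
])

# Low-rank factorization: each word gets a 2-d embedding.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
emb = U[:, :2] * S[:2]

def cosine(i, j):
    a, b = emb[i], emb[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Clustering in disguise": words whose dominant singular direction
# matches land in the same word-class.
classes = np.argmax(np.abs(emb), axis=1)
```

Here `cosine(0, 1)` (dog vs. cat) comes out near 1 while `cosine(0, 3)` (dog vs. run) comes out near 0, and `classes` groups the three nouns together and the two verbs together, with no labels supplied anywhere.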
> It's maybe rocket science, but it's not science fiction.
>
>> I hope this work continues. It would be interesting if it advances the
>> state of the art on my large text benchmark or the Hutter prize.
>
> Thanks!
>
> I think we're at the state of the art now, although I haven't been able to
> convince Ben. Who knew that he was a natural-born skeptic? I'm totally
> ignoring benchmarks and competitions; I simply do not have the hours in the
> day. (Actually, zero hours of the day, right now; as of about 8-9 months
> ago, the language-learning project was handed over to a team. They are
> still getting up to speed; they don't have any background in linguistics or
> machine learning, and so have been trying to learn both at the same time,
> and getting tangled up and stalled as a result. Currently they are
> stumbling at the "clustering" step, and have not yet begun the "ontology"
> step. One step forward, two steps back. I'd like to get back to it, as it's
> clear as a bell to me, but .. time constraints.)
>
> -- Linas
>
>> -- Matt Mahoney, [email protected]
>
> --
> cassette tapes - analog TV - film cameras - you
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>* /
> AGI / see discussions <https://agi.topicbox.com/groups/agi> + participants
> <https://agi.topicbox.com/groups/agi/members> + delivery options
> <https://agi.topicbox.com/groups/agi/subscription> Permalink
> <https://agi.topicbox.com/groups/agi/Ta6fce6a7b640886a-Mc48a258d09b45721f33f4376>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta6fce6a7b640886a-M0b16dd9e293e0279520f8fb3
Delivery options: https://agi.topicbox.com/groups/agi/subscription
