On Tue, Feb 19, 2019 at 7:39 AM Ben Goertzel <b...@goertzel.org> wrote:

>
> I agree in the medium term we don't need to use deep NNs to tweak the
> weights of various patterns,


Sigh. Anton has fallen into a trap, the nature of which he does not
understand. If you take only one or two or three sample parses, and convert
them into disjuncts, all that you are doing is "memorizing", in a variant
form, the sample parses.  If your sample is wrong, then what you memorized
is also wrong.

There are two ways out.  One is to improve the quality of your samples in
some way. The other is to take more than two or three samples.
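
To make the "memorizing" point concrete, here is a minimal sketch. It is an
illustration only, not the actual pipeline code. It assumes a parse is a list
of (left_index, right_index, label) links, and it simplifies a disjunct to
the sorted tuple of a word's connectors. The function names, the toy
sentence and the link labels "D" and "S" are all hypothetical.

```python
from collections import Counter, defaultdict


def disjuncts(parse):
    """Map each word index to its disjunct for one parse.

    Simplification: a disjunct is the sorted tuple of connectors, where
    "X+" means a link going right and "X-" means a link going left.
    """
    conns = defaultdict(list)
    for left, right, label in parse:
        conns[left].append(label + "+")
        conns[right].append(label + "-")
    return {idx: tuple(sorted(c)) for idx, c in conns.items()}


def count_disjuncts(samples):
    """Count how often each word is seen with each disjunct.

    `samples` is a list of (words, parse) pairs.
    """
    counts = defaultdict(Counter)
    for words, parse in samples:
        for idx, dj in disjuncts(parse).items():
            counts[words[idx]][dj] += 1
    return counts


# Toy data (hypothetical): one sensible parse and one wrong parse.
words = ["the", "cat", "sat"]
good = [(0, 1, "D"), (1, 2, "S")]  # the-cat, cat-sat
bad = [(0, 2, "D"), (1, 2, "S")]   # the-sat (wrong), cat-sat

# One wrong sample: the wrong disjunct is simply memorized.
one = count_disjuncts([(words, bad)])
print(one["cat"].most_common(1))   # [(('S+',), 1)]

# Many samples: the occasional wrong parse is outvoted.
many = count_disjuncts([(words, good)] * 9 + [(words, bad)])
print(many["cat"].most_common(1))  # [(('D-', 'S+'), 9)]
```

With a single sample there is nothing to count against, so whatever the
parse says is what you get. The counts only start to carry information once
there are enough samples for the errors to be outvoted.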

Sure, maybe some magic mumbo-jumbo neural-net stuff can improve the quality
of your samples. But, in that case, **why the heck are you bothering to
sample?**  Don't do that. Just jump to the right answer; you don't need the
heavy-weight machinery of sampling.

But if you want to use the machinery of sampling, and statistics and
probability, for God's sake, don't stop your experiments after three
samples.

Anton's POC-corpus is like flipping a coin exactly once, and obtaining 100%
accuracy on the prediction of a single coin-flip. It's meaningless, and
worse: it's nonsensical.

The child-corpus F1 result of 60% is like flipping a coin three times, and
then complaining that the accuracy of his predictions is crappy, and that
it's probably impossible to do better.

Just flip the freakin coin more than three times. Using neural-nets to
improve the accuracy of three coin-flips might be fun and entertaining, but
it kind of completely misses the whole point of probability and statistics.
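
The coin-flip analogy can be checked numerically. The sketch below is a
plain simulation, assuming a fair coin and independent flips; it has nothing
to do with the actual corpora, and the function names are hypothetical. It
compares the estimated heads-rate with the textbook standard error
sqrt(p(1-p)/n) for a few sample sizes.

```python
import math
import random


def estimate_heads(n_flips, p_heads=0.5, rng=random):
    """Estimate the heads-rate from n_flips simulated coin flips."""
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


def standard_error(n_flips, p_heads=0.5):
    """Standard error of the estimated rate for independent flips."""
    return math.sqrt(p_heads * (1 - p_heads) / n_flips)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only so that runs are repeatable
    for n in (1, 3, 100, 10_000):
        est = estimate_heads(n, rng=rng)
        print(f"n={n:>6}  estimate={est:.3f}  std.err={standard_error(n):.3f}")
```

With one flip the estimate can only be 0 or 1, and with three flips it can
only be 0, 1/3, 2/3 or 1, so neither says much about the coin. The standard
error shrinks like 1/sqrt(n), which is the whole reason to take more samples.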

-- linas



> but in the short term I believe it will
> help considerably...
>
> The unfortunate fact is we can't currently feed as much data into our
> OpenCog self-adapting graph as we can into a BERT type model, given
> available resources... thus using the latter to help tweak weights in
> the former may have significant tactical advantage...
>
> ben
>
> On Tue, Feb 19, 2019 at 5:23 PM Rob Freeman <chaotic.langu...@gmail.com>
> wrote:
> >
> > Linas,
> >
> > Ooh, Nice.
> >
> > This is different to what I saw in the links Ben posted. If you are
> really deconstructing your grammar like this then Ben could be right, it
> might be a good fit with me. Everything can reduce to graphs. If you are
> visiting there from Link Grammar rather than embedding vectors which was my
> path, that does not matter. So long as you travel fully along to the
> destination of a raw network we can get the same power.
> >
> > An aside. You mention sheaf theory as a way to get around the linearity
> of vector spaces. Is this influenced in any way by what Coecke, Sadrzadeh,
> and Clark proposed for compositional distributional models in the '00s?
> >
> > E.g.
> > Category-Theoretic Quantitative Compositional Distributional Models of
> Natural Language Semantics
> > Edward Grefenstette
> > https://arxiv.org/abs/1311.1539
> >
> > I see you cite Coecke in your 2017 "Sheaves: A Topological Approach to
> Big Data" paper.
> >
> > Personally I followed their work when I came across it in the '00s. It
> was the first other work in a compositional distributional vein I had come
> across so I was delighted to find it. There was precious little about
> distributed models in the '00s, let alone compositional distributional. But
> I decided that the formalisms of both category theory as a response to the
> subjectivity of maths, and QM as a model for the subjectivity of physics,
> may well apply, but that in practice it will be easier to build structures
> which manifest these properties, rather than to formally describe them.
> >
> > Anyway, perhaps Ben is right, you may be doing the first two steps of my
> suggested solution: 1) coding only a sequence net of observed sequences,
> and 2) projecting out latent "invariants" by clustering according to shared
> contexts.
> >
> > But then if you are doing all this, why are you using BERT type training
> "to guide the numerical weightings of symbolic language-patterns"? That
> will still trap you in the limitations of learned representations. The
> whole point of a network is that, like a distributed representation, it can
> handle multiplicity of interpretation. Once you fix it by "learning" you
> have lost this. Perhaps the high current state of development of these
> learning algorithms may help in the short term, but it seems like a misstep.
> >
> > The solution I came to is to forget all thought of training or "learning"
> representations. Not least because you get contradictions.
> >
> > And I believe the best way to do that will be to set the network
> oscillating and varying inhibition, to get the resolution of groupings we
> want dynamically.
> >
> > -Rob
> >
> > On Tue, Feb 19, 2019 at 6:45 PM Linas Vepstas <linasveps...@gmail.com>
> wrote:
> >>
> >> Hi Rob,
> >>
> >> On Mon, Feb 18, 2019 at 4:40 PM Rob Freeman <chaotic.langu...@gmail.com>
> wrote:
> >>>
> >>> Ben,
> >>>
> >>> That's what I thought. You're still working with Link Grammar.
> >>>
> >>> But since last year working on informing your links with stats from
> deep-NN type, learned, embedding vector based predictive models? You're
> trying to span the weakness of each formalism with the strengths of the
> other??
> >>
> >>
> >> Yes but no. I've been trying to explain what, exactly, is good, and what,
> exactly, is bad with NN vector-space models. There is a long tract written
> on this here.
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/skippy.pdf
> >>>
> >>>
> >>> There's a lot to say about all of that.
> >>>
> >>> Your grammar will be learned, with only the resolution you bake in
> from the beginning.
> >>
> >> No.
> >>
> >>>
> >>> Your embedding vectors will be learned,
> >>
> >>
> >> The point of the long PDF is to explain why NN-vectors are bad. It
> attempts to first explain *why* neural nets work for language, and why
> vectors are *almost* the right thing, and then it tries to explain why NN
> vectors don't actually do everything you actually want.  I've noticed that,
> in the middle of all these explanations, I lose my audience; haven't
> figured out how to keep them, yet.
> >>>
> >>> and the dependency decisions they can inform on learned, and thus
> finite, too. Plus you need to keep two formalisms and marry them
> together... Large teams for all of that...
> >>
> >>
> >> No. I've already got 75% of it coded up. It actually works, I've got
> long diary entries and notes with detailed stats on it all.  Unfortunately,
> I have not been able to carve out the time to finish the work; it's been
> stalled since the fall of last year.
> >>
> >> It would be wonderful if I could get someone else interested in this
> work.
> >>
> >> --linas
> >
> 
> --
> Ben Goertzel, PhD
> http://goertzel.org
> 
> "Listen: This world is the lunatic's sphere,  /  Don't always agree
> it's real.  /  Even with my feet upon it / And the postman knowing my
> door / My address is somewhere else." -- Hafiz


-- 
cassette tapes - analog TV - film cameras - you

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M1f6d34745213971d3f78e105
Delivery options: https://agi.topicbox.com/groups/agi/subscription
