https://www.skynettoday.com/overviews/neural-net-history
Right off the bat this is wrong. This is not where AI started, and even if it were, it still isn't the basis of how all AI works! The answer is Markov chains / Prediction by Partial Matching (PPM!!). The foundation is not fitting a line to dots to map inputs to outputs, generalize, and recover the function. That's going over it all without touching the red cherry in the center!! Looking at the plot is visual to you, but it's really numbers, not even text. IT'S NOT LINEAR REGRESSION. This, [here], is you predicting the next pixel when you look at all those dots. And I told you how that works just yesterday. The same letter or position recurs nearly the same in a text or image(s), hence you can learn what usually comes next! Count them!!! cat>eat, cat>eat, cat>sleep: cats usually eat! Add a time delay and a delay pattern, and 2 dimensions unlike text, and you can do it too.

Now, overlooking you *LOOKING* at the plot: if we take his plot of dots as numbers, HERE is the foundation. Notice the higher a dot is, the more to the right it is? 4,5.....88,121........33,30......556,856......they are translates. Cat cat dog dog table table you you eat eat vine vine gum gum horse horse teeth ? What comes next? And if you simply use the frequency counting I explained, it can solve 4, _?_....usually it's 4, sometimes 3 or 5, less commonly 2 or 6...and so on. This is pattern matching, NOT mapping linear planes of lines..............;( :( xD( cry Just because they teach it in machine learning school doesn't mean you have to be this dumb too. :( Give it tons of thought. I GET that backprop may also be a way to do my way, seemingly just faster; it's an optimization, like using a jeep over a truck, still a vehicle, different gas type...............BUT that's all it is, an optimization! Not actually how pattern prediction works / what it is (pattern matching!!)
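A minimal sketch of the counting idea above, as an order-1 Markov chain (the class and method names here are my own illustration, not code from the article or this post): count what follows each token, then predict by frequency.

```python
from collections import defaultdict, Counter

class CountingPredictor:
    """Order-1 Markov chain: count what follows each token, predict by frequency."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, tokens):
        # Add counts onto connections: for each pair, record "context > next".
        for ctx, nxt in zip(tokens, tokens[1:]):
            self.counts[ctx][nxt] += 1

    def predict(self, ctx):
        # Return each seen continuation with its relative frequency.
        seen = self.counts[ctx]
        total = sum(seen.values())
        return {word: n / total for word, n in seen.items()}

p = CountingPredictor()
p.train(["cat", "eat", "cat", "eat", "cat", "sleep"])
print(p.predict("cat"))  # "eat" is twice as frequent as "sleep"
```

No prediction error or weight tweaking is involved: more data just adds more counts, exactly the "count them" step described above.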
It's like saying planes are red because rockets are used most and are fastest and look mostly red from fire, when really planes are just bodies with motion; backprop is an overcast shadow covering it.

----------------------------------------------------------

"This generalization principle is so important that there is almost always a *test set* of data (more examples of inputs and outputs) that is not part of the training set. The separate set can be used to evaluate the effectiveness of the machine learning technique by seeing how many of the examples the method correctly computes outputs for given the inputs. The nemesis of generalization is *overfitting* - learning a function that works really well for the training set but badly on the test set. Since machine learning researchers needed means to compare the effectiveness of their methods, over time there appeared standard *datasets* of training and testing sets that could be used to evaluate machine learning algorithms."

....Don't you onlyyyy test predictions on the test set??? Not the training set. The training set only builds the model. Oh, if you use backprop, sure, but really these are not predictions being used; the real thing is that you are building a model. You can tell this is true by the fact that you don't need prediction error to train: you need more data, and you add counts onto connections like Markov chains or PPM do; training on "error" is only an optimization of this idea. You wouldn't call it overfitting then, simply a bad score. In my AI I tweak parameters in the main algorithm; this is not really like neural weights though, and this can be automated BTW. As for tweaking neural weights to lower prediction error: the code can't tweak them on its own so that you change one connection weight at some cheap cost while gaining accuracy on many other tasks.... this can be done only by the data, and can be done cleanly. For example, if you see the context "walking down the ?",
why would you lower one weight that predicts frog and raise another weight that predicts street? You don't need to! You never see frog in this context; you say only what you see, then combine predictions, of course, to say unseen true answers. Another way to change predictions to get more bang for your buck, which is what his way assumes (!): look at 'these' windows on the prompt of text, like 'this' and not like 'this', ex. the last 2 words and a hole, then the next word, and other windows like so, and combine the predictions from each context match in memory. I.e. decide which windows to use, lose some gain here to gain more diversely there. Hmm, so you'd do holed matches along the whole prompt instead of mostly holed matches on just the last 8 letters, which I already know about. But anyway, changing weights using backprop to achieve this is absurd; it is a code thing. .................My point is that automating this as a brute force, to decide where to do holed text matches or what to predict, is WRONG, and costly. We don't need to causelessly predict frog; we let it store the entailing contexts it saw. And we don't need to let it brute-force changes to code to decide where to do holed matches on prompts; we tell it where to, and in the advanced stages it tells itself where to look. Really, where to look on the prompt is an entailment thing, because why do I say look here, here, and here? Because I see a context > and say what comes next. And that's the advanced stage.

------------------------------------------------------

"The reason why this does not work for multiple layers should be intuitively clear: the example only specifies the correct output for the final output layer, so how in the world should we know how to adjust the weights of Perceptrons in layers before that? The answer, despite taking some time to derive, proved to be once again based on age-old calculus: the chain rule."
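One way to read the "holed matches along the prompt" idea, sketched as a counter that combines votes from several context windows (this is my guess at the scheme described above; the class and names are hypothetical, not from the post): an exact 2-word window and a "holed" window that ignores the middle word each vote, and the counts are simply merged, with no weight tuned by backprop.

```python
from collections import defaultdict, Counter

class HoledMatcher:
    """Combine next-word votes from two context windows over (w1, w2):
    an exact match on both words, and a 'holed' match that skips w2."""
    def __init__(self):
        self.exact = defaultdict(Counter)   # (w1, w2) -> next-word counts
        self.holed = defaultdict(Counter)   # w1 with a hole -> next-word counts

    def train(self, tokens):
        for w1, w2, nxt in zip(tokens, tokens[1:], tokens[2:]):
            self.exact[(w1, w2)][nxt] += 1
            self.holed[w1][nxt] += 1        # holed window: the middle word is a hole

    def predict(self, w1, w2):
        votes = Counter()
        votes.update(self.exact[(w1, w2)])  # exact 2-word context votes
        votes.update(self.holed[w1])        # holed context still votes if w2 is unseen
        total = sum(votes.values())
        return {w: n / total for w, n in votes.items()} if total else {}

m = HoledMatcher()
m.train("walking down the street walking down the road walking up the hill".split())
print(m.predict("down", "the"))  # street and road split the vote; frog is never predicted
```

Note that "frog" can never be predicted here because it was never seen entailing this context, which is the point made above: the stored contexts decide, not a brute-force weight search.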
Haha, no, PPM and Markov chains are how, and much more easily, and they can work in a full tall hierarchy, not just a trie tree. Backprop is ONLY AN OPTIMIZATION (one that may be wrong, and that overly confuses AI development, both for laypeople and in telling a net how to gather predictions).

---------------------------------------------------------

He mentions doing backprop to find exponential thresholds, and that a NN must be able to do AND/OR/NOT. No: the exponential is based on some criteria, codable for all cases at the start of the code; no need to change it, it adapts. For example cat>eats is predicted 80%, so predict it 85%, but this changes if there are more predictions, or if the word is rare or too common, see? Something like that, probably. And the AND/OR/NOT: how many times do I do that in a day!!?? PPM doesn't really do that, really! Only IF>THEN prediction based on contexts looked at from the prompt text. It will predict cat eat or ate; if both are just as probable, each gets predicted half the time. And holed matching considers only the half that is needed, doing something like an OR, careless of the other half, ex. "walkddging downz the wide strZet and saw a ?" ignores those flaws and predicts anyway. And actually doing AND or OR is built on the things PPM does....matches....for example translation, ex. if a and b are predicted right, as I said back there, predict "go".

------------------------------------------

Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T44c4079317aac9d1-Md537de721cc3c2c3da635455
