On Mon, May 9, 2022, 8:15 AM Undiscussed Horrific Abuse, One Victim of Many <gmk...@gmail.com> wrote:
> On Mon, May 9, 2022, 8:14 AM Undiscussed Horrific Abuse, One Victim of Many <gmk...@gmail.com> wrote:
>
>> On Mon, May 9, 2022, 8:12 AM Undiscussed Horrific Abuse, One Victim of Many <gmk...@gmail.com> wrote:
>>
>>> On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of Many <gmk...@gmail.com> wrote:
>>>
>>>>> To represent normal goal behavior with maximization, the
>>>
>>> This is all confused to me, but normally when we meet goals we don't influence things unrelated to the goal. That is not usually included in maximization, unless
>>>
>>>>> return function needs to not only be incredibly complex, but
>>>
>>> the return being maximized were to include them, maybe by always being 1.0; I don't really know.
>>>
>>>>> also feed back to its own evaluation, in a way not
>>>
>>> Maybe this relates to not learning habits unrelated to the goal that would influence other goals badly.
>>>
>>>>> provided for in these libraries.
>>>
>>> But something different is thinking at this time. It is the role of a part of a mind to try to relate with the other parts. Improving this in a general way is likely well known to be important.
>>>
>>>> Daydreaming: I'm thinking of how, in reality and normality, we have many, many goals going at once (most of them "common sense" and/or "staying being a living human"). Similarly, I'm thinking of how, with normal transformer models, one trains according to a loss rather than a reward.
>>>>
>>>> I'm considering what if it were more interesting when an agent _fails_ to meet a goal. Its reward would usually be full, 1.0, but would be multiplied by losses when goals are not met.
>>>>
>>>> This seems much nicer to me.
>>
>> I don't know how RL works since I haven't taken the course, but it looks to me from a distance like it would just learn at a different (slower) rate [with other differences].
>
> yes
>
> I think it relates to the other inhibited concept, of value vs action learning. A reward starts at just the event of interest, for example, but the system then learns to apply rewards to things that can relate to the event, like preceding time points [states].
>
> In the end, what is important is what you are asking to change in the real world. If the final goal state has an infinite quantity, then maximization has been misused [still thinking though, this leaked out].
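
Rough sketch of the multiply-by-losses idea above in Python, assuming each goal reports a loss in [0, 1] where 0.0 means fully met; the names here are made up for illustration, not from any library:

    # reward defaults to 1.0 and is shrunk by every goal that is not met
    def multiplicative_reward(goal_losses):
        reward = 1.0
        for loss in goal_losses:
            reward *= (1.0 - loss)  # a fully met goal (loss 0.0) leaves the reward unchanged
        return reward

    # two goals met, one partly missed
    print(multiplicative_reward([0.0, 0.0, 0.25]))  # -> 0.75

A fully met set of goals keeps the reward at 1.0, and any single completely failed goal (loss 1.0) drives it to 0.0, which matches the "usually full" behavior described above.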
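
And a toy sketch of the value-learning point, how a reward that happens only at the event of interest spreads backward to the preceding states; this is just tabular TD(0) on a made-up 5-state chain, not anyone's actual system:

    # states 0 -> 1 -> 2 -> 3 -> 4, reward only on reaching state 4
    states = list(range(5))
    values = [0.0] * len(states)
    alpha, gamma = 0.1, 0.9  # learning rate, discount factor

    for episode in range(200):
        for s in states[:-1]:
            next_s = s + 1
            reward = 1.0 if next_s == 4 else 0.0
            # TD(0): earlier states inherit credit from the states after them
            values[s] += alpha * (reward + gamma * values[next_s] - values[s])

    print([round(v, 2) for v in values])  # roughly [0.73, 0.81, 0.9, 1.0, 0.0]

After training, the states just before the rewarding event carry most of the value, decaying the further back you go, which is the "preceding time points [states]" behavior.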