0851

This is the solution I filled in:

# TODO: Define a PPO MlpPolicy architecture
# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,
# if we had frames as input we would use CnnPolicy
import stable_baselines3
model = stable_baselines3.PPO('MlpPolicy', env, verbose=1)

This is the solution they provide:

# SOLUTION
# We added some parameters to fasten the training
model = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 1024,
    batch_size = 64,
    n_epochs = 4,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose=1)

I will copy their parameters over to my code, thinking briefly about
each one. I recognise 3 of them. I recall that some of the rest were
mentioned in the learning material, but I do not remember what they
are.
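
A rough sketch of what each parameter controls, as far as I can tell
from the stable_baselines3 PPO documentation. The comments are my own
summaries, and n_envs = 1 is an assumption (a single, non-vectorised
environment); the derived numbers change if the env is vectorised.

```python
# Hyperparameters copied from the provided solution.
ppo_kwargs = dict(
    n_steps=1024,     # env steps collected per environment before each update
    batch_size=64,    # minibatch size used for each gradient step
    n_epochs=4,       # passes over the collected rollout per update
    gamma=0.999,      # discount factor for future rewards
    gae_lambda=0.98,  # bias/variance trade-off in advantage estimation (GAE)
    ent_coef=0.01,    # weight of the entropy bonus in the loss
)

n_envs = 1  # assumption: one environment, not a vectorised batch of them

# Size of the rollout buffer filled before each policy update.
rollout_size = ppo_kwargs["n_steps"] * n_envs
# How many minibatches one pass over that buffer yields.
minibatches_per_epoch = rollout_size // ppo_kwargs["batch_size"]
# Total gradient steps taken on each rollout.
gradient_steps_per_update = minibatches_per_epoch * ppo_kwargs["n_epochs"]
# Rough number of steps over which rewards still matter: 1 / (1 - gamma).
effective_horizon = 1.0 / (1.0 - ppo_kwargs["gamma"])

print(rollout_size, minibatches_per_epoch, gradient_steps_per_update)
print(round(effective_horizon))

# With env defined as in the notebook, the dict could be passed on as:
# model = stable_baselines3.PPO('MlpPolicy', env, verbose=1, **ppo_kwargs)
```

So with these values each update uses 1024 steps, split into 16
minibatches, visited 4 times, and a gamma of 0.999 looks roughly 1000
steps ahead.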