Ok, that page was so short!!! Back to the Lunar Lander documentation: https://www.gymlibrary.ml/environments/box2d/lunar_lander
> Action Space: Discrete(4)

Action is 1 of 4 integers.

> Observation Shape: (8,)

Observation space is an unbounded 8-vector.

> Observation High: [inf inf inf inf inf inf inf inf]
> Observation Low: [-inf -inf -inf -inf -inf -inf -inf -inf]
> Import: gym.make("LunarLander-v2")
>
> Description
>
> This environment is a classic rocket trajectory optimization problem. According to Pontryagin’s maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off.

Aww, shouldn't the model learn this?

> There are two environment versions: discrete or continuous. The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. Landing outside of the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.
>
> To see a heuristic landing, run:
>
> python gym/envs/box2d/lunar_lander.py

Otherwise known as: pip3 install 'gym[box2d]' && python3 -m gym.envs.box2d.lunar_lander # i think (the quotes keep shells like zsh from treating the brackets as a glob)

> Action Space
>
> There are four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine.
>
> Observation Space
>
> There are 8 states: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
>
> Rewards
>
> Reward for moving from the top of the screen to the landing pad and coming to rest is about 100-140 points. If the lander moves away from the landing pad, it loses reward. If the lander crashes, it receives an additional -100 points. If it comes to rest, it receives an additional +100 points. Each leg with ground contact is +10 points. Firing the main engine is -0.3 points each frame. Firing the side engine is -0.03 points each frame. Solved is 200 points.
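To keep the 8-vector and the 4 action integers straight, here's a small Python sketch that just attaches names to them. `LanderState`, `decode_observation`, `describe_action`, and all the field names are my own hypothetical helpers, not part of gym. I'm assuming the action integers follow the order the docs list them in (0 = do nothing, 1 = left engine, 2 = main engine, 3 = right engine) and that the observation values come in the order the Observation Space paragraph gives them; the docs don't say which leg is which, so the legs are just "1" and "2".

```python
from typing import NamedTuple, Sequence

# Discrete action names, assuming the integer order matches the order
# the docs list the actions in.
DISCRETE_ACTIONS = (
    "do nothing",
    "fire left orientation engine",
    "fire main engine",
    "fire right orientation engine",
)


class LanderState(NamedTuple):
    """The 8 observation values, named in the order the docs describe them."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg1_contact: bool  # which physical leg is "1" vs "2" is not stated in the docs
    leg2_contact: bool


def decode_observation(obs: Sequence[float]) -> LanderState:
    """Name the fields of a raw 8-value observation (hypothetical helper)."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, ang_vel, leg1, leg2 = (float(v) for v in obs)
    # Leg contacts arrive as numbers; treat any non-zero value as "touching".
    return LanderState(x, y, vx, vy, angle, ang_vel, leg1 != 0.0, leg2 != 0.0)


def describe_action(action: int) -> str:
    """Translate a Discrete(4) action index into words."""
    if not 0 <= action < len(DISCRETE_ACTIONS):
        raise ValueError(f"action must be in 0..3, got {action}")
    return DISCRETE_ACTIONS[action]
```

For example, `decode_observation([0.1, 1.2, 0.0, -0.5, 0.05, 0.0, 0.0, 1.0]).leg2_contact` is `True`, and `describe_action(2)` returns `"fire main engine"`. It also works on a NumPy observation array, since it only needs `len()` and per-element `float()`.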
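Fuel is infinite but not free, and the per-frame costs add up against a landing that's only worth about 100-140 points. A tiny worked example using just the two numbers from the Rewards paragraph: 100 frames of main engine plus 50 frames of side engine costs 0.3 × 100 + 0.03 × 50 = 30 + 1.5 = 31.5 points. `engine_penalty` is a hypothetical helper of mine, and it assumes a flat per-frame cost exactly as the docs state it; the real env might compute this differently (e.g. in continuous mode, where the throttle varies).

```python
MAIN_ENGINE_COST_PER_FRAME = 0.3   # from the Rewards paragraph
SIDE_ENGINE_COST_PER_FRAME = 0.03  # from the Rewards paragraph


def engine_penalty(main_frames: int, side_frames: int) -> float:
    """Total reward lost to firing engines, as a negative number.

    Assumes a flat cost per frame of firing, as the docs describe it.
    """
    return -(MAIN_ENGINE_COST_PER_FRAME * main_frames
             + SIDE_ENGINE_COST_PER_FRAME * side_frames)
```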
The quoted description above is very, very similar to the text from huggingface's lab.

> Starting State
>
> The lander starts at the top center of the viewport with a random initial force applied to its center of mass.
>
> Episode Termination
>
> The episode finishes if:
>
> 1. the lander crashes (the lander body gets in contact with the moon);
> 2. the lander gets outside of the viewport (x coordinate is greater than 1);
> 3. the lander is not awake. From the Box2D docs, a body which is not awake is a body which doesn’t move and doesn’t collide with any other body:
>
> > When Box2D determines that a body (or group of bodies) has come to rest, the body enters a sleep state which has very little CPU overhead. If a body is awake and collides with a sleeping body, then the sleeping body wakes up. Bodies will also wake up if a joint or contact attached to them is destroyed.
>
> Arguments
>
> To use to the continuous environment, you need to specify the continuous=True argument like below:
>
> > import gym
> > env = gym.make("LunarLander-v2", continuous=True)

They don't say what the continuous environment is. It seems like the source code is still a better resource than the documentation. When installed with pip on Linux, the environment source is at ~/.local/lib/python3.*/site-packages/gym/envs/box2d/lunar_lander.py for me. On the web, that's https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py . It looks like the documentation on the web is out of date or truncated for some reason. The documentation in the source code does indeed continue:

> If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the action space will be `Box(-1, +1, (2,), dtype=np.float32)`. The first coordinate of an action determines the throttle of the main engine, while the second coordinate specifies the throttle of the lateral boosters.
>
> Given an action `np.array([main, lateral])`, the main engine will be turned off completely if `main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the main engine doesn't work with less than 50% power).
>
> Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).
>
> `gravity` dictates the gravitational constant, this is bounded to be within 0 and -12.
>
> If `enable_wind=True` is passed, there will be wind effects applied to the lander. The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`. `k` is set to 0.01. `C` is sampled randomly between -9999 and 9999.
>
> `wind_power` dictates the maximum magnitude of wind.

So, you can indeed give the agent a harder challenge by using continuous=True and/or enable_wind=True. As usual, they thought of my concern. This appears to be roughly the full documentation of the LunarLander environment (v2).
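The continuous throttle rules are easy to misread, so here's a sketch of the mapping exactly as the quoted docstring words it. `decode_continuous_action` is a hypothetical helper of mine, not gym's code. It treats `main < 0` as engine off and `|lateral| <= 0.5` as the booster dead zone; the docstring doesn't say what happens at exactly `lateral = ±0.5`, so calling that "off" is my assumption, and I haven't checked how the real implementation handles the boundary values.

```python
from typing import Optional, Tuple


def decode_continuous_action(main: float, lateral: float) -> Tuple[float, Optional[str], float]:
    """Map a continuous action [main, lateral] to engine throttles.

    Hypothetical helper following the quoted docstring, not gym's own code.
    Returns (main_throttle, side, side_throttle); throttles are fractions of
    full power and 0.0 means the engine is off.
    """
    # The action space is Box(-1, +1, (2,)), so clamp into that range first.
    main = max(-1.0, min(1.0, main))
    lateral = max(-1.0, min(1.0, lateral))

    # Main engine: off for main < 0; otherwise 50%..100% affinely over [0, 1].
    main_throttle = 0.0 if main < 0 else 0.5 + 0.5 * main

    # Lateral boosters: nothing fires inside the dead zone |lateral| <= 0.5.
    if abs(lateral) <= 0.5:
        return main_throttle, None, 0.0

    # Negative lateral fires the left booster, positive fires the right one.
    side = "left" if lateral < 0 else "right"
    # The affine map from |lateral| in [0.5, 1] onto [50%, 100%] is |lateral| itself.
    return main_throttle, side, abs(lateral)
```

For example, `decode_continuous_action(0.5, 0.75)` gives `(0.75, "right", 0.75)`: main engine at 75% and the right booster at 75%. Note the discontinuity this encodes: nudging `main` from just below 0 to 0 jumps the main engine from off straight to 50%, which is the "doesn't work with less than 50% power" part.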
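To get a feel for the wind, here's the docstring's formula as a Python function. The name `wind_magnitude` and the `wind_power=10.0` in the demo are arbitrary choices of mine, and two things are assumptions: that `t` counts simulation steps, and that the tanh result (always strictly between -1 and 1) is multiplied by `wind_power`, which is how I read "maximum magnitude of wind".

```python
import math
import random

K = 0.01  # the docstring says k is set to 0.01


def wind_magnitude(t: float, c: float, wind_power: float, k: float = K) -> float:
    """Evaluate tanh(sin(2 k (t+C)) + sin(pi k (t+C))), scaled by wind_power.

    Assumption: the tanh output, which lies in (-1, 1), is multiplied by
    wind_power, so |wind| never reaches wind_power.
    """
    x = t + c
    return math.tanh(math.sin(2 * k * x) + math.sin(math.pi * k * x)) * wind_power


if __name__ == "__main__":
    c = random.uniform(-9999, 9999)  # C, drawn once per episode (assumed)
    for t in range(0, 500, 100):  # t treated as a step counter (assumed)
        print(t, round(wind_magnitude(t, c, wind_power=10.0), 3))
```

Because the two frequencies 2k and πk have an irrational ratio (2 and π are incommensurate), the two sine terms never realign into a repeating cycle, so the wind is not periodic; the random `C` just shifts where in that pattern each episode starts.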