Hey,

This is regarding the Trading Environment idea.
I am a bit confused here, actually. Till now, all the examples we implemented are based on a gym env. Is that a requirement? I just wanted to clarify this.

Apart from this, I am trying to decide on the skeleton of the code and how the agent will use it. I am planning on having something like this for the environment: we will have a class named *StockTradingEnv*, which will contain two nested classes plus a few functions:

1. *class Action*: three enum values, i.e. *BUY*, *SELL*, and *HOLD*.
2. *class State*: for a state in the trading environment, we generally have *price data*. By price data I mean either OHLCV data or OHLCV + technical indicator data.
3. A function named *step* which will drive the env.
4. Different reward functions.
5. Other utility functions like *buy_stock()* and *sell_stock()*.

Does this sound good for starters? I am considering this as a base and building from this point. (To make the idea concrete, I have put a few rough C++ sketches of the skeleton and of the reward schemes below the quoted thread.)

Regards,
Gopi

On Fri, Mar 12, 2021 at 5:01 PM Gopi Manohar Tatiraju <[email protected]> wrote:

> Hey Marcus,
>
> Yes, you got it correct. We will have a single environment, but we can have
> multiple agents and reward schemes. I added more info; maybe this will make
> things more clear.
>
> These are the building blocks for solving any DRL problem. I tried to keep
> it as simple as possible for now; once we know what exactly we are getting
> into, we can talk about the implementation details.
>
> - Environment: The environment would be a simulated stock exchange that
>   will contain the functionality of any common exchange and some driver
>   functions:
>   - Buy Stock
>   - Sell Stock
>   - Step
>   - Reset
>   - Other needed functions will be implemented accordingly
>
> - Action Scheme: The agent can buy or sell n shares of any company, which
>   can be denoted as:
>
>   {-k, ..., -1, 0, 1, ..., k}
>
>   For example, "Buy 10 shares of KO" and "Sell 10 shares of KO" are 10 and
>   -10 respectively. So we basically have 3 actions: Buy, Sell, and Hold.
>
> - Reward Scheme: As I described in my last mail, we should implement
>   different types of reward functions so that we can mimic different types
>   of strategies. Currently, I am planning to implement:
>   - A simple reward scheme based on the percent change over a fixed window.
>
>   We can implement more reward schemes later. The risk-managed scheme I
>   explained earlier cannot be based on net worth alone; good risk management
>   will be key to getting more reward there, and to implement it we need
>   functionality like stop loss in our environment.
>
> - State Space: The state space is what our environment sends to the agent
>   for observation. The state space will contain OHLCV (Open, High, Low,
>   Close, Volume) data and some indicators for technical analysis.
>
> - We will use the agents already available to implement an example, and if
>   during GSoC we get a new agent like A2C, we can use that as well.
>
> The example will be fully documented and will explain how each and every
> component works, so that users can understand it and get familiar quickly.
>
> Let me know if I need to clarify anything else, or points that you think
> are still missing.
>
> Thanks,
> Gopi
>
> On Thu, Mar 11, 2021 at 9:49 PM Marcus Edel <[email protected]>
> wrote:
>
>> Hello Gopi,
>>
>> thanks for the clarification; so to me, this sounds like different reward
>> functions but in the same environment. So I guess the way I would integrate
>> such a task into the existing codebase is to add a separate task for each
>> scenario. Maybe you have another idea?
>>
>> Regarding the first idea, I will soon implement a basic structure and
>> make a PR. I will also send a detailed mail of what I am planning regarding
>> the pre-processing tool.
>>
>> Sounds good.
>>
>> Thanks,
>> Marcus
>>
>>
>> On 10. Mar 2021, at 01:09, Gopi Manohar Tatiraju <[email protected]>
>> wrote:
>>
>> Hey Marcus Edel,
>>
>> Thanks for your feedback.
>>
>> When we frame trading as an RL problem, on the surface it seems like the
>> goal of the agent is to *maximize the net worth*. But there are many ways
>> to reach this goal, and there are *different groups of people who work on
>> different principles*.
>>
>> Let's compare some:
>>
>> - *Day trader:* The goal of any day trader is to maximize profit but also
>>   minimize risk (Trading 101: always cap your losses). For this use-case,
>>   we want to encourage the agent to use something called a stop-loss: more
>>   reward should be given to trades made with a stop-loss than to trades
>>   made without one. This will make sure that our agents learn to cap their
>>   losses, which is very important in a real-world scenario.
>> - *Institutional traders:* These traders consider VWAP (Volume Weighted
>>   Average Price) to be the best price at which they can acquire stocks. So
>>   regardless of what the current price is, they always try to buy at VWAP.
>>   For cases like this, we can penalize the agent for not following VWAP,
>>   so that it learns that VWAP is the best price.
>>
>> Different reward_schemes will be tailored for different use-cases. Based
>> on how one wants to trade, one can choose a different reward scheme.
>>
>> Regarding the first idea, I will soon implement a basic structure and
>> make a PR. I will also send a detailed mail of what I am planning regarding
>> the pre-processing tool.
>>
>> Let me know if you have any more doubts regarding reward_schemes or
>> anything else.
>>
>> Thanks,
>> Gopi
>>
>> On Wed, Mar 10, 2021 at 5:37 AM Marcus Edel <[email protected]>
>> wrote:
>>
>>> Hello Gopi M. Tatiraju,
>>>
>>> thanks for reaching out; I like both ideas. I can see the first idea
>>> would integrate perfectly into the preprocessing pipeline; that said, it
>>> would be useful to discuss the project's scope in more detail.
>>> Specifically, what functionality would you like to add? In #2727 you
>>> already implemented some features, so I'm curious to hear what other
>>> features you have in mind.
>>>
>>> The RL idea sounds interesting as well, and I think it could also fit
>>> into the RL codebase that is already there. I'm curious what you mean by
>>> "reward schemes"?
>>>
>>> Thanks,
>>> Marcus
>>>
>>> On 9. Mar 2021, at 14:55, Gopi Manohar Tatiraju <[email protected]>
>>> wrote:
>>>
>>> Hello mlpack,
>>>
>>> I am Gopi Manohar Tatiraju, currently in my final year of Engineering
>>> in India.
>>>
>>> I've been working on mlpack for quite some time now. I've tried to
>>> contribute and learn from the community, and I've received ample support,
>>> which made learning really fun.
>>>
>>> Now, as GSoC is back with its 2021 edition, I want to take this
>>> opportunity to learn from the mentors and contribute to the community.
>>>
>>> I am planning to contribute to mlpack under GSoC 2021. Currently, I am
>>> working on creating a pandas *dataframe-like class* that can be used to
>>> analyze datasets in a better way.
>>> Having a class like this would help in working with datasets, as ML is
>>> not only about the model but about the data as well.
>>>
>>> I have a PR already open for this:
>>> https://github.com/mlpack/mlpack/pull/2727
>>>
>>> I wanted to know if I can work on this in GSoC? It was not listed on the
>>> ideas page, but I think it would be the start of something useful and big.
>>>
>>> If this idea doesn't seem workable right now, I want to implement *RL
>>> Environments for Trading and some working examples for each env*.
>>>
>>> What exactly I am planning to implement are the building blocks of any
>>> RL system:
>>>
>>> - *reward schemes*
>>> - *action schemes*
>>> - *env*
>>>
>>> Fin-Tech is a growing field, and there are a lot of applications of Deep
>>> Q-Learning there.
>>>
>>> I am planning to implement different *strategies* like *Buy-Sell-Hold,
>>> Long only, Short only*...
>>> This will make the examples repo rich in terms of DRL examples...
>>> We can even build a small *backtesting module* that can be used to run
>>> backtests on our predictions.
>>>
>>> There are some libraries currently working on such models in Python; we
>>> can use them as a *reference* to go forward.
>>> *FinRL*: https://github.com/AI4Finance-LLC/FinRL-Library
>>>
>>> *Planning to implement:*
>>>
>>> Different types of *envs* for different kinds of financial tasks:
>>>
>>> - single stock trading env
>>> - multi stock trading env
>>> - portfolio selection env
>>>
>>> Some example envs in Python:
>>> https://github.com/AI4Finance-LLC/FinRL-Library/tree/master/finrl/env
>>>
>>> Different types of *action_schemes*:
>>>
>>> - make only long trades
>>> - make only short trades
>>> - make both long and short trades
>>> - BHS (Buy Hold Sell)
>>>
>>> Example action_schemes:
>>> https://github.com/tensortrade-org/tensortrade/blob/master/tensortrade/env/default/actions.py
>>>
>>> There we can see class BHS, SimpleOrder, etc.
>>>
>>> Different types of *reward_schemes*:
>>>
>>> - simple reward
>>> - risk-adjusted reward
>>> - position based reward
>>>
>>> For the past 3 months, I've been working as an ML Researcher at a
>>> Fin-Tech startup and have been working on exactly this.
>>>
>>> I would love to hear your feedback and suggestions.
>>>
>>> Regards,
>>> Gopi M. Tatiraju
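
P.S. To make the skeleton above a bit more concrete, here is a very rough, untested C++ sketch of what I have in mind for *StockTradingEnv*. It loosely follows the nested State/Action pattern of the existing mlpack RL environments (e.g. CartPole), but every name, member, and signature below is just a placeholder for discussion, not a proposed final API:

// Very rough sketch -- all names and signatures are placeholders, loosely
// following the nested State/Action pattern of existing mlpack RL
// environments (e.g. CartPole).
#include <armadillo>

class StockTradingEnv
{
 public:
  // Discrete action scheme: buy, sell, or hold.
  class Action
  {
   public:
    enum Actions { BUY, SELL, HOLD };
    Actions action;
  };

  // A state is one column of price data: OHLCV, optionally followed by
  // technical indicators.
  class State
  {
   public:
    State(const arma::vec& data = arma::vec()) : data(data) { }
    arma::vec data;
  };

  // priceData: one column per time step; rows = OHLCV (+ indicators).
  StockTradingEnv(const arma::mat& priceData,
                  const double initialBalance = 10000.0) :
      priceData(priceData), balance(initialBalance),
      sharesHeld(0), currentStep(0) { }

  // One environment step: apply the action, advance one bar, and return
  // the reward (here simply the change in net worth).
  double Step(const Action& action, State& nextState)
  {
    const double price = priceData(3, currentStep);  // Close, assumed row 3.
    const double oldNetWorth = balance + sharesHeld * price;

    if (action.action == Action::BUY)
      BuyStock(price, 1);
    else if (action.action == Action::SELL)
      SellStock(price, 1);

    ++currentStep;
    nextState = State(priceData.col(currentStep));

    const double newPrice = priceData(3, currentStep);
    return (balance + sharesHeld * newPrice) - oldNetWorth;
  }

  bool IsTerminal() const { return currentStep + 1 >= priceData.n_cols; }

 private:
  // Utility functions; a real version would handle fees, lot sizes, etc.
  void BuyStock(const double price, const size_t n)
  {
    if (balance >= n * price) { balance -= n * price; sharesHeld += n; }
  }

  void SellStock(const double price, const size_t n)
  {
    if (sharesHeld >= n) { balance += n * price; sharesHeld -= n; }
  }

  arma::mat priceData;
  double balance;
  size_t sharesHeld;
  size_t currentStep;
};

The {-k, ..., k} action scheme from the earlier mail would just replace the fixed 1 in Step() with the signed share count carried by the action.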
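
And here is one possible shape for the simple reward scheme (percent change of net worth over a fixed window); the window length and the history container are arbitrary choices, just for illustration:

#include <vector>

// Percent change of net worth over a fixed look-back window.
double SimpleWindowReward(const std::vector<double>& netWorthHistory,
                          const size_t window = 10)
{
  if (netWorthHistory.size() < 2)
    return 0.0;

  const size_t last = netWorthHistory.size() - 1;
  const size_t first = (last >= window) ? (last - window) : 0;

  // Guard against a zeroed-out account to avoid dividing by zero.
  if (netWorthHistory[first] == 0.0)
    return 0.0;

  return 100.0 * (netWorthHistory[last] - netWorthHistory[first]) /
      netWorthHistory[first];
}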
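
Similarly, for the institutional-trader idea in the quoted thread (rewarding execution close to VWAP), the reward scheme could simply penalize deviation from VWAP. This is purely illustrative; the formula and the scaling constant are assumptions to be discussed:

#include <cmath>

// Penalize the agent proportionally to how far the executed price
// deviates from the volume-weighted average price (VWAP).
double VWAPReward(const double executedPrice,
                  const double vwap,
                  const double scale = 100.0)
{
  const double deviation = std::abs(executedPrice - vwap) / vwap;
  return -scale * deviation;
}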
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
