https://huggingface.co/cerebras/btlm-3b-8k-base/discussions/25
Context length schedule and performance
#25 by baffo32

> Hey,
>
> I'm looking at your chart showing an incredible performance improvement from
> greatly extending the context length during a smaller portion of training at
> the end.
>
> It's quite notable that most of the gains are at the untrained context lengths.
>
> The relative gains are so big that it looks to me like steadily increasing the
> context length throughout training could possibly flatline the chart.
>
> Has anyone tried training on steadily increasing context lengths?
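
To make the question concrete, here is a minimal sketch of the two kinds of schedule being contrasted: a two-stage schedule that switches to the long context for the last part of training, and a schedule that grows the context length steadily. This is only an illustration, not the recipe used for this model. The function names, the 2048 starting length, the 0.75 switch point, and the rounding multiple are all assumptions made for the example; 8192 is taken from the "8k" in the model name.

```python
def two_stage_length(step, total_steps, short_len=2048, long_len=8192,
                     switch_frac=0.75):
    """Short context for most of training, long context for the final portion.

    switch_frac is a hypothetical value chosen for illustration.
    """
    return short_len if step < switch_frac * total_steps else long_len


def steady_length(step, total_steps, start_len=2048, end_len=8192,
                  multiple=256):
    """Grow the context length linearly from start_len to end_len.

    The result is rounded down to a multiple of `multiple` so that
    sequence lengths stay hardware friendly (an assumption, not a requirement).
    """
    if total_steps <= 1:
        return end_len
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    raw = start_len + frac * (end_len - start_len)
    return int(raw // multiple) * multiple


if __name__ == "__main__":
    total = 1001
    for step in (0, 250, 500, 750, 1000):
        print(step, two_stage_length(step, total), steady_length(step, total))
```

With the steady schedule every intermediate length gets some training, so a comparison of the two schedules at equal token budgets would show whether the gains at the untrained lengths persist.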
