"Improving students’ long-term knowledge retention through personalized review", Lindsey et al 2013 http://laplab.ucsd.edu/articles/LindseyShroyerPashlerMozer2013.pdf
> Human memory is imperfect; thus, periodic review is required for the long-term preservation of knowledge and skills. However, students at every educational level are challenged by an ever-growing amount of material to review and an ongoing imperative to master new material. We developed a method for efficient, systematic, personalized review that combines statistical techniques for inferring individual differences with a psychological theory of memory. The method was integrated into a semester-long middle school language course via retrieval-practice software. In a cumulative exam administered after the semester’s end that compared time-matched review strategies, personalized review yielded a 16.5% boost in course retention over current educational practice (massed study) and a 10.0% improvement over a one-size-fits-all strategy for spaced study.
>
> ...We incorporated systematic, temporally distributed review into third-semester Spanish foreign language instruction using a web-based flashcard tutoring system, the Colorado Optimized Language Tutor or COLT. Throughout the semester, 179 students used COLT to drill on ten chapters of material. COLT presented vocabulary words and short sentences in English and required students to type the Spanish translation, after which corrective feedback was provided.
>
> ...A generic-spaced scheduler selected one previous chapter to review at a spacing deemed to be optimal for a range of students and a variety of material according to both empirical studies (Cepeda et al., 2006; Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008) and computational models (Khajah, Lindsey, & Mozer, 2013; Mozer, Pashler, Cepeda, Lindsey, & Vul, 2009). On the time frame of a semester—where material must be retained for 1-3 months—a one-week lag between initial study and review obtains near-peak performance for a range of declarative materials. To achieve this lag, the generic-spaced scheduler selected review items from the previous chapter, giving priority to the least recently studied (Figure 1).
>
> A personalized-spaced scheduler used a latent-state Bayesian model to predict what specific material a particular student would most benefit from reviewing. This model infers the instantaneous memory strength of each item the student has studied. The inference problem is difficult because past observations of a particular student studying a particular item provide only a weak source of evidence concerning memory strength. To illustrate, suppose that the student had practiced an item twice, having failed to translate it once 15 days ago but having succeeded 9 days ago. Based on these sparse observations, it would seem that one cannot reliably predict the student’s current ability to translate the item. However, data from the population of students studying the population of items over time can provide constraints helpful in characterizing the performance of a specific student for a specific item at a given moment. Our model-based approach is related to that used by e-commerce sites that leverage their entire database of past purchases to make individualized recommendations, even when customers have sparse purchase histories. Our model defines memory strength as being jointly dependent on factors relating to (1) an item’s latent difficulty, (2) a student’s latent ability, and (3) the amount, timing, and outcome of past study.
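(To make those three factors concrete before the excerpt continues: as I read the appendix, DASH amounts to a logistic regression whose inputs are a student-ability term, an item-difficulty term, and log-counts of past attempts and successes inside a few expanding time windows. Here is a rough Python sketch of that kind of predictor, my own reconstruction rather than their code, with made-up window boundaries and parameter names:

```python
import math

# Illustrative expanding time windows (in days), counting back from "now";
# the real model uses a small set of expanding windows whose exact
# boundaries I'm not reproducing here.
WINDOWS = [1, 7, 30, float("inf")]

def dash_recall_probability(ability, difficulty, history, now,
                            theta_correct, theta_attempts):
    """DASH-style recall prediction for one (student, item) pair.

    ability, difficulty : latent student/item parameters (inferred from all data)
    history             : list of (time_in_days, was_correct) study events
    now                 : current time in days
    theta_*             : per-window weights shared across students and items
    """
    strength = ability - difficulty
    for w, window in enumerate(WINDOWS):
        outcomes = [ok for t, ok in history if now - t <= window]
        n_attempts = len(outcomes)   # practice trials inside this window
        n_correct = sum(outcomes)    # correct responses inside this window
        strength += (theta_correct[w] * math.log(1 + n_correct)
                     + theta_attempts[w] * math.log(1 + n_attempts))
    return 1.0 / (1.0 + math.exp(-strength))  # logistic link
```

The ability, difficulty, and window-weight parameters are inferred jointly from the whole population's data, which is what lets sparse per-student, per-item histories still produce usable predictions; as I understand it, the personalized scheduler then gives review priority to the items this model predicts are weakest.)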
> We refer to the model with the acronym DASH summarizing the three factors (difficulty, ability, and study history). By incorporating psychological theories of memory into a data-driven modeling approach, DASH characterizes both individual differences and the temporal dynamics of learning and forgetting. The Appendix describes DASH in detail.
>
> The scheduler was varied within participant by randomly assigning one third of a chapter’s items to each scheduler, counterbalanced across participants. During review, the schedulers alternated in selecting items for retrieval practice. Each selected from among the items assigned to it, ensuring that all items had equal opportunity and that all schedulers administered an equal number of review trials. Figure 1 and Table 1 present student-item statistics for each scheduler over the time course of the experiment.
>
> ...To evaluate the quality of DASH’s predictions, we compared DASH against alternative models by dividing the 597,990 retrieval practice trials recorded over the semester into 100 temporally contiguous disjoint sets, and the data for each set was predicted given the preceding sets. The accumulative prediction error (Wagenmakers, Grünwald, & Steyvers, 2006) was computed using the mean deviation between the model’s predicted recall probability and the actual binary outcome, normalized such that each student is weighted equally. Figure 4 compares DASH against five alternatives: a baseline model that predicts a student’s future performance to be the proportion of correct responses the student has made in the past, a Bayesian form of item-response theory (IRT) (De Boeck & Wilson, 2004), a model of spacing effects based on the memory component of ACT-R (Pavlik & Anderson, 2005), and two variants of DASH that incorporate alternative representations of study history motivated by models of spacing effects (ACT-R, MCM). Details of the alternatives and the evaluation are described in the Supplemental Online Material. The three variants of DASH perform better than the alternatives. Each variant has two key components: (1) a dynamical representation of study history that can characterize learning and forgetting, and (2) a Bayesian approach to inferring latent difficulty and ability factors. Models that omit the first component (baseline and IRT) or the second (baseline and ACT-R) do not fare as well. The DASH variants all perform similarly.

DASH is defined on pg11. Unfortunately, they don't compare directly to any of the Supermemo algorithms, so I'm not sure how useful it would be.
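For anyone wondering what the "accumulative prediction error" in that last quoted paragraph amounts to: it is just an online, chronological train-then-predict loop. Roughly like this (a sketch of my reading of their description, not their evaluation code; `fit_model`, `predict`, and the record fields are placeholders I made up):

```python
def accumulative_prediction_error(trials, fit_model, n_sets=100):
    """Order trials in time, cut them into temporally contiguous sets,
    predict each set using a model fit only on the preceding sets, and
    average |predicted recall probability - actual 0/1 outcome| so that
    every student is weighted equally."""
    trials = sorted(trials, key=lambda tr: tr["time"])
    size = max(1, len(trials) // n_sets)
    deviations = {}  # student id -> list of absolute prediction errors
    for start in range(size, len(trials), size):
        model = fit_model(trials[:start])               # past data only
        for tr in trials[start:start + size]:
            p = model.predict(tr["student"], tr["item"], tr["time"])
            deviations.setdefault(tr["student"], []).append(abs(p - tr["correct"]))
    per_student_means = [sum(errs) / len(errs) for errs in deviations.values()]
    return sum(per_student_means) / len(per_student_means)
```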
--
gwern
http://www.gwern.net