[mnemosyne-proj-users] Re: Results of experiment conducted with Mnemosyne

Gwern Branwen Sun, 26 Apr 2009 07:21:03 -0700

On Sat, Apr 25, 2009 at 12:06 PM, Bill Price wrote:

...
> My advisor killed my idea to have a third comparison group of non
> software users. I still disagree with that decision, but her argument
> was essentially that any difference would be trivially explained by
> the fact that my intervention amounted to a "studying aid." She wanted
> my only comparisons to be between two groups with just one variable
> modified between them, namely the utilization of spaced repetition
> scheduling. I still think it would be useful knowledge to at least SEE
> where the rest of the class was on this material at those two
> evaluation points, though; so as I said, I disagree with her reasoning
> and decision.

That's too bad; wouldn't it have been really easy to have another
group you just did pre- and post-tests on? Oh well.

> Also, to your point about the score increases: it is true that the
> average scores for the two groups increased in roughly-equivalent
> magnitudes—about 12 points for the spaced group and about 10 points
> for the intuitive group. However, I specifically said that the spaced
> group showed a SIGNIFICANT (as in, p < .05) increase. The difference
> for the intuitive group was NOT statistically significant, largely due
> to the massive variance of the group. One reason the intuitive group
> has such a high average improvement, actually, is that one individual
> showed an improvement of 42 points between pre-assessment and post-
> assessment, which is a wholly anomalous score. My assumption is that
> that individual simply hadn't learned the material before the pre-
> assessment was administered, and so failed it terribly.

Yes, that's true. My bad. I'd ask how they stack up if you remove that
one guy, but I remember vaguely you covered that. :)
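
A minimal sketch of that "remove the one guy" check, assuming a list of
per-subject improvement scores. The numbers below are made up purely for
illustration (they are NOT the study's data), and drop_largest is a
hypothetical helper name:

```python
from statistics import mean, stdev

def drop_largest(scores):
    """Return a copy of scores without its single largest value."""
    trimmed = list(scores)
    trimmed.remove(max(trimmed))
    return trimmed

# Hypothetical improvements (post-assessment minus pre-assessment).
# Illustrative only; not taken from the experiment.
intuitive = [3, 5, -2, 6, 4, 2, 42]

trimmed = drop_largest(intuitive)
print(f"with outlier:    mean={mean(intuitive):.1f}, sd={stdev(intuitive):.1f}")
print(f"without outlier: mean={mean(trimmed):.1f}, sd={stdev(trimmed):.1f}")
```

With a group this small, a single 42-point gain can carry most of the
mean and most of the variance, which is the effect described above.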

> One thing that must be cleared up is that 32.6% figure you tossed out.
>
> Out of the 9 sub-decks the participants were given, only 2 of them
> (decks 2 and 3) were tested on the pre-assessment and the post-
> assessment. This was to buy time for any benefits due to the spaced
> repetition schedule to manifest. My assumption was that subjects would
> not place much emphasis on studying such "old" material over the
> course of the experiment, and so the intuitive group would be expected
> to study decks 2 and 3 heavily at the start of the experiment and then
> to drop off for the rest of the study.
>
> The 32.6% figure meant that, during the time period that I tracked,
> the spaced repetition group studied 32.6% more cards from decks 2 and
> 3 than the intuitive repetition group did.
>
> Recall that I was unable to collect adequate learning data for the
> first five days of the study, though. Thus, I do not know how many
> times members of either group studied decks 2 and 3 during that time.
> That was the time in which I predicted the intuitive group would study
> those decks the heaviest, so it MAY be the case that the total trials
> of the assessment-relevant material between the groups was near-equal.
> If I could show that, then my results would be much stronger, but I
> simply lack the data.

OK, that's interesting. I don't consider it too bad from our (spaced
partisans') view though - even if the intuitives had studied so hard
as to make up the repetition gap, they still wound up with inferior
results.

> You are correct in saying that I should have given a more full account
> of subject compliance. I will draw up a table and put it in my
> Appendix D.

(Cool. See my other email.)

> One other point: One reason I decided not to do any analyses of the
> easiness factors assigned within or between groups is that subjects
> showed rather different opinions of what each grade meant. This was
> especially prevalent in the intuitive group; one of the intuitive
> subjects began by only assigning grades of 0 or 4, for instance. I
> tried to do running checks of everyone's learning data to ensure that
> the grading distributions seemed reasonable and to alert people to any
> issues I had with their grading, but on such a subjective task, it's
> difficult to know what's "correct." Also, to be fair, the two groups
> had different ideas of what the grades were for. The spaced repetition
> group knew the purpose behind them, but the intuitive repetition
> subjects had no clear idea what the grades were for (beyond some
> general idea I gave that the grades would "help us determine which
> characters in the decks were especially difficult to remember," etc).

I'm not sure it's *that* hopeless. Even if people are doing odd
schemes like assigning only 0s and 4s, you could still capture this by
taking 0-2 as hard, 3 as medium, and 4-5 as easy. But I suppose if
there were any overall bias in either group, you would have noticed it
during your running checks... Another thing for future research, I
suppose.
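
The recoding could be sketched roughly like this, assuming integer
grades on Mnemosyne's 0-5 scale. The function names and the sample
grades are hypothetical, not anything from the study:

```python
from collections import Counter

def bin_grade(grade):
    """Collapse a 0-5 grade into a coarse difficulty label."""
    if not 0 <= grade <= 5:
        raise ValueError(f"grade out of range: {grade}")
    if grade <= 2:
        return "hard"
    if grade == 3:
        return "medium"
    return "easy"

def bin_distribution(grades):
    """Return the fraction of grades falling in each coarse bin."""
    if not grades:
        raise ValueError("no grades given")
    counts = Counter(bin_grade(g) for g in grades)
    total = sum(counts.values())
    return {label: counts[label] / total
            for label in ("hard", "medium", "easy")}

# Hypothetical subject who only ever assigns 0 or 4.
print(bin_distribution([0, 4, 4, 0, 4]))
```

Comparing these coarse distributions per group would be one way to spot
an overall bias without trusting the fine-grained grades.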

> —Bill

-- 
gwern
