hey folks--

currently i'm working with a large chunk of old mnemosyne data someone
left in a bittorrent repository.  it looks to be about 6 gigs of user
data, after decompression, between 2006 and 2009.  extremely
interesting stuff...

right now i'm cleaning up and "picking out the good parts".  it's an
incredible volume of data, and it looks like peter is going to be
gracious enough even with his busy schedule to add yet more, current
data.  which is great, but it's almost too much.  so i'm whittling the
size down quite a bit, in a rather cavalier fashion.  first, i'm only
looking at entries that record actual review action.  (ignoring, for
now, whether cards were created, or imported, or whatever, which OS
platform is in use, etc.)  secondly, i'm blowing off users who never
get past 3 or 4 review sessions before fading away - not enough
history there to mean much, seems to me, at least at this preliminary
stage.  then i'm tossing users with any kind of unusual or strange
data: ambiguous identifiers for card ID, really strange values for
duration time, that sort of thing.  also, i've noticed things like an
occasional goofy system clock.  there are a few instances of dates
that range from 1980 to 2018 or so.  the database is so deep that,
rather than try to figure out how to handle these unusual cases, it
seems easiest to simply not include them.
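
a rough sketch of that whittling-down, in python since that's what
mnemosyne itself is written in.  everything named here is a placeholder
of mine, not anything from the mnemosyne code: the record layout (dicts
with user / session / timestamp / thinking_time keys), the 5-session
cutoff, the date window, and the 10-minute cap on thinking time (which
also assumes thinking time is logged in seconds).

```python
from collections import defaultdict
from datetime import datetime

# All thresholds below are assumptions chosen for illustration.
MIN_SESSIONS = 5                  # drop users with fewer sessions than this
EARLIEST = datetime(2006, 1, 1)   # start of the period the data should cover
LATEST = datetime(2010, 1, 1)     # exclusive upper bound (end of 2009)
MAX_THINKING_SECONDS = 600.0      # assumes thinking time is in seconds


def is_suspicious(review):
    """True if a review has an out-of-window date or an odd duration."""
    in_window = EARLIEST <= review["timestamp"] < LATEST
    sane_duration = 0.0 <= review["thinking_time"] <= MAX_THINKING_SECONDS
    return not (in_window and sane_duration)


def keep_clean_users(reviews):
    """Keep only reviews from users with enough sessions and no odd rows.

    A single suspicious row removes the whole user, which mirrors the
    "simply not include them" approach rather than repairing the data.
    """
    by_user = defaultdict(list)
    for review in reviews:
        by_user[review["user"]].append(review)

    kept = []
    for rows in by_user.values():
        sessions = {row["session"] for row in rows}
        if len(sessions) < MIN_SESSIONS:
            continue
        if any(is_suspicious(row) for row in rows):
            continue
        kept.extend(rows)
    return kept
```

dropping the whole user on one bad row is the cavalier option; with
this much data it costs little, but the cutoffs would need checking
against the real distributions before trusting them.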

if i understand the data structure correctly, here's what i'm
currently storing for each card review entry:

anonymous user ID
session #
timestamp
item hashed value (for card ID)
grade assignment
current easiness factor
total # of times card has been viewed for acquisition
total # of times card has been viewed for review
# of "lapses" -- how many times card was forgotten?
# of times card has been viewed for acquisition since last lapse
# of times card has been viewed for review since last lapse
# of times card has been viewed for review since last lapse
# of days for this scheduled review interval
# of days for actual review interval
# of days til next review interval
study duration, or "thinking time"
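
for what it's worth, here is a minimal sketch of pulling those
per-review fields out of one "R" line, based on the format string in
peter's logging call quoted further down.  the field names follow that
call; the Review tuple, the regex and parse_review are my own
hypothetical helpers.  it assumes card ids contain no whitespace and
that anything the logger puts in front of the "R" (a timestamp, say) can
be skipped over; user ID, session # and timestamp come from elsewhere
and are not handled here.

```python
import re
from typing import NamedTuple, Optional


class Review(NamedTuple):
    card_id: str
    grade: int
    easiness: float
    acq_reps: int
    ret_reps: int
    lapses: int
    acq_reps_since_lapse: int
    ret_reps_since_lapse: int
    scheduled_interval: int
    actual_interval: int
    new_interval: int
    noise: int            # randomness added to the next interval; ignorable
    thinking_time: float


# Mirrors "R %s %d %1.2f | %d %d %d %d %d | %d %d | %d %d | %1.1f".
REVIEW_RE = re.compile(
    r"R (\S+) (-?\d+) (-?\d+\.\d+) \| "
    r"(-?\d+) (-?\d+) (-?\d+) (-?\d+) (-?\d+) \| "
    r"(-?\d+) (-?\d+) \| (-?\d+) (-?\d+) \| (-?\d+\.\d+)\s*$"
)


def parse_review(line: str) -> Optional[Review]:
    """Return a Review for a review log line, or None for any other line."""
    match = REVIEW_RE.search(line)
    if match is None:
        return None
    g = match.groups()
    return Review(
        g[0], int(g[1]), float(g[2]),
        *(int(value) for value in g[3:12]),
        float(g[12]),
    )
```

lines that don't match come back as None, so card-creation and import
entries can be skipped in the same pass.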

soooooo, now the task is to cook up a "to-do" list to figure out what
kind of analysis can best be mined from this mountain of data.  john's
idea of a "reviews_per_day" function sounds excellent.  do you guys
think this would be a good way to "measure the efficiency" of a given
spaced recall technique?
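
i don't know exactly what john has in mind, so the following is only
my guess at a "reviews_per_day" function: count one user's reviews per
calendar day, then average over the days they actually studied.  the
function names and the choice to average over active days only are
assumptions of mine.

```python
from collections import Counter
from datetime import datetime


def reviews_per_day(timestamps):
    """Map each calendar date to the number of reviews logged on it."""
    return Counter(ts.date() for ts in timestamps)


def mean_reviews_per_active_day(timestamps):
    """Average daily review count over days with at least one review."""
    counts = reviews_per_day(timestamps)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)
```

on its own this only measures workload, not how much was remembered,
so it would probably have to be paired with the grade column before it
says anything about efficiency.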

for instance, supermemo's wozniak is famous for making very confident
(but generally unsupported) statements about how one Spacing Effect
algorithm is "better" (more efficient?) than another.

i've been involved with SE for quite a while - one of the questions i
hear most often from within the education community is "what kind of
expectation is appropriate when implementing a SE review program?"
"how much will a student's productivity increase, really?"  LOTS of
academic studies about specific aspects of SE performance in a lab
setting, but i know of no paper that reports a clearly defined,
measured improvement in a real-world, classroom-like setting over an
extended period.  lots of informed opinions from people experienced
with the concept, but no truly cold, hard "quantifiable" numbers.  if
someone wanted to compare data from mnemosyne(SM-2) with other data,
say from a flashcard program that incorporated no review management at
all, or maybe the leitner system, or an app that used SM-5, or perhaps
some kind of silly "patented, super-secret formula from a nobel-prize
scientist", what would be the best way to succinctly compare/measure
the two?

any suggestions entertained.  plus, additional ideas in general about
how best to utilize the data are appreciated.  what kind of trouble
could you get into if you had a few gigabytes of mnemosyne data
lying around and too much time on your hands?

-w


On Jul 14, 1:37 am, Peter Bienstman <[email protected]> wrote:
> I guess you're talking about the text based log files, right?
>
> The relevant code is the following:
>
>     # Create log entry.
>
>     logger.info("R %s %d %1.2f | %d %d %d %d %d | %d %d | %d %d | %1.1f",
>                 item.id, item.grade, item.easiness,
>                 item.acq_reps, item.ret_reps, item.lapses,
>                 item.acq_reps_since_lapse, item.ret_reps_since_lapse,
>                 scheduled_interval, actual_interval,
>                 new_interval, noise, thinking_time)
>
> So, the card id is the first column after the "R" identifier, and the next to
> last number (noise) can be safely ignored: it's the contribution of randomness
> to the next scheduled interval. (This will no longer be logged in 2.0)
>
> Note that if you install Mnemosyne 2.0, it will read in this data and store
> your revision history in an SQL database, which could be easier for you to
> work with.
>
> Cheers,
>
> Peter
>
> On Wednesday, July 13, 2011 11:37:35 PM my2cents wrote:
>
> > hi peter--
>
> > when i look at a row in the data that represents review, i can't
> > figure out what the next-to-the-last number represents.  i checked the
> > code and i see variable names for all the numbers except this one.
> > (if i'm reading the code right - i've never used python).  also, i
> > don't see any info that defines which unique card deck is being used.
> > i see 3 variables in the  'database' field, but all seem to be
> > referring to various card status factors, rather than unique id.  can
> > you help me?  thx!
>
> > On Jun 28, 9:21 am, Peter Bienstman <[email protected]> wrote:
> > > On Tuesday, June 28, 2011 04:39:43 PM John Salvatier wrote:
> > > > I am interested in looking at the research facilitated by mnemosyne
> > > > data, who are the researchers who use it?
>
> > > I plan on looking at the data eventually, but at the moment getting 2.0
> > > out of the door is much higher on my priority list :-)
>
> > > Peter
>
> --
> Peter Bienstman
> Ghent University, Dept. of Information Technology
> Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
> tel: +32 9 264 34 46, fax: +32 9 264 35 93
> WWW:http://photonics.intec.UGent.be
> email: [email protected]
