hey folks -- currently i'm working with a large chunk of old mnemosyne data someone left in a BitTorrent repository. it looks to be about 6 gigs of user data after decompression, spanning 2006 to 2009. extremely interesting stuff...
right now i'm cleaning up and "picking out the good parts". it's an incredible volume of data, and it looks like peter is going to be gracious enough, even with his busy schedule, to add yet more current data. which is great, but it's almost too much. so i'm whittling the size down quite a bit, in a rather cavalier fashion:

- first, i'm only looking at entries that record actual review action (ignoring, for now, whether cards were created or imported, which OS platform is in use, etc.)
- second, i'm dropping users who never get past 3 or 4 review sessions before fading away - not enough history there to mean much, it seems to me, at least at this preliminary stage.
- then i'm tossing users with any kind of unusual or strange data: ambiguous card identifiers, really strange values for duration time, that sort of thing. i've also noticed things like an occasional goofy system clock - there are a few instances of dates ranging from 1980 to 2018 or so. the database is so deep that, rather than figure out how to handle these unusual cases, it seems easiest to simply exclude them.

if i understand the data structure correctly, here's what i'm currently storing for each card review entry:

- anonymous user ID
- session #
- timestamp
- hashed item value (for card ID)
- grade assigned
- current easiness factor
- total # of times the card has been viewed for acquisition
- total # of times the card has been viewed for review
- # of "lapses" - how many times the card was forgotten
- # of times the card has been viewed for acquisition since the last lapse
- # of times the card has been viewed for review since the last lapse
- # of days for this scheduled review interval
- # of days for the actual review interval
- # of days until the next review interval
- study duration, or "thinking time"

soooooo, now the task is to cook up a "to-do" list to figure out what kind of analysis can best be mined from this mountain of data. john's idea of a "reviews_per_day" function sounds excellent.
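to make that concrete, here's a minimal sketch (my own toy code, not anything from mnemosyne) of pulling the "R" entries out of the text logs and counting reviews per day. the field order follows peter's logger.info format quoted below, but the timestamp prefix and the sample lines are made-up assumptions - adjust the regex to whatever your dump actually looks like. the 2006-2009 date filter is just the cavalier goofy-clock cutoff described above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line shape: "YYYY-MM-DD HH:MM:SS ... R <card_id> <grade> <easiness> | ..."
# (the real 1.x logs may carry a different timestamp prefix).
R_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"\bR (?P<card>\S+) (?P<grade>\d) (?P<ef>\d+\.\d+)"
)

def parse_reviews(lines):
    """Yield (timestamp, card_id, grade) for lines that record an actual review."""
    for line in lines:
        m = R_LINE.match(line)
        if not m:
            continue  # card creations, imports, platform info, ... -> skip
        when = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        if not (2006 <= when.year <= 2009):
            continue  # goofy system clock (1980/2018-style dates) -> toss
        yield when, m.group("card"), int(m.group("grade"))

def reviews_per_day(reviews):
    """john's idea: map each calendar day to the number of reviews logged that day."""
    per_day = defaultdict(int)
    for when, _card, _grade in reviews:
        per_day[when.date()] += 1
    return dict(per_day)

# made-up sample lines in the assumed format; the third one has a bad clock
sample = [
    "2007-03-05 09:12:44 : R 7a1f 4 2.50 | 3 10 1 1 7 | 14 15 | 30 1 | 4.2",
    "2007-03-05 21:40:02 : R 99c2 2 2.36 | 5 8 2 0 3 | 7 7 | 9 0 | 6.8",
    "1980-01-01 00:00:00 : R dead 5 2.70 | 1 1 0 1 1 | 1 1 | 2 0 | 1.0",
    "2007-03-06 08:00:10 : R 7a1f 5 2.60 | 3 11 1 1 8 | 30 30 | 55 2 | 3.1",
]
counts = reviews_per_day(parse_reviews(sample))
```

the per-user pruning (minimum session counts, weird durations) would hook in as extra filters inside parse_reviews.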
do you guys think this would be a good way to "measure the efficiency" of a given spaced-recall technique? for instance, supermemo's wozniak is famous for making very confident (but generally unsupported) statements about how one spacing-effect algorithm is "better" (more efficient?) than another.

i've been involved with SE for quite a while - frequent questions i get from within the education community are "what kind of expectation is appropriate when implementing a SE review program?" and "how much will a student's productivity increase, really?" there are LOTS of academic studies about specific aspects of SE performance in a lab setting, but i know of no paper that reports a clearly defined, measured improvement in a real-world, classroom-like setting over an extended period. lots of informed opinions from people experienced with the concept, but no truly cold, hard, quantifiable numbers.

if someone wanted to compare data from mnemosyne (SM-2) with other data - say from a flashcard program that incorporates no review management at all, or maybe the leitner system, or an app that uses SM-5, or perhaps some kind of silly "patented, super-secret formula from a nobel-prize scientist" - what would be the best way to succinctly compare/measure the two? any suggestions entertained. plus, additional ideas in general about how best to use the data are appreciated. what kind of trouble could you get into if you had a few gigabytes of mnemosyne data lying around and too much time on your hands?

-w

On Jul 14, 1:37 am, Peter Bienstman <[email protected]> wrote:
> I guess you're talking about the text based log files, right?
>
> The relevant code is the following:
>
> # Create log entry.
>
> logger.info("R %s %d %1.2f | %d %d %d %d %d | %d %d | %d %d | %1.1f",
>             item.id, item.grade, item.easiness,
>             item.acq_reps, item.ret_reps, item.lapses,
>             item.acq_reps_since_lapse, item.ret_reps_since_lapse,
>             scheduled_interval, actual_interval,
>             new_interval, noise, thinking_time)
>
> So, the card id is the first column after the "R" identifier, and the next-to-last number (noise) can be safely ignored: it's the contribution of randomness to the next scheduled interval. (This will no longer be logged in 2.0.)
>
> Note that if you install Mnemosyne 2.0, it will read in this data and store your revision history in an SQL database, which could be easier for you to work with.
>
> Cheers,
>
> Peter
>
> On Wednesday, July 13, 2011 11:37:35 PM my2cents wrote:
> > hi peter--
> >
> > when i look at a row in the data that represents a review, i can't figure out what the next-to-last number represents. i checked the code and i see variable names for all the numbers except this one (if i'm reading the code right - i've never used python). also, i don't see any info that defines which unique card deck is being used. i see 3 variables in the 'database' field, but all seem to refer to various card status factors rather than a unique id. can you help me? thx!
> >
> > On Jun 28, 9:21 am, Peter Bienstman <[email protected]> wrote:
> > > On Tuesday, June 28, 2011 04:39:43 PM John Salvatier wrote:
> > > > I am interested in looking at the research facilitated by mnemosyne data, who are the researchers who use it?
> > >
> > > I plan on looking at the data eventually, but at the moment getting 2.0 out of the door is much higher on my priority list :-)
> > >
> > > Peter
>
> --
> Peter Bienstman
> Ghent University, Dept. of Information Technology
> Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
> tel: +32 9 264 34 46, fax: +32 9 264 35 93
> WWW: http://photonics.intec.UGent.be
> email: [email protected]

--
You received this message because you are subscribed to the Google Groups "mnemosyne-proj-users" group.
