I do this sort of thing all the time. I don't use awk (I use perl, ruby, or groovy, depending on my mood), but I agree with Peter that a question<TAB>answer format is pretty easy.

On Thu, Oct 13, 2011 at 5:33 AM, Peter Bienstman <[email protected]> wrote:

> It's probably easier to convert it to tab delimited txt format encoded
> in utf8. That way you don't have to deal with XML. If I had to do the
> job myself, I'd use Python as opposed to awk, but that's just my
> personal preference :-)
>
> Peter
>
> On Oct 12, 8:53 pm, daveoily <[email protected]> wrote:
> > Hi all, I'm studying Japanese. If you are too, you might have heard of
> > smartfm. They were brilliant, but then decided to make people pay for
> > it. It's fair enough I suppose, but I haven't the money. So I spent a
> > few hours in the days before it went to being a paysite downloading
> > all the stuff I could: example sentences and the pages with
> > information about the translations and pronunciation.
> >
> > It occurs to me that I could make mnemosyne cards from this bunch of
> > information, but doing it manually would take me an age, eating into
> > precious study time. I'm currently working on a whole stack of other
> > cards to be uploaded upon completion anyway.
> >
> > The way I see it, if I can strip the pertinent information from the
> > html and put it into the right format for a mnemosyne xml file, I
> > could automate the process to such an extent that it would take
> > seconds to create the cards, IF, and it is a big if, I knew how to do
> > it!
> >
> > I've had a brush with AWK before, and I think it might be the right
> > tool for such a job, but I'm no expert, to put it mildly, and would
> > really appreciate some help with this one if some knowledgeable soul
> > could see how to do it.
> >
> > There are hundreds, perhaps thousands, of example sentences in mp3
> > format; they're the files named JS******.mp3. There are also words
> > (I'm not sure if I got them all, but they may well be there), named
> > JW******.mp3.
> >
> > I have bundled up all the files and put them on mediafire in a file
> > called sfm.rar:
> >
> > http://www.mediafire.com/?xdbiu55a71ucjb2
> >
> > If anyone has any pointers, I'd love to hear. Otherwise, I might be
> > quite some time turning what could be a great learning resource into
> > something usable.
>
> --
> You received this message because you are subscribed to the Google Groups
> "mnemosyne-proj-users" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/mnemosyne-proj-users?hl=en.
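
As a rough illustration of the question<TAB>answer format discussed in this thread, a minimal Python 3 sketch might look like the following. It only covers the last step, writing cards to a tab-delimited UTF-8 text file; pulling the sentences out of the saved HTML is not shown, since the thread doesn't describe how those pages are laid out. The function names, the sample cards, and the `cards.txt` file name are all made up for the example.

```python
def to_tsv_line(question, answer):
    """Return one card as a single 'question<TAB>answer' line (no newline)."""
    # A stray tab or newline inside a field would split the card, so collapse
    # every run of whitespace in each field to a single space.
    fields = [" ".join(text.split()) for text in (question, answer)]
    return "\t".join(fields)


def write_cards(cards, path):
    """Write (question, answer) pairs to a UTF-8 text file, one card per line."""
    with open(path, "w", encoding="utf-8") as out:
        for question, answer in cards:
            out.write(to_tsv_line(question, answer) + "\n")


# Hypothetical usage; real pairs would come from the saved HTML pages:
#   write_cards([("猫", "cat"), ("犬が好きです。", "I like dogs.")], "cards.txt")
```

Collapsing whitespace inside each field is a simplification: it keeps every card on one line with exactly one tab, at the cost of losing any line breaks the original sentence had.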
