I do this sort of thing all the time.  I use perl, ruby, or groovy rather
than awk, depending on my mood, but I agree with Peter that a:

question<TAB>answer

format is pretty easy.
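
For what it's worth, here is a minimal Python sketch of writing that format
(Python only because Peter mentioned it; any of the languages above would do).
The function names, the sample pairs, and the choice to flatten embedded tabs
and newlines into single spaces are my own assumptions. Pulling the pairs out
of the saved HTML is left out, since that depends on how those pages are laid
out.

```python
def clean(field):
    """Collapse tabs, newlines and runs of whitespace so a field stays on one line."""
    return " ".join(field.split())


def format_cards(pairs):
    """Return one 'question<TAB>answer' line per (question, answer) pair."""
    return "".join(f"{clean(q)}\t{clean(a)}\n" for q, a in pairs)


def write_cards(pairs, path):
    """Write the cards as UTF-8 text with one card per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        out.write(format_cards(pairs))


if __name__ == "__main__":
    # Hypothetical sample data; real pairs would come from the scraped pages.
    sample = [("猫", "cat"), ("犬\n(いぬ)", "dog")]
    print(format_cards(sample), end="")
```

Calling write_cards(pairs, "cards.txt") would then give a UTF-8 file with one
card per line. Flattening whitespace inside a field matters because a stray
tab or newline would otherwise split one card into two.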



On Thu, Oct 13, 2011 at 5:33 AM, Peter Bienstman
<[email protected]> wrote:

> It's probably easier to convert it to tab delimited txt format encoded
> in utf8. That way you don't have to deal with XML. If I had to do the
> job myself, I'd use Python as opposed to awk, but that's just my
> personal preference :-)
>
> Peter
>
> On Oct 12, 8:53 pm, daveoily <[email protected]> wrote:
> > Hi all. I'm studying Japanese; if you are too, you might have heard of
> > smartfm. They were brilliant, but then decided to make people pay for
> > it. That's fair enough I suppose, but I haven't the money. So I spent a
> > few hours in the days before it became a paysite downloading
> > all the stuff I could: example sentences and the pages with
> > information about the translations and pronunciation.
> >
> > It occurs to me that I could make mnemosyne cards from this bunch of
> > information, but doing it manually would take me an age, eating into
> > precious study time. I'm currently working on a whole stack of other
> > cards to be uploaded upon completion anyway.
> >
> > The way I see it, if I can strip the pertinent information from the
> > html and put it into the right format for a mnemosyne xml file, I
> > could automate the process to such an extent that it would take
> > seconds to create the cards, IF, and it is a big if, I knew how to do
> > it!
> >
> > I've had a brush with AWK before, and I think it might be the right
> > tool for such a job, but I'm no expert to put it mildly, and would
> > really appreciate some help with this one if some knowledgeable soul
> > could see how to do it.
> >
> > There are hundreds, perhaps thousands, of example sentences in mp3
> > format; they're the files named JS******.mp3. There are also words (I'm
> > not sure if I got them all, but they may well be there), named JW******.mp3.
> >
> > I have bundled up all the files and put them on mediafire in a file called
> > sfm.rar
> >
> > http://www.mediafire.com/?xdbiu55a71ucjb2
> >
> > If anyone has any pointers, I'd love to hear. Otherwise, I might be
> > quite some time turning what could be a great learning resource into
> > something usable.
>
> --
> You received this message because you are subscribed to the Google Groups
> "mnemosyne-proj-users" group.
> To post to this group, send email to [email protected]
> .
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/mnemosyne-proj-users?hl=en.
>
>

