I've updated my script so it no longer depends on BeautifulSoup.  It's
just one script, and it will download all the mp3s for a conference.
It should run on most recent Unix-derived OSes (Windows users will need
to install the only remaining dependency, Python).
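
For example, assuming you've saved the script as downloadconference.py
(the name used in its docstring below) and want the files in some
directory like ~/conference-mp3s, running it should look something like:

  python downloadconference.py -u http://lds.org/conference/sessions/display/0,5239,23-1-690,00.html -d ~/conference-mp3s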

Enjoy

-matt

P.S. Note that the Sunday afternoon session still isn't up at
http://www.lds.org/conference/sessions/display/0,5239,49-1-775,00.html

On 8/20/07, Cathy Malmrose <[EMAIL PROTECTED]> wrote:
> Wow, that is so nice. Thank you for doing that.
>
> --Cathy Malmrose, CEO ZaReason, Inc. (Mormon family building Linux systems)
> www.zareason.com
>
>
>  On 8/19/07, m h <[EMAIL PROTECTED]> wrote:
> >
> > My wife recently asked for all the recent conference mp3s, so I
> > whipped out a little script to do that in python.  You point it at a
> > lds.org conference url and it will pull all the individual talks
> > (skipping the complete session ones) into a specified directory.
> >
> > I thought I'd share it in case anyone cared.
> >
> > enjoy,
> >
> > matt
> >
"""
script to download all mp3 sessions from a given conference url

example use::

  python downloadconference.py -u http://lds.org/conference/sessions/display/0,5239,23-1-690,00.html -d /tmp/conf 

No third-party modules are required; BeautifulSoup is only needed for the
optional get_mp3_iter() code path.

Licensed under PSF license.

Copyright 2007 - matt harrison
"""

import urllib2
import logging
import optparse
import sys
import os

logging.basicConfig(filename="log.txt", level=logging.DEBUG)


def get_contents(url):
    page = urllib2.urlopen(url)
    return page


def get_link_iter(url):
    # BeautifulSoup is imported lazily so the rest of the script still
    # works when it isn't installed
    from BeautifulSoup import BeautifulSoup
    html_page = get_contents(url)
    soup = BeautifulSoup(html_page)
    links = soup.findAll("a")
    for link in links:
        yield link

def get_mp3_iter_no_bs(url):
    """
    Scan the raw html for mp3 hrefs with plain string searching so the
    BeautifulSoup dependency is no longer needed.
    """
    html_page = get_contents(url)
    for line in html_page:
        start = 0
        done = False
        while not done:
            loc = line.find('a href="', start)
            if loc == -1:
                done = True
            else:
                loc = loc + len('a href="')
                end = line.find('"', loc)
                if end == -1:
                    # no closing quote on this line; skip the rest of it
                    break
                link = line[loc:end]
                if is_mp3(link):
                    yield link
                start = end
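

# --- Added example (not part of the original script) -------------------------
# A minimal sketch of the same link extraction using the standard library's
# HTMLParser, in case the hand-rolled string scan above ever trips on markup
# changes.  MP3LinkParser and get_mp3_iter_htmlparser are illustrative names,
# and this path is not wired into main().
from HTMLParser import HTMLParser

class MP3LinkParser(HTMLParser):
    """Collect hrefs of <a> tags that point at individual-talk mp3s."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.mp3_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and is_mp3(value):
                    self.mp3_links.append(value)

def get_mp3_iter_htmlparser(url):
    parser = MP3LinkParser()
    parser.feed(get_contents(url).read())
    return iter(parser.mp3_links)
# ------------------------------------------------------------------------------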


def get_mp3_iter(url):
    """BeautifulSoup-based version, kept for reference; not used by default."""
    for link in get_link_iter(url):
        href = link["href"]
        if is_mp3(href):
            yield href


def is_mp3(href):
    #filter out "Complete sessions"
    return href.endswith(".mp3") and "Complete" not in href


def copy_mp3s_to_dir(url, dest_dir):
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)

    for mp3_url in get_mp3_iter_no_bs(url):
        download_mp3(mp3_url, dest_dir)

def download_mp3(mp3_url, dest_dir):
    #download the mp3 into memory
    logging.info("Downloading %s" % mp3_url)
    fin = urllib2.urlopen(mp3_url)
    mp3 = fin.read()
    fin.close()
    filename = get_filename(mp3_url)
    #copy to dest (write in binary mode so the bytes aren't mangled on Windows)
    dest = os.path.join(dest_dir, filename)
    logging.info("Writing to %s" % dest)
    fout = open(dest, 'wb')
    fout.write(mp3)
    fout.close()
    logging.info("Done")

def get_filename(url):
    """strip off last part of url for filename"""
    return url.split("/")[-1]
        

def main(args=None):
    if args is None:
        args = sys.argv[1:]

    p = optparse.OptionParser()
    p.add_option("-u", "--url", action="store", dest="url",
                 help="specify url to download mp3s from")
    p.add_option("-d", "--destination-directory", action="store",
                 dest="dest", help="directory in which to place mp3s")

    opt, args = p.parse_args(args)

    if opt.dest and opt.url:
        copy_mp3s_to_dir(opt.url, opt.dest)
    else:
        p.print_help()

if __name__ == "__main__":
    main()
