Hello everyone,

maybe this is interesting for other German users.

My husband was recently looking around for a good source of EPG data
and came across www.epgdata.com.
They offer EPG data for download in an XML format, but one needs a PIN
for that. Their business idea seems to be to sell the EPG data to
business customers who want to integrate it into their commercial
software or hardware.
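
As a rough illustration of how the PIN is used, the download links can be assembled as in the sketch below. The endpoint and the query parameters (action, iLang, iOEM, iCountry, pin, dayOffset, dataType) are copied from the attached script; the helper name epgdata_url and the placeholder PIN are assumptions for illustration only, and nothing here guarantees that the service still accepts these parameters. The sketch is written for Python 3, while the attached script is Python 2.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.epgdata.com/index.php"  # endpoint used by the attached script


def epgdata_url(pin, day_offset=None):
    """Build a download URL (epgdata_url is a hypothetical helper name).

    Without day_offset the URL asks for the metadata archive (channel
    lists etc.); with day_offset it asks for the EPG package of that day.
    """
    params = [
        ("action", "sendInclude" if day_offset is None else "sendPackage"),
        ("iLang", "de"),
        ("iOEM", "xml"),
        ("iCountry", "de"),
        ("pin", pin),
    ]
    if day_offset is not None:
        params.append(("dayOffset", day_offset))
    params.append(("dataType", "xml"))
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(epgdata_url("YOUR-PIN"))     # metadata archive
    print(epgdata_url("YOUR-PIN", 0))  # EPG data for today
```

Building the query with urlencode instead of string concatenation means a PIN containing unusual characters would still yield a valid URL.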

We sent them an e-mail expressing our interest in their service and
asking how to obtain such a PIN and how much it would cost.

The answer was that the service is free of charge at the moment
because it is still in a testing phase. A valid PIN was included.
They did not tell us how long the testing phase will last, but we have
been using their service for two months now and are quite satisfied with
their EPG data, mainly because it contains good descriptions for many
programs.

I wrote a grabber program for their data. It downloads the epgdata files
and converts them to XMLTV format so that Freevo can use them.
Its usage is very similar to that of the other XMLTV grabber
programs (but it is written in Python ;-)).
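
A typical session has two steps, as with other XMLTV grabbers: one run with --configure to choose the channels, then regular runs that fetch the data. The sketch below only assembles the two command lines. The option names are taken from the attached script; the file name tv_grab_epgdata.py, the interpreter name, the placeholder PIN and the helper grabber_command are assumptions made for illustration.

```python
SCRIPT = "tv_grab_epgdata.py"  # hypothetical file name for the attached script


def grabber_command(pin, configure=False, days=1, offset=0, output="TV.xml"):
    """Return the argument list for one run of the grabber."""
    # "python" is assumed to be a Python 2 interpreter here
    cmd = ["python", SCRIPT, "--pin", pin]
    if configure:
        # interactive run: asks which channels to keep, writes the config file
        cmd.append("--configure")
    else:
        # normal run: fetch `days` days, starting `offset` days from today
        cmd += ["--days", str(days), "--offset", str(offset), "--output", output]
    return cmd


if __name__ == "__main__":
    print(" ".join(grabber_command("YOUR-PIN", configure=True)))
    print(" ".join(grabber_command("YOUR-PIN", days=3)))
```

The resulting lists could be passed to subprocess.call, for example from a nightly job that refreshes the TV.xml file.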

If anyone is interested, feel free to use it, but I give no guarantee whatsoever. (Program attached.)
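
To give an idea of what the conversion involves, here is a minimal sketch that turns a single epgdata <data> record into an XMLTV <programme> element. The element names d2, d4, d5, d19 and d21 and the time handling follow the attached script; the sample record, the channel mapping and the function names are invented for illustration, and the timestamp layout is only inferred from the script's convertTime method. The fixed +0100 offset is the same simplification the script makes (it ignores summer time). The sketch targets Python 3, whereas the attachment is Python 2.

```python
import xml.etree.ElementTree as ET

# Invented sample record; real files contain many more <d?> elements.
SAMPLE = """<pack><data>
  <d2>71</d2>
  <d4>2006-11-05 20:15:00</d4>
  <d5>2006-11-05 21:45:00</d5>
  <d19>Tatort</d19>
  <d21>Krimi &amp; Drama</d21>
</data></pack>"""

CHANNELS = {"71": "ard"}  # hypothetical mapping: epgdata id -> XMLTV channel id


def xmltv_time(value):
    """Strip separators and append a hard-coded +0100 offset."""
    digits = value.replace("-", "").replace(" ", "").replace(":", "")
    return digits + " +0100"


def convert(data):
    """Convert one <data> element, or return None for unwanted channels."""
    channel = CHANNELS.get(data.findtext("d2", "").strip())
    if channel is None:
        return None
    prog = ET.Element("programme", {
        "start": xmltv_time(data.findtext("d4").strip()),
        "stop": xmltv_time(data.findtext("d5").strip()),
        "channel": channel,
    })
    for src, dst in (("d19", "title"), ("d21", "desc")):
        text = (data.findtext(src) or "").strip()
        if text:
            # ElementTree escapes &, < and > when serialising
            ET.SubElement(prog, dst, {"lang": "de"}).text = text
    return prog


if __name__ == "__main__":
    for data in ET.fromstring(SAMPLE).iter("data"):
        prog = convert(data)
        if prog is not None:
            print(ET.tostring(prog, encoding="unicode"))
```

Building the output with ElementTree rather than string formatting leaves the escaping of special characters to the library.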


Regards
Tanja
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#-------------------------------------------------------------------------------
# Copyright (C) 2006 Tanja Kotthaus
#
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# 
#------------------------------------------------------------------------------
import sys
import os
import glob
import codecs

from optparse import OptionParser
from xml.sax import saxutils
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces


class configHandler(saxutils.DefaultHandler):
    """ Creates a configuration file
    
    The list of possible channels is read from the channel_*.xml files
    in the include.zip file from epgdata. After that the user is asked
    for each channel if it is interesting or if it should be ignored.
    The user's choice is written to a config file.
    """

    def __init__(self, outfile):
        # config file for saving the users choice of channels
        self.config = codecs.open(outfile,'w','utf-8')
        # list of channels (as dictionary, double entries will be ignored)
        self.channels = {}
        # data for a single channel
        self.channel = {} 
        # flag which indicates the tag of the currently processed element
        self.tag=None   


    def write(self): 
        """ Write to file
        
        The list of possible channels is read. In this function the user will
        be asked which channels are interesting and then the config file will
        be written from that choice.
        """
        # get the config file
        config = self.config
        
        # write the header
        config.write('<?xml version="1.0" encoding="UTF-8"?> \n')
        config.write('<config source-info-url="http://www.epgdata.com/"> \n')
        config.write('<channels>\n')
        
        # Start asking the user which channels are interesting
        ask = True
        while len(self.channels)>0:
            # get the next channel from the list
            (chid, chname) = self.channels.popitem()
            while ask:
                question ='Add channel %s? [yes,no,none,all (default=no)] ' %chname[1] 
                # get the input from the user (fall back to utf-8 if the
                # encoding of stdout is unknown, e.g. when it is redirected)
                res = raw_input(question.encode(sys.stdout.encoding or 'utf-8'))
                # convert it to lower case
                res = res.lower()
                # check the input
                if res in ['yes', 'no','none','all','']:
                    break
                else:
                    # invalid input, ask again!
                    print 'Invalid response, please choose one of yes, no, none or all'    
            
            # deal with the answer
            if res == 'no' or res=='':
                # this channel should be ignored, continue with next channel
                continue
            if res =='none':
                # stop asking, we are finished
                break
            if res =='all':
                # stop asking stupid questions, take all from now on
                ask = False 
            # write the entry for this channel to the config file    
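            # (the values are escaped so that names containing &, < or
            # quotes still give well-formed XML)
            config.write('<ch id=%s name=%s>%s</ch>\n'
                         % (saxutils.quoteattr(chid),
                            saxutils.quoteattr(chname[1]),
                            saxutils.escape(chname[0])))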
        
        # close the file        
        config.write('</channels>\n')
        config.write('</config>')
        config.close()    
        
        
    def startElement(self, name, attrs):
        """ startElement function for SAX
        
        We use the SAX framework for parsing the xml file. In this framework
        this function defines the actions that should be taken when the opening
        tag of an element is encountered during parsing.
        The variable name represents the name of the element and we can decide
        to act on special elements and ignore others.
        This function deals with the following elements:
        <data>  parent element of a channel
        <ch0>   fullname of a channel
        <ch1>   short name of a channel
        <ch4>   internal id of a channel as it is used in the epg data
        """   
        if name == 'data':
            # new channel
            self.tag = None 
            self.channel.clear() 
        if name =='ch0':
            # channel long name
            self.tag = u'fullname'
            self.channel[u'fullname'] = u''
        if name == 'ch1':
            # channel name
            self.tag = u'name'
            self.channel[u'name'] = u''
        if name == 'ch4':
            # channel id
            self.tag = u'id'
            self.channel[u'id'] = u''     
        
    
    def characters(self, ch):
        """ characters function for SAX
        
        The characters() method is called for characters that aren't inside 
        XML tags.
        """ 
        if self.tag:
            # add the characters to the appropriate data
            self.channel[self.tag] += ch   


    def endElement(self,name):
        """ endElement function for SAX
        
        This method is called whenever a closing tag is encountered during
        parsing. The variable 'name' represents the name of that element.
        If the closed element is a data element, then all data for the currently
        processed channel is now collected and the channel can be added to
        the list of available channels.
        """
        if name == 'data':
            # end of information for one channel    
            channelname = self.channel[u'name']
            channelfullname = self.channel[u'fullname']
            channelid = self.channel[u'id']
            # add the collected information to the channel list
            self.channels[channelid] = (channelname, channelfullname)    
        else:
            # just clear the tag  
            self.tag = None
            return
        

class channelHandler(saxutils.DefaultHandler):
    """ Get the list of channels from the config file
    
    This reads the list of interesting channels from our own config file,
    which is also an xml file with one <ch> element for each channel.
    The id of a channel and its full name are attributes of that element
    and the short name is the content of the element (this is how
    configHandler.write() stores them).
    """     
    def __init__(self):
        # list of channels
        self.channels={}
        # short name of a channel (content of the <ch> element)
        self.channel=''
        self.inside=False


    def startElement(self, name, attrs):
        """ startElement function for SAX.
        
        This will be called whenever we enter a <ch> element during parsing.
        Then the attributes will be extracted.
        """
        if name == 'ch':
            self.inside = True
            self.channel = ''
            # internal id 
            self.chid = attrs.get('id', None)
            # full name 
            self.name= attrs.get('name',None)
    
    
    def characters(self, ch):
        """ characters function for SAX
        
        The characters() method is called for characters that aren't inside 
        XML tags. Here it is used to collect the characters that form the
        content of a <ch> element, i.e. the short name of a channel.
        """ 
        if self.inside:
            self.channel += ch   


    def endElement(self,name):
        """ endElement function for SAX
        
        This is called when a closing tag is encountered during parsing.
        In this case it saves the collected data for a channel to the channels 
        list.
        """
        if name=='ch':
            self.channels[self.chid]=(self.channel, self.name)   
            # we have left the <ch> element, stop collecting characters
            self.inside = False
        
        

class docHandler(saxutils.DefaultHandler):
    """ Handles the epg data files from epgdata.com
    
    epgdata.com stores its epg data in an xml format which is
    different from the xmltv format. The main element is called
    <pack>; it contains the <data> elements. Each <data> element
    corresponds to a programme. The <data> element contains a lot
    of elements of the form <d?>, where ? is a number.
    The meaning of these tags can be found in the file 'qe.dtd'
    which is included in the zip file with the epg data.
    """
    
    def __init__(self, outfile, conffile):
        # output file, writeable and with utf-8 encoding
        self.tv = codecs.open(outfile,'w','utf-8')
        # path to icon directory (prepended to the icon file names)
        self.prefix = '/media/share/tv/'
        # temporary storage for the program data
        self.programme = {}    
        # flag which indicates the tag that we are currently processing
        self.tag = None
        # list of channels that are interesting
        self.channels = self.readConfig(conffile)
        #  write the header of the outfile
        self.writeHeader()
        # this list of categories is from the file genre.xml which is inside the
        # include.zip file that one can download from epgdata.com
        self.categories ={ '101':u'Film',
                           '102':u'Abenteuerfilm',
                           '103':u'Actionfilm',
                           '104':u'Dokumentarfilm',
                           '105':u'Drama',
                           '106':u'Erotikfilm',
                           '108':u'Fantasyfilm',
                           '109':u'Heimatfilm',
                           '110':u'Komödie',
                           '112':u'Krimi',
                           '113':u'Film, Kultur',
                           '114':u'Kurzfilm',
                           '115':u'Musikfilm',
                           '116':u'Mystery+Horror-film',
                           '117':u'Romance',
                           '119':u'Science-Fiction-Film',
                           '121':u'Thriller',      
                           '122':u'Western',
                           '123':u'Zeichentrickfilm', 
                           '201':u'Serie',
                           '202':u'Abenteuer-Serie',
                           '203':u'Action-Serie',
                           '205':u'Serie, Drama',
                           '206':u'Erotik-Serie',
                           '207':u'Familienserie',
                           '208':u'Fantasy-Serie',
                           '210':u'Comedy-Serie',
                           '211':u'Krankenhaus-Serie',
                           '212':u'Krimi-Serie',
                           '214':u'Jugendserie',
                           '216':u'Mystery+Horror-Serie',
                           '218':u'Reality-Serie',
                           '219':u'Science-Fiction-Serie',
                           '220':u'Soap',
                           '221':u'Serie, Thriller',
                           '222':u'Western-Serie',
                           '223':u'Zeichentrick-Serie',
                           '301':u'Sport',
                           '331':u'Boxen',
                           '332':u'Eishockey',
                           '334':u'Fussball',
                           '335':u'Olympia',
                           '336':u'Golf',
                           '337':u'Gymnastik',
                           '338':u'Handball',
                           '339':u'Motorsport',
                           '340':u'Radsport',
                           '341':u'Tennis',
                           '342':u'Wassersport',
                           '343':u'Wintersport',
                           '344':u'US-Sport',
                           '345':u'Leichtathletik',
                           '346':u'Volleyball',
                           '347':u'Extremsport',
                           '348':u'Sportreportage',
                           '401':u'Show',
                           '406':u'Erotik-Show',
                           '418':u'Reality-Show',
                           '450':u'Comedy-Show',
                           '451':u'Familien-Show',
                           '452':u'Spielshow',
                           '453':u'Talkshows',
                           '454':u'Gerichtsshow',
                           '455':u'Homeshopping',
                           '456':u'Kochshow',
                           '457':u'Heimwerker-Show',
                           '501':u'Info',
                           '560':u'Geschichte',
                           '561':u'Magazin',
                           '564':u'Gesundheit',
                           '565':u'Motor+Verkehr',
                           '566':u'Nachrichten',
                           '567':u'Natur',
                           '568':u'Politik',
                           '569':u'Ratgeber',
                           '570':u'Reise',
                           '571':u'Wirtschaft',
                           '572':u'Wissen',
                           '573':u'Dokumentation',
                           '601':u'Kultur',
                           '680':u'Jazz',
                           '681':u'Klassik',
                           '682':u'Musical',
                           '683':u'Rock',
                           '684':u'Volksmusik',
                           '685':u'Alternative',
                           '686':u'Pop',
                           '687':u'Clips',
                           '688':u'Show',
                           '689':u'Interview',
                           '690':u'Theater',
                           '691':u'Kino',
                           '692':u'Kultur',      
                           '701':u'Kindersendung',
                           '790':u'Kinderfilm',
                           '791':u'Nachrichten für Kinder',
                           '792':u'Kinderserie',
                           '793':u'Show für Kinder',
                           '795':u'Zeichentrick',
                           '796':u'Anime'
                          }                           
            
    
    def readConfig(self, conffile):
        """ Read the user's config file
        
        The user's config file should be an xml file that lists all
        channels for which we should fetch programme information.
        """ 
        # Create a parser   
        parser = make_parser()

        # Tell the parser we are not interested in XML namespaces
        parser.setFeature(feature_namespaces, 0)

        # create a handler
        dh = channelHandler()

        # Tell the parser to use our handler
        parser.setContentHandler(dh)

        # Parse the input
        parser.parse(conffile)
        
        # return the list of channels from the config file
        return dh.channels
        
     
    def writeHeader(self):
        """ Write the header of the output file
        
        The output file is in the xml format that is used by xmltv.
        Therefore it starts with a list of channels.
        """
        tv = self.tv
        # write xml header info
        tv.write('<?xml version="1.0" encoding="UTF-8"?> \n')
        tv.write('<!DOCTYPE tv SYSTEM "xmltv.dtd"> \n')
        tv.write('<tv source-info-url="http://www.epgdata.com/"> \n')
        # next write the list of channels (values are escaped so that
        # names containing & or < still give well-formed XML)
        for items in self.channels.values():
            tv.write('<channel id=%s>\n' % saxutils.quoteattr(items[0]))
            tv.write('<display-name lang="de">%s</display-name>\n'
                     % saxutils.escape(items[1] or u''))
            tv.write('</channel>\n')
 
 
    def write(self):
        """ write to file
       
        actually this writes the closing tag and then closes the 
        output file. It is called 'write' to be compatible with configHandler.
        """
        tv = self.tv
        # write closing tag
        tv.write('</tv>')
        # and close the file
        tv.close()

    
    def convertTime(self, time):
        """ Convert the time to xmltv format
        
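        The time format in the xml files that epgdata.com provides
        is a little different from the one that xmltv uses. Therefore
        we need this conversion.
        """
        time = time.replace('-','')
        time = time.replace(' ','')
        time = time.replace(':','')
        # NOTE: the UTC offset is hard-coded to +0100 (CET); during summer
        # time (CEST) the offset for German local time would be +0200
        time = time + ' +0100'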
        return time


    def startElement(self, name, attrs):
        """ Start Element
        
        We use SAX to parse the xml. In this framework the startElement
        function is called every time the opening tag of a new element is
        reached. This function handles all relevant tags and ignores the others.
        d2: channel
        d4: start time
        d5: stop time
        d25: category
        d19: title
        d20: subtitle
        d21: description
        d32: country
        d33: date
        d34: presenter
        d36: director
        d37: actor
        d40: picture
        Some of these tags can be empty.
        """
        if name == 'data':
            # new programme
            self.programme.clear()
            self.tag = None             
        if name == 'd2':
            # channel
            self.tag = u'channel'
            self.programme[u'channel'] = u''
        if name == 'd4':
            # starttime
            self.tag = u'start'
            self.programme[u'start'] = u''
        if name == 'd5':
            # endtime
            self.tag = u'stop'
            self.programme[u'stop'] = u''
        if name == 'd25':
            # category
            self.tag = u'category'   
            self.programme[u'category'] = u''
        if name == 'd19':
            # title
            self.tag = u'title'
            self.programme[u'title'] = u''
        if name == 'd20':
            # subtitle (the xmltv element is called <sub-title>)
            self.tag = u'sub-title'
            self.programme[u'sub-title'] = u''
        if name == 'd21':
            # description
            self.tag = u'desc'
            self.programme[u'desc'] = u''
        if name == 'd32':
            # country
            self.tag = u'country'
            self.programme[u'country'] = u''
        if name == 'd33':
            # year
            self.tag = u'date'
            self.programme[u'date'] = u''
        if name == 'd34':
            # moderator
            self.tag = u'presenter'
            self.programme[u'presenter'] = u''
        if name == 'd36':
            # director
            self.tag = u'director'
            self.programme[u'director'] = u''
        if name == 'd37':
            # actor
            self.tag = u'actor'
            self.programme[u'actor'] = u''
        if name == 'd40':
            # image
            self.tag = u'icon'
            self.programme[u'icon'] = u''
            
            
    def characters(self, ch):
        """ Reading characters
        
        The characters method is called for characters that aren't inside 
        XML tags. This is also required by SAX.
        """ 
        if self.tag:
            self.programme[self.tag] += ch

    
    def endElement(self,name):
        """ End of element
        
        This function defines the actions that should  be taken when
        the parser comes to a closing tag.
        """
        if name=='pack':
            # this is the end of the epg document
            return
        if not name=='data':
            # end of a single <d?> element, stop collecting characters
            self.tag = None
            return
         
        # process the data that we got from the xml file    
        tv = self.tv
        prog = self.programme
        # start time
        s = self.convertTime(prog.pop(u'start').strip())
        # stop time
        e = self.convertTime(prog.pop(u'stop').strip())
        # channel
        c = prog.pop(u'channel').strip()
        try:
            # test if this is one of the interesting channels
            c = self.channels[c][0]
        except KeyError:
            # if this channel is not interesting then we ignore it
            return
        # opening tag with attributes (the channel id is quoted and escaped)
        tv.write(u'<programme start="%s" stop="%s" channel=%s>\n'
                 % (s, e, saxutils.quoteattr(c)))
        credits = {}
        # collect the rest of the available information
        while len(prog)>0:
            # continue until all infos are processed
            tag, content = prog.popitem()
            # escape characters that are special in XML (&, < and >)
            content = saxutils.escape(content.strip())
            if len(content)==0:
                # ignore empty entries
                continue    
            if tag=='category':
                try:
                    # test if this code can be translated to a real category
                    content = self.categories[content]
                except KeyError:
                    # else it is meaningless and we ignore it
                    continue     
            if tag in (u'presenter', u'guest', u'director', u'actor'):
                # this all belongs inside a credits tag
                credits[tag] = content
            elif tag==u'icon':
                # there might be a thumbnail for this program
                tv.write(u'<icon src="'+self.prefix+content+'"/> \n' )
            else:
                # else this is not special, just write the corresponding line
                tv.write(u'<%s lang="de">%s</%s> \n' %(tag, content, tag))    
        if len(credits)>0:
            # process credits information
            tv.write(u'<credits> \n')
            while len(credits)>0:
                tag, content = credits.popitem()
                # (the credits elements of xmltv take no lang attribute)
                tv.write(u'<%s>%s</%s> \n'%(tag, content, tag))  
            tv.write(u'</credits> \n')
        # close the element for this programme.    
        tv.write(u'</programme> \n')


def main():
    """ main function!
    
    Here it all comes together.
    First we parse the command line arguments and then we either create a
    config file or start fetching the epg data and convert it to xmltv 
    format.
    """
    

    # configuration of the command line parsing
    usage = 'usage: %prog [options]'
    version = '%prog 0.2'
    parser = OptionParser(usage=usage, version=version) 
    parser.add_option('--days', type='int', dest='days', 
           metavar = 'N', default =1, 
           help='Get data for N days [default: %default]')
    parser.add_option('--offset', type="int", dest="offset",
           metavar = 'N', default=0,
           help='Start N days in the future. [default:%default (today)]')
    parser.add_option('--output', type='str', dest='outfile', 
           metavar='FILE' ,default='TV.xml',
           help='File to write output to [default: %default]')
    parser.add_option('--pin', type='str',
           help='PIN from epgdata.com')                         
    parser.add_option('--configure', action='store_true', 
           dest='configure', default=False,   
           help='Select the channels programme data should be fetched for') 
    parser.add_option('--config-file', type=str, dest='configfile',
           metavar='FILE',help='Load/Store configuration in an alternate file')   
           
    (options, args) = parser.parse_args()
    
    # get the name of this program
    progname = parser.get_prog_name()
    # get current directory
    curdir = os.getcwd()
        
    # without a PIN we cannot do anything!
    if not options.pin:
        sys.exit(progname+': '+'You need a PIN from epgdata.com.')
    pin = options.pin
    
    # create a tempdir as working area
    tempdir = '/tmp/epgdata'
    if not os.path.isdir(tempdir):
        os.mkdir(tempdir)
    os.chdir(tempdir) 
    # and clear it if needed
    for i in glob.glob('*'):       
        os.remove(i) 
    
    # let's get the name of the config file    
    if options.configfile:
        # the user provides a special config file
        conffile = options.configfile
        conffile = os.path.expanduser(conffile)
        if not os.path.isabs(conffile):
            conffile = curdir + os.sep + conffile
        confdir = os.path.dirname(conffile)    
    else:
        # or we use the default one
        conffile = progname+'.conf'
        confdir = os.environ['HOME']+'/.xmltv/'
        conffile = confdir+conffile
    
    configure = options.configure
    if configure:
        # if this is a configure run
                
        print 'Using config filename %s' %conffile

        if os.path.isfile(conffile):
            # there is already a config file
            while True:
                # ask the user what to do 
                msg = 'A nonempty configuration file %s already exists' %(conffile)
                print msg
                msg ='Do you wish to overwrite the old configuration?' 
                msg+='[yes, no(default=no)]'
                # get the user's input 
                res = raw_input(msg).lower()
                # and check the input
                if res in ['yes', 'no','']:
                    # input was valid, thus we can continue
                    break
                # else we ask again     
                print  '\nInvalid response, please choose either yes or no:\n'
            # deal with a negative answer
            if not res == 'yes':
                sys.exit('Exiting')
        
        # create confdir if it does not exist
        if not os.path.isdir(confdir):
            os.mkdir(confdir)
        
        # create download adresse for meta data
        addresse = 'http://www.epgdata.com/index.php'
        addresse+= '?action=sendInclude&iLang=de&iOEM=xml&iCountry=de'
        addresse+= '&pin=%s' %pin
        addresse+= '&dataType=xml'    

    
        # get the files
        print progname+': Downloading meta data'
        exit = os.system('wget -N -O temp.zip "%s"' %(addresse))
        if not exit==0:
            sys.exit(progname+': Cannot get file from epgdata.com')
        # unzip the file
        print progname+': Unzipping meta data' 
        exit = os.system('unzip -uo temp.zip')
        if not exit==0:
            sys.exit(progname+': Cannot unzip the downloaded file') 
        
        infiles = glob.glob('channel*.xml')   
 

        # Create the handler
        dh = configHandler(conffile)
 
    else:
        # this is not a configure run, let's fetch the epg data
        if not os.path.isfile(conffile):
            # config file is missing, the user must do a configure run first
            print progname+': Config file %s does not exist' %(conffile)
            print progname+': Use --configure'
            sys.exit('Exiting')
        
        # if this is not a configure run but a normal one
        outfile = options.outfile
        outfile = os.path.expanduser(outfile)
        if not os.path.isabs(outfile):
            outfile = curdir+os.sep+outfile
        print progname+': Writing output to %s' %(outfile)
        offset = options.offset
        # fetch options.days days, starting offset days from today
        days = range(offset, offset+options.days)
    
        # create download adresse epg data
        addresse = 'http://www.epgdata.com/index.php'
        addresse+= '?action=sendPackage&iLang=de&iOEM=xml&iCountry=de'
        addresse+= '&pin=%s' %pin
        addresse+= '&dayOffset=%s&dataType=xml'    
       
        # get the file for each day
        for i in days:
            print progname+': Getting data for day %s' %(i+1-offset)
            exit = os.system('wget -N -O temp.zip "%s"' %(addresse %i))
            if not exit==0:
                sys.exit(progname+': Cannot get file from epgdata.com')
            print progname+': Unzipping data for day %s' %(i+1-offset)
            exit = os.system('unzip -uo temp.zip')
            if not exit==0:
                sys.exit(progname+': Cannot unzip the downloaded file') 
        
        infiles = glob.glob('*.xml')          
     
        # Create the handler
        dh = docHandler(outfile, conffile)
 
        
    print progname+': Start converting'    

    # Create a parser   
    parser = make_parser()

    # Tell the parser we are not interested in XML namespaces
    parser.setFeature(feature_namespaces, 0)


    # Tell the parser to use our handler
    parser.setContentHandler(dh)

    for i in infiles:
        # Parse the input
        parser.parse(i)
    dh.write()
    
    print progname+': Finished!'
 

if __name__=='__main__':
    main() 


        

    
    
    
_______________________________________________
Freevo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freevo-users
