Hello everyone,
maybe this is interesting for other German users.
My husband was recently looking around for a good source of EPG data
and came up with www.epgdata.com.
They offer EPG data in an XML format for download, but one needs a PIN
for that. Their business idea seems to be to sell the EPG data to
business customers who want to integrate it into their commercial
software or hardware.
We sent them a mail, expressing our interest in their service and
asking how one can obtain such a PIN and how much that would cost.
The answer was that the service is free of charge at the moment
as it is still in a testing phase. A valid PIN was included.
They did not tell us how long this testing phase will last, but we have been
using their service for two months now and are quite satisfied with their
EPG data, mainly because it contains good descriptions for many programs.
I created a grabber program for their data. It downloads the epgdata
files and converts them to xmltv format so that Freevo can use them.
The usage of this program is very similar to the other xmltv grabber
programs (but it is written in Python ;-)).
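For readers who only want the idea of the conversion, here is a minimal sketch of it. It is not the attached program: it is written for Python 3 with xml.etree.ElementTree instead of SAX, handles only the channel, start, stop and title fields, and the sample record, the channel id '71' and the xmltv id 'ard' are made up for illustration. The element names d2, d4, d5 and d19 are the ones documented in the attached script.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the epgdata layout described in the script:
# a <pack> element containing <data> records with <d?> children.
SAMPLE = """<pack>
  <data>
    <d2>71</d2>
    <d4>2006-12-01 20:15:00</d4>
    <d5>2006-12-01 21:45:00</d5>
    <d19>Tatort &amp; Co</d19>
  </data>
</pack>"""


def to_xmltv_time(text):
    # Same transformation as convertTime in the script: strip the
    # separators and append the fixed +0100 offset.
    digits = text.replace('-', '').replace(' ', '').replace(':', '')
    return digits + ' +0100'


def convert(source, channels):
    """Build an xmltv <tv> element from an epgdata <pack> document.

    channels maps the epgdata channel id to the xmltv channel id;
    records for channels that are not in the mapping are skipped.
    """
    tv = ET.Element('tv')
    for data in ET.fromstring(source).iter('data'):
        channel = channels.get((data.findtext('d2') or '').strip())
        if channel is None:
            continue
        programme = ET.SubElement(tv, 'programme', {
            'start': to_xmltv_time(data.findtext('d4').strip()),
            'stop': to_xmltv_time(data.findtext('d5').strip()),
            'channel': channel,
        })
        title = ET.SubElement(programme, 'title', {'lang': 'de'})
        title.text = (data.findtext('d19') or '').strip()
    return tv


tv = convert(SAMPLE, {'71': 'ard'})
# ElementTree escapes '&' and '<' on output, so no manual replacing is needed.
print(ET.tostring(tv, encoding='unicode'))
```

Building the output with ElementTree rather than string formatting is a design choice of this sketch: special characters in titles are escaped automatically when the tree is serialized.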
If someone is interested, feel free to use it, but I do not guarantee
anything. (Program attached.)
Regards
Tanja
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#-------------------------------------------------------------------------------
# Copyright (C) 2006 Tanja Kotthaus
#
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MER-
# CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
#
#------------------------------------------------------------------------------
import sys
import os
import glob
import codecs
from optparse import OptionParser
from xml.sax import saxutils
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces
class configHandler(saxutils.DefaultHandler):
    """ Creates a configuration file
    The list of possible channels is read from the channel_*.xml files
    in the include.zip file from epgdata. After that the user is asked
    for each channel if it is interesting or if it should be ignored.
    The user's choice is written to a config file.
    """
    def __init__(self, outfile):
        # config file for saving the user's choice of channels
        self.config = codecs.open(outfile,'w','utf-8')
        # list of channels (as dictionary, double entries will be ignored)
        self.channels = {}
        # data for a single channel
        self.channel = {}
        # flag which indicates the tag of the currently processed element
        self.tag=None
    def write(self):
        """ Write to file
        The list of possible channels has been read. In this function the
        user is asked which channels are interesting and then the config
        file is written from that choice.
        """
        # get the config file
        config = self.config
        # write the header
        config.write('<?xml version="1.0" encoding="UTF-8"?> \n')
        config.write('<config source-info-url="http://www.epgdata.com/"> \n')
        config.write('<channels>\n')
        # Start asking the user which channels are interesting
        ask = True
        while len(self.channels)>0:
            # get the next channel from the list
            (chid, chname) = self.channels.popitem()
            while True and ask:
                question ='Add channel %s? [yes,no,none,all (default=no)]' %chname[1]
                # get the input from the user
                res = raw_input(question.encode(sys.stdout.encoding))
                # convert it to lower case
                res = res.lower()
                # check the input
                if res in ['yes', 'no','none','all','']:
                    break
                else:
                    # invalid input, ask again!
                    print 'Invalid response, please choose one of yes, no, none or all'
            # deal with the answer
            if res == 'no' or res=='':
                # this channel should be ignored, continue with next channel
                continue
            if res =='none':
                # stop asking, we are finished
                break
            if res =='all':
                # stop asking stupid questions, take all from now on
                ask = False
            # write the entry for this channel to the config file
            config.write('<ch id="%s" name="%s">%s</ch>\n'
                         %(chid, chname[1],chname[0]))
        # close the file
        config.write('</channels>\n')
        config.write('</config>')
        config.close()
    def startElement(self, name, attrs):
        """ startElement function for SAX
        We use the SAX framework for parsing the xml file. In this framework
        this function defines the actions that should be taken if the opening
        tag of an element is encountered during parsing.
        The variable name represents the name of the element and we can decide
        to act on special elements and ignore others.
        This function deals with the following elements:
        <data> parent element of a channel
        <ch0> full name of a channel
        <ch1> short name of a channel
        <ch4> internal id of a channel as it is used in the epg data
        """
        if name == 'data':
            # new channel
            self.tag = None
            self.channel.clear()
        if name =='ch0':
            # channel long name
            self.tag = u'fullname'
            self.channel[u'fullname'] = u''
        if name == 'ch1':
            # channel name
            self.tag = u'name'
            self.channel[u'name'] = u''
        if name == 'ch4':
            # channel id
            self.tag = u'id'
            self.channel[u'id'] = u''
    def characters(self, ch):
        """ characters function for SAX
        The characters() method is called for character data between
        XML tags.
        """
        if self.tag:
            # add the characters to the appropriate data
            self.channel[self.tag] += ch
    def endElement(self,name):
        """ endElement function for SAX
        This method is called whenever a closing tag is encountered during
        parsing. The variable 'name' represents the name of that element.
        If the closed element is a data element, then all data for the currently
        processed channel is now collected and the channel can be added to
        the list of available channels.
        """
        if name == 'data':
            # end of information for one channel
            channelname = self.channel[u'name']
            channelfullname = self.channel[u'fullname']
            channelid = self.channel[u'id']
            # let's fill the information into the channel list
            self.channels[channelid] = (channelname, channelfullname)
        else:
            # just clear the tag
            self.tag = None
        return
class channelHandler(saxutils.DefaultHandler):
    """ Get the list of channels from the config file
    This reads the list of interesting channels from our own config file,
    which is also an xml file, with one <ch> element for each channel.
    The id of a channel and its name are attributes of that element
    and the other name is the content of the element.
    """
    def __init__(self):
        # list of channels
        self.channels={}
        # text content of the current <ch> element
        self.channel=''
        self.inside=False
    def startElement(self, name, attrs):
        """ startElement function for SAX.
        This will be called whenever we enter a <ch> element during parsing.
        Then the attributes will be extracted.
        """
        if name == 'ch':
            self.inside = True
            self.channel = ''
            # internal id
            self.chid = attrs.get('id', None)
            # name attribute
            self.name= attrs.get('name',None)
    def characters(self, ch):
        """ characters function for SAX
        The characters() method is called for character data between
        XML tags. Here it is used to collect the characters that form the
        content of a <ch> element.
        """
        if self.inside:
            self.channel += ch
    def endElement(self,name):
        """ endElement function for SAX
        This is called when a closing tag is encountered during parsing.
        In this case it saves the collected data for a channel to the channels
        list.
        """
        if name=='ch':
            self.channels[self.chid]=(self.channel, self.name)
            # we have left the <ch> element, stop collecting characters
            self.inside = False
class docHandler(saxutils.DefaultHandler):
    """ Handles the epg data files from epgdata.com
    epgdata.com stores its epg data in an xml format which is
    different from the xmltv format. The main element is called
    <pack>, it contains the <data> elements. Each <data> element
    corresponds to a programme. The <data> element contains a lot
    of elements of the form <d?>, where ? is a number.
    The meaning of these tags can be found in the file 'qe.dtd'
    which is included in the zip file with the epgdata.
    """
    def __init__(self, outfile, conffile):
        # output file, writeable and with utf-8 encoding
        self.tv = codecs.open(outfile,'w','utf-8')
        # path prefix that is prepended to the icon file names
        self.prefix = '/media/share/tv/'
        # temporary storage for the programme data
        self.programme = {}
        # flag which indicates the tag that we are currently processing
        self.tag = None
        # list of channels that are interesting
        self.channels = self.readConfig(conffile)
        # write the header of the outfile
        self.writeHeader()
        # this list of categories is from the file genre.xml which is inside the
        # include.zip file that one can download from epgdata.com
        self.categories ={ '101':u'Film',
'102':u'Abenteuerfilm',
'103':u'Actionfilm',
'104':u'Dokumentarfilm',
'105':u'Drama',
'106':u'Erotikfilm',
'108':u'Fantasyfilm',
'109':u'Heimatfilm',
'110':u'Komödie',
'112':u'Krimi',
'113':u'Film, Kultur',
'114':u'Kurzfilm',
'115':u'Musikfilm',
'116':u'Mystery+Horror-film',
'117':u'Romance',
'119':u'Science-Fiction-Film',
'121':u'Thriller',
'122':u'Western',
'123':u'Zeichentrickfilm',
'201':u'Serie',
'202':u'Abenteuer-Serie',
'203':u'Action-Serie',
'205':u'Serie, Drama',
'206':u'Erotik-Serie',
'207':u'Familienserie',
'208':u'Fantasy-Serie',
'210':u'Comedy-Serie',
'211':u'Krankenhaus-Serie',
'212':u'Krimi-Serie',
'214':u'Jugendserie',
'216':u'Mystery+Horror-Serie',
'218':u'Reality-Serie',
'219':u'Science-Fiction-Serie',
'220':u'Soap',
'221':u'Serie, Thriller',
'222':u'Western-Serie',
'223':u'Zeichentrick-Serie',
'301':u'Sport',
'331':u'Boxen',
'332':u'Eishockey',
'334':u'Fussball',
'335':u'Olympia',
'336':u'Golf',
'337':u'Gymnastik',
'338':u'Handball',
'339':u'Motorsport',
'340':u'Radsport',
'341':u'Tennis',
'342':u'Wassersport',
'343':u'Wintersport',
'344':u'US-Sport',
'345':u'Leichtathletik',
'346':u'Volleyball',
'347':u'Extremsport',
'348':u'Sportreportage',
'401':u'Show',
'406':u'Erotik-Show',
'418':u'Reality-Show',
'450':u'Comedy-Show',
'451':u'Familien-Show',
'452':u'Spielshow',
'453':u'Talkshows',
'454':u'Gerichtsshow',
'455':u'Homeshopping',
'456':u'Kochshow',
'457':u'Heimwerker-Show',
'501':u'Info',
'560':u'Geschichte',
'561':u'Magazin',
'564':u'Gesundheit',
'565':u'Motor+Verkehr',
'566':u'Nachrichten',
'567':u'Natur',
'568':u'Politik',
'569':u'Ratgeber',
'570':u'Reise',
'571':u'Wirtschaft',
'572':u'Wissen',
'573':u'Dokumentation',
'601':u'Kultur',
'680':u'Jazz',
'681':u'Klassik',
'682':u'Musical',
'683':u'Rock',
'684':u'Volksmusik',
'685':u'Alternative',
'686':u'Pop',
'687':u'Clips',
'688':u'Show',
'689':u'Interview',
'690':u'Theater',
'691':u'Kino',
'692':u'Kultur',
'701':u'Kindersendung',
'790':u'Kinderfilm',
'791':u'Nachrichten für Kinder',
'792':u'Kinderserie',
'793':u'Show für Kinder',
'795':u'Zeichentrick',
'796':u'Anime'
}
    def readConfig(self, conffile):
        """ Read the user's config file
        The user's config file should be an xml file where all
        channels are listed that we should fetch programme infos for
        """
        # Create a parser
        parser = make_parser()
        # Tell the parser we are not interested in XML namespaces
        parser.setFeature(feature_namespaces, 0)
        # create a handler
        dh = channelHandler()
        # Tell the parser to use our handler
        parser.setContentHandler(dh)
        # Parse the input
        parser.parse(conffile)
        # return the list of channels from the config file
        return dh.channels
    def writeHeader(self):
        """ Write the header of the output file
        The output file is in the xml format that is used by xmltv.
        Therefore it starts with a list of channels.
        """
        tv = self.tv
        # write xml header info
        tv.write('<?xml version="1.0" encoding="UTF-8"?> \n')
        tv.write('<!DOCTYPE tv SYSTEM "xmltv.dtd"> \n')
        tv.write('<tv source-info-url="http://www.epgdata.com/"> \n')
        # next write the list of channels
        for items in self.channels.values():
            tv.write('<channel id="%s">\n' %items[0])
            # the attribute is called "lang" in the xmltv format
            tv.write('<display-name lang="de">%s</display-name>\n' %items[1])
            tv.write('</channel>\n')
    def write(self):
        """ write to file
        Actually this writes the closing tag and then closes the
        output file. It is called 'write' to be compatible with configHandler.
        """
        tv = self.tv
        # write closing tag
        tv.write('</tv>')
        # and close the file
        tv.close()
    def convertTime(self, time):
        """ Convert the time to xmltv format
        The time format in the xml files that epgdata.com provides
        is a little different from the one that xmltv uses. Therefore
        we need this conversion.
        Note that the offset is always written as +0100, summer time is
        not taken into account.
        """
        time = time.replace('-','')
        time = time.replace(' ','')
        time = time.replace(':','')
        time = time + ' +0100'
        return time
    def startElement(self, name, attrs):
        """ Start Element
        We use SAX to parse the xml. In this framework the startElement
        function is called every time the opening tag of a new element is
        reached. This function handles all relevant tags and ignores the
        others.
        d2: channel
        d4: start time
        d5: stop time
        d25: category
        d19: title
        d20: subtitle
        d21: description
        d32: country
        d33: date
        d34: presenter
        d36: director
        d37: actor
        d40: picture
        Some of these tags can be empty.
        """
        if name == 'data':
            # new programme
            self.programme.clear()
            self.tag = None
        if name == 'd2':
            # channel
            self.tag = u'channel'
            self.programme[u'channel'] = u''
        if name == 'd4':
            # start time
            self.tag = u'start'
            self.programme[u'start'] = u''
        if name == 'd5':
            # stop time
            self.tag = u'stop'
            self.programme[u'stop'] = u''
        if name == 'd25':
            # category
            self.tag = u'category'
            self.programme[u'category'] = u''
        if name == 'd19':
            # title
            self.tag = u'title'
            self.programme[u'title'] = u''
        if name == 'd20':
            # subtitle (the xmltv element is called <sub-title>)
            self.tag = u'sub-title'
            self.programme[u'sub-title'] = u''
        if name == 'd21':
            # description
            self.tag = u'desc'
            self.programme[u'desc'] = u''
        if name == 'd32':
            # country
            self.tag = u'country'
            self.programme[u'country'] = u''
        if name == 'd33':
            # year
            self.tag = u'date'
            self.programme[u'date'] = u''
        if name == 'd34':
            # presenter
            self.tag = u'presenter'
            self.programme[u'presenter'] = u''
        if name == 'd36':
            # director
            self.tag = u'director'
            self.programme[u'director'] = u''
        if name == 'd37':
            # actor
            self.tag = u'actor'
            self.programme[u'actor'] = u''
        if name == 'd40':
            # image
            self.tag = u'icon'
            self.programme[u'icon'] = u''
    def characters(self, ch):
        """ Reading characters
        The characters method is called for character data between
        XML tags. This is also required by SAX.
        """
        if self.tag:
            self.programme[self.tag] += ch
    def endElement(self,name):
        """ End of element
        This function defines the actions that should be taken when
        the parser comes to a closing tag.
        """
        if name=='pack':
            # this is the end of the epg document
            return
        if not name=='data':
            # end of a single field, just clear the tag
            self.tag = None
            return
        # process the data that we got from the xml file
        tv = self.tv
        prog = self.programme
        # start time
        s = self.convertTime(prog.pop(u'start').strip())
        # stop time
        e = self.convertTime(prog.pop(u'stop').strip())
        # channel
        c = prog.pop(u'channel').strip()
        try:
            # test if this is one of the interesting channels
            c = self.channels[c][0]
        except KeyError:
            # if this channel is not interesting then we ignore it
            return
        # opening tag with attributes
        tv.write(u'<programme start="%s" stop="%s" channel="%s">\n' %(s,e,c))
        credits = {}
        # collect the rest of the available information
        while len(prog)>0:
            # continue until all infos are processed
            tag, content = prog.popitem()
            # escape the characters &, < and > so that the output stays
            # well-formed xml
            content = saxutils.escape(content.strip())
            if len(content)==0:
                # ignore empty entries
                continue
            if tag=='category':
                try:
                    # test if this code can be translated to a real category
                    content = self.categories[content]
                except KeyError:
                    # else it is meaningless and we ignore it
                    continue
            if tag in (u'presenter', u'guest', u'director', u'actor'):
                # this all belongs inside a credits tag
                credits[tag] = content
            elif tag==u'icon':
                # there might be a thumbnail for this programme
                tv.write(u'<icon src="'+self.prefix+content+'"/> \n' )
            else:
                # else this is not special, just write the corresponding line
                tv.write(u'<%s lang="de">%s</%s> \n' %(tag, content, tag))
        if len(credits)>0:
            # process credits information
            tv.write(u'<credits> \n')
            while len(credits)>0:
                tag, content = credits.popitem()
                tv.write(u'<%s lang="de">%s</%s> \n'%(tag, content, tag))
            tv.write(u'</credits> \n')
        # close the element for this programme.
        tv.write(u'</programme> \n')
def main():
    """ main function!
    This is where it all comes together.
    First we parse the command line arguments and then we either create a
    config file or start fetching the epg data and convert it to xmltv
    format.
    """
    # configuration of the command line parsing
    usage = 'usage: %prog [options]'
    version = '%prog 0.2'
    parser = OptionParser(usage=usage, version=version)
    parser.add_option('--days', type='int', dest='days',
                      metavar = 'N', default =1,
                      help='Get data for N days [default: %default]')
    parser.add_option('--offset', type="int", dest="offset",
                      metavar = 'N', default=0,
                      help='Start N days in the future. [default:%default (today)]')
    parser.add_option('--output', type='str', dest='outfile',
                      metavar='FILE' ,default='TV.xml',
                      help='File to write output to [default: %default]')
    parser.add_option('--pin', type='str',
                      help='PIN from epgdata.com')
    parser.add_option('--configure', action='store_true',
                      dest='configure', default=False,
                      help='Select the channels programme data should be fetched for')
    parser.add_option('--config-file', type=str, dest='configfile',
                      metavar='FILE',help='Load/Store configuration in an alternate file')
    (options, args) = parser.parse_args()
    # get the name of this program
    progname = parser.get_prog_name()
    # get current directory
    curdir = os.getcwd()
    # without a PIN we cannot do anything!
    if not options.pin:
        sys.exit(progname+': '+'You need a PIN from epgdata.com.')
    pin = options.pin
    # create a tempdir as working area
    tempdir = '/tmp/epgdata'
    if not os.path.isdir(tempdir):
        os.mkdir(tempdir)
    os.chdir(tempdir)
    # and clear it if needed
    for i in glob.glob('*'):
        os.remove(i)
    # let's get the name of the config file
    if options.configfile:
        # the user provides a special config file
        conffile = options.configfile
        conffile = os.path.expanduser(conffile)
        if not os.path.isabs(conffile):
            conffile = curdir + os.sep + conffile
        confdir = os.path.dirname(conffile)
    else:
        # or we use the default one
        conffile = progname+'.conf'
        confdir = os.environ['HOME']+'/.xmltv/'
        conffile = confdir+conffile
    configure = options.configure
    if configure:
        # if this is a configure run
        print 'Using config filename %s' %conffile
        if os.path.isfile(conffile):
            # there is already a config file
            while True:
                # ask the user what to do
                msg = 'A nonempty configuration file %s already exists' %(conffile)
                print msg
                msg ='Do you wish to overwrite the old configuration?'
                msg+='[yes, no(default=no)]'
                # get the user's input
                res = raw_input(msg).lower()
                # and check the input
                if res in ['yes', 'no','']:
                    # input was valid, thus we can continue
                    break
                # else we ask again
                print '\nInvalid response, please choose either yes or no:\n'
            # deal with a negative answer
            if not res == 'yes':
                sys.exit('Exiting')
        # create confdir if it does not exist
        if not os.path.isdir(confdir):
            os.mkdir(confdir)
        # create download address for meta data
        addresse = 'http://www.epgdata.com/index.php'
        addresse+= '?action=sendInclude&iLang=de&iOEM=xml&iCountry=de'
        addresse+= '&pin=%s' %pin
        addresse+= '&dataType=xml'
        # get the files
        print progname+': Downloading meta data'
        exit = os.system('wget -N -O temp.zip "%s"' %(addresse))
        if not exit==0:
            sys.exit(progname+': Cannot get file from epgdata.com')
        # unzip the file
        print progname+': Unzipping meta data'
        exit = os.system('unzip -uo temp.zip')
        if not exit==0:
            sys.exit(progname+': Cannot unzip the downloaded file')
        infiles = glob.glob('channel*.xml')
        # Create the handler
        dh = configHandler(conffile)
    else:
        # this is not a configure run, let's fetch the epg data
        if not os.path.isfile(conffile):
            # config file is missing, the user must do a configure run first
            print progname+': Config file %s does not exist' %(conffile)
            print progname+': Use --configure'
            sys.exit('Exiting')
        # resolve the name of the output file
        outfile = options.outfile
        outfile = os.path.expanduser(outfile)
        if not os.path.isabs(outfile):
            outfile = curdir+os.sep+outfile
        offset = options.offset
        # fetch options.days days, starting offset days in the future
        days = range(offset, offset+options.days)
        # create download address for epg data
        addresse = 'http://www.epgdata.com/index.php'
        addresse+= '?action=sendPackage&iLang=de&iOEM=xml&iCountry=de'
        addresse+= '&pin=%s' %pin
        addresse+= '&dayOffset=%s&dataType=xml'
        # get the file for each day
        for i in days:
            print progname+': Getting data for day %s' %(i+1-offset)
            exit = os.system('wget -N -O temp.zip "%s"' %(addresse %i))
            if not exit==0:
                sys.exit(progname+': Cannot get file from epgdata.com')
            print progname+': Unzipping data for day %s' %(i+1-offset)
            exit = os.system('unzip -uo temp.zip')
            if not exit==0:
                sys.exit(progname+': Cannot unzip the downloaded file')
        infiles = glob.glob('*.xml')
        # Create the handler
        dh = docHandler(outfile, conffile)
    print progname+': Start converting'
    # Create a parser
    parser = make_parser()
    # Tell the parser we are not interested in XML namespaces
    parser.setFeature(feature_namespaces, 0)
    # Tell the parser to use our handler
    parser.setContentHandler(dh)
    for i in infiles:
        # Parse the input
        parser.parse(i)
    dh.write()
    print progname+': Finished!'
if __name__=='__main__':
    main()
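
A note on convertTime: the script always appends +0100, which is only right during winter time. The following is a rough sketch of how the offset could be chosen instead; it is not part of the program above. It is written for Python 3 and assumes that epgdata delivers German local time in the form 'YYYY-MM-DD HH:MM:SS' (inferred from the replacements in convertTime) and that the EU summer time rule applies (last Sunday of March, 02:00, to last Sunday of October, 03:00, local time). The names last_sunday and convert_time are made up for this sketch.

```python
from datetime import datetime, timedelta


def last_sunday(year, month):
    """Return midnight of the last Sunday of the given month."""
    # Take the last day of the month ...
    if month == 12:
        day = datetime(year + 1, 1, 1) - timedelta(days=1)
    else:
        day = datetime(year, month + 1, 1) - timedelta(days=1)
    # ... and walk back to a Sunday (weekday(): Monday == 0, Sunday == 6).
    return day - timedelta(days=(day.weekday() + 1) % 7)


def convert_time(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' local time to the xmltv time format.

    Assumption: the input is German local time. The hour that occurs twice
    when the clocks are set back in October is treated as summer time.
    """
    local = datetime.strptime(text.strip(), '%Y-%m-%d %H:%M:%S')
    dst_start = last_sunday(local.year, 3).replace(hour=2)
    dst_end = last_sunday(local.year, 10).replace(hour=3)
    offset = '+0200' if dst_start <= local < dst_end else '+0100'
    return local.strftime('%Y%m%d%H%M%S') + ' ' + offset


print(convert_time('2006-07-01 20:15:00'))
print(convert_time('2006-12-24 20:15:00'))
```

The rule is computed by hand here so that the sketch does not depend on a time zone database being installed; with one available, a time zone library would be the more robust choice.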
_______________________________________________
Freevo-users mailing list
https://lists.sourceforge.net/lists/listinfo/freevo-users