Jason Tackaberry wrote:
> On Tue, 2005-10-11 at 17:09 +0200, Dirk Meyer wrote:
>
>>We should keep some things in mind. First, your IPC code is nice but
>>not secure. Everyone can connect to the server and call python
>>functions and such things.
>
>
> I can add authentication. But the data still wouldn't be encrypted, so
> in any case this solution isn't suitable for use over a public network.
> And I don't even think we should bother trying to make the IPC channel
> encrypted. Not only would it hurt performance, but it's probably
> impossible to get it right anyway. (And if we used m2crypto, say, our
> program would leak like crazy.)
>
> I think it's good enough for IPC to use filesystem access control (in
> the case of unix sockets). If the user wants to use it over the LAN I
> can add some basic challenge/response authentication to kaa's ipc.
It would be good if you could add that kind of authentication. I don't
think encryption is a big deal here; we're talking LAN. If I were in a
situation where I needed encryption I'd use ipsec, which is easy to
set up.
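Something along the lines of a shared secret plus an HMAC handshake
would already do. A rough sketch, not kaa's actual API; the secret
value and the send/recv plumbing are made up:

    import os, hmac, hashlib, binascii

    SECRET = b"shared secret from the user's config"   # made up for the sketch

    def make_challenge():
        # server: random nonce sent to every connecting client
        return binascii.hexlify(os.urandom(16))

    def response(challenge):
        # both sides compute this; the secret itself never crosses the wire
        return hmac.new(SECRET, challenge, hashlib.sha1).hexdigest()

    # server: send make_challenge() and remember it
    # client: reply with response(challenge)
    # server: accept the connection only if the reply matches its own
    #         response(challenge)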
> But for purposes of epg and vfs, I agree with your basic architecture
> of: database on local machine, db reads in thread, db writes over ipc.
> For something like managing recording schedules in kaa.record, a simple
> authentication mechanism in kaa.base.ipc might do.
I think the only entry point into the database should be a single
process, accessible through IPC or another convenient interface (and
only through that). How that process handles reads and writes to the db
internally I'm not entirely sure about (or don't much care, as long as
it works well).
>>local. We also can't use mbus because it is designed to be a message
>>bus, not a bus to transport that much data (but it is secure btw).
>
>
> mbus is secure, is it? High praise indeed. I wouldn't use that word
> about any software. :) Even about openvpn, which could be the best
> piece of software I use on my computer.
Well, secure is always a relative term. :)
>>(async). The thread will not only query the db, it will also create
>>nice 'Program' objects so the main thread can use them without creating
>>anything itself. There should also be a cache to speed things up by not
>>hitting the db-lookup thread at all.
>
>
> Herein lies the main benefit of doing reads in a thread. The thread can
> also take care of putting the data in a manageable form for the main
> application. This is important particularly since Python is a hog when
> it comes to object creation.
>
>
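Agreed on that point. For the record, the shape I picture for the read
side is roughly this (hypothetical names, not the real kaa.epg code):

    import threading

    class Program(object):
        # stand-in for kaa.epg's Program class
        def __init__(self, title, start, stop):
            self.title, self.start, self.stop = title, start, stop

    class QueryThread(threading.Thread):
        """Run the db query AND the object creation off the main loop."""
        def __init__(self, execute, callback):
            threading.Thread.__init__(self)
            self.execute = execute    # function that does the actual db read
            self.callback = callback  # gets the finished Program objects

        def run(self):
            rows = self.execute()                    # slow: the query
            programs = [Program(*r) for r in rows]   # slow: object creation
            # kaa would hand this back through the notifier/main loop;
            # calling it directly here just illustrates the flow
            self.callback(programs)

The cache Dirk mentions would then sit in front of this, keyed on
channel and time range.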
>>Freevo knows what channels would be visible when entering the tv
>>grid. So Freevo will request these channels with programs +- 6 hours
>>at startup.
>
>
> Using my rewrite of kaa.epg (which I've cleverly called kaa.epg2 for
> now), this takes 0.2 seconds and returns 1978 program objects. As a
> point of interest, the query itself takes 0.02 seconds to execute,
> another 0.1 seconds to convert the rows to tuples, another 0.05 seconds
> to normalize the tuples into dicts (including unpickling ATTR_SIMPLE
> attributes), and another 0.03 seconds to convert those dicts to
> Program objects. So that 0.05 in normalize time is some low hanging
> fruit and would bring that query down to 0.15 seconds (on my system, at
> least). Not slow, but I agree that it's worth prefetching.
>
> The original kaa.epg executes that same query in 0.17 seconds. Pretty
> comparable performance there. Keyword searches are a different story,
> of course. Searching for "simpsons" with kaa.epg returns 120 rows and
> takes 0.15 seconds. With kaa.epg2 and using the keyword support in
> kaa.base.db, the same query takes 0.015 seconds.
>
> BTW, when parsing my 17MB xmltv file, kaa.epg takes 74 minutes ([EMAIL
> PROTECTED]
> $!#@) to execute, and uses 377MB RSS. My rewrite (whose improvement is
> mainly due to my use of libxml2, of course) takes 94 seconds and uses
> less than half that memory. That's a 50X performance improvement.
> About 55% of that 94 seconds is due to keyword indexing (ATTR_KEYWORDS
> attributes). I could probably improve that time quite a bit by adding
> mass add functionality to the API. (Sort of like the difference between
> pysqlite's execute and executemany.)
>
>
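Since Tack brought up execute vs. executemany: with pysqlite (the stdlib
sqlite3 module in later Pythons) the batched form looks like this; the
table and column names here are invented:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE keywords (program_id INTEGER, word TEXT)")

    words = [(1, "simpsons"), (1, "homer"), (2, "simpsons")]

    # one prepared statement reused for the whole batch ...
    db.executemany("INSERT INTO keywords VALUES (?, ?)", words)

    # ... instead of parsing and dispatching one INSERT per row:
    #   for w in words:
    #       db.execute("INSERT INTO keywords VALUES (?, ?)", w)
    db.commit()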
>>cache. When you go to the right, freevo will ask the db for data + 10
>>hours, just to be sure in case the user needs it. So we can cache in
>>the background what we think is needed next in a thread and the main
>>loop can display stuff without using the db.
>
>
> Probably not a bad idea to do prefetches like that. There's a pretty
> high initial overhead, so it's better to get more rows than you need.
> For example, querying for the next 2 hours of program data takes 0.1
> seconds and returns 200 rows. Querying for the next 12 hours returns
> 2000 rows and takes 0.2 seconds. 10X the data for only 2X extra
> execution time. Actually, now that I think about that, something
> doesn't seem right there. Smells like an index isn't getting used (or
> doesn't exist).
>
> Anyway, I agree, prefetching in another thread seems to be the way to
> go.
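On the 2h-vs-12h smell: sqlite can tell us directly whether an index is
being used. Something like this (schema invented, kaa.base.db's real
column names will differ):

    import sqlite3, time

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE programs (channel_id INTEGER, start INTEGER, "
               "stop INTEGER, title TEXT)")

    # a compound index matching the usual "programs in this time range" query
    db.execute("CREATE INDEX programs_time_idx ON programs (channel_id, start)")

    t1, t2 = time.time(), time.time() + 12 * 3600
    # sqlite reports which index (if any) each query actually uses
    for row in db.execute("EXPLAIN QUERY PLAN SELECT * FROM programs "
                          "WHERE channel_id = ? AND start BETWEEN ? AND ?",
                          (4, t1, t2)):
        print(row)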
I also agree that fetching the data before trying to show the EPG is
best. What I'd like to avoid is a problem we're having right now: the
"local" cached EPG data goes stale whenever the EPG changes on the
database / server side. Will we use callbacks for when the server updates?
Ideally I'd like to see everything done realtime, from the client, to
the server, to the database, but I'm not sure we can get the performance
there that we really want. Is it bad to dream? A small buffer on the
client and prefetching could hide potential lag.
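Even a coarse invalidation callback from the server would keep client
caches honest without going full realtime. A sketch of what I mean,
with the signal plumbing made up:

    class CachedGuide(object):
        """Client-side view that the server tells about changes."""
        def __init__(self, server):
            self.cache = {}   # (channel, (t1, t2)) -> list of Programs
            # hypothetical: the server fires this over ipc after db writes
            server.signals["updated"].connect(self.invalidate)

        def invalidate(self, u1, u2):
            # drop every cached slot overlapping the updated interval;
            # the prefetch thread re-fills them before the user notices
            for channel, (t1, t2) in list(self.cache):
                if t1 < u2 and t2 > u1:
                    del self.cache[(channel, (t1, t2))]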
>>Back to client / server. When we want to add data to the epg, we spawn
>>a child. First we check if a write child is already running (try to
>>connect to the ipc, if it doesn't work, start child, I have some test
>>code for that). Same for kaa.vfs. One reading thread in each app, one
>>writing app on each machine.
Again I'd prefer a single point of entry...
> I think I need to change my opinion a bit about kaa.base.ipc. My
> original thinking was that you don't need to write a client API. You
> just grab a proxy to a remote object and use it as if it's local. This
> works in terms of functionality, but in practice, things aren't so
> clear. For example, in the epg example, you do a query and return a
> list of 2000 Program objects. Since objects get proxied by default, all
> those Program objects are proxies. So if we assume epg is a proxied,
> remote object:
>
>     for prog in epg.search(keywords="simpsons"):
>         prog.description
>
> That would be fairly slow, because since 'prog' is a proxied Program
> object, each access of the description attribute goes over the wire.
> Alternatively you could do this:
>
>     for prog in epg.search(keywords="simpsons", __ipc_copy_result=True):
>         prog.description
>
> That'd be fast, because each Program object is pickled (rather than just
> a reference to it), so all those objects are local. But if a Program
> object holds a reference to the epg (prog._epg in my case), then with
> __ipc_copy_result the epg Guide object also gets pickled. That's not
> good.
>
> Ideally you'd want Program objects to get pickled (so that attribute
> accesses are local), but the epg reference is to a remote object. This
> isn't something kaa.base.ipc can do automatically. It needs some
> supporting logic.
>
> So in reality, we'll need a client API that uses IPC and does
> intelligent things for the API it's wrapping. This isn't really a
> problem, it just means that kaa ipc isn't magic pixie dust like I
> claimed it was. :)
>
>
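That supporting logic might be as small as keeping the guide reference
out of the pickle and letting the client wrapper reattach its own
proxied one afterwards. A hypothetical sketch:

    class Program(object):
        def __init__(self, epg, title, desc):
            self._epg = epg                 # back-reference to the guide
            self.title, self.desc = title, desc

        def __getstate__(self):
            # pickle every attribute EXCEPT the guide, so copying a
            # Program over ipc never drags the whole Guide along
            state = self.__dict__.copy()
            del state["_epg"]
            return state

    # client wrapper, after receiving the unpickled results:
    #   for prog in results:
    #       prog._epg = remote_guide   # reattach a proxied reference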
>>And kaa.epg has different sources. One is
>>xmltv, a new one could be sync-from-other-db.
>
>
> As I mentioned on IRC, unless this is just a straight copy of the sqlite
> file, this probably isn't worth it. Syncing individual rows means
> accessing the db through pysqlite in which case we're not really saving
> anything. With libxml2, parsing the xml file is very quick. Almost all
> the time is due to db accesses, so we're not saving much by syncing at
> the row level from another db.
>
> Copying the epgdb.sqlite file straight over would be a big win, of
> course. We could implement that eventually.
I don't like the idea of syncing the databases. I'd rather have one
database and one controlling process with a client/server interface for
reads AND writes, including large writes like tv_grab results.
I would be happy with something like this:
     +---------------------+
     |epg_server.py        |
     +---------------------+
     | kaa.epg     +-------+
     |             | DB    |
     |-------------+-------+
     |object interface/IPC |
     +---------------------+
               /\
              /  \
             /    \
     UNIX or TCP/IP sockets
           /        \
          /          \
+--------------+  +--------------+
|epg_client.py |  |epg_client.py |
+--------------+  +--------------+
| client 1     |  | client n     |
+--------------+  +--------------+
| cached OR    |  | cached OR    |
| realtime view|  | realtime view|
+--------------+  +--------------+
Here epg_server.py would be the server process and the single interface
to the real EPG / DB. epg_client.py would be the wrapper / glue imported
by each client process (tv/webserver/tv_grab), accessed like a local
object, and it would juggle the server interface behind the scenes.
Ideally the only cache anywhere would live on the client, NOT in the
server (db results held in Program/Channel arrays, for example), because
otherwise there'd be multiple layers of cache, which is what we would
have today if we did this with the current kaa.epg.
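From the application side, using epg_client.py should feel like this
(module and function names are invented, just to show the shape):

    # what a client process (tv, webserver, tv_grab) would do
    import epg_client

    guide = epg_client.connect("/var/run/freevo/epg")  # unix socket or host:port

    # reads: served from the client-side cache when possible, otherwise
    # a round trip to epg_server.py
    for prog in guide.search(keywords="simpsons"):
        print(prog.title)

    # writes: also through the one server process, even tv_grab-sized ones
    guide.update(xmltv_file="/tmp/listings.xml")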
-Rob
--
-------------------------------------------------------
Rob Shortt | http://tvcentric.com | Freevo
[EMAIL PROTECTED] | http://freevo.sf.net | Free your TV