http://bugs.gpodder.org/show_bug.cgi?id=259
Thomas Perl <[EMAIL PROTECTED]> changed:
              What|Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |[EMAIL PROTECTED]
            Status|UNCONFIRMED                 |NEW
    Ever Confirmed|0                           |1
--- Comment #3 from Thomas Perl <[EMAIL PROTECTED]> 2008-11-25 15:02:26 ---
(In reply to comment #2)
> Thomas: If I understand you correctly, you're saying that gpodder uses the url
> attribute of the enclosure tag to determine uniqueness/already-seen-ness of an
> episode listed in a feed.
>
> Caveats: I do not know python, nor have I looked at the gpodder code until
> just now.
>
> It looks to me that in databse.py, the function load_episode (line 325) is the
> code that is responsible for determining if an episode has already been seen
> and downloaded.
>
> If you change the constraint from 'where url = ?' to 'where guid = ?' just to
> make the feed I gave work, then the scenario that you gave - guid staying the
> same but url changing each time - would only suffer the same problem that I
> am encountering.
>
> But what if you used 'where (url = ?) && (guid = ?)' as the constraint?
>
> That would not break the hypothetical feed/channel you describe where the guid
> is always the same. It would reduce to 'where (url = ?) && TRUE' because the
> guid parameter is always going to be the same for each item entry in the feed.
> Something && TRUE == Something, as if the && TRUE wasn't there.
>
> The only downside, if you even consider it one, is adding an extra parameter
> to the load_episode() function and making sure you remember to update it
> wherever it is called. (I hate it when I forget that! :-) )
>
> Of course, as per the caveat, I could be completely wrong about the section
> of code actually responsible for doing this checking, but even in that case I
> believe I would still be right about the basic idea of simply adding a
> constraint to the WHERE clause in the appropriate SQL statement that would
> not break anything else.

Thanks for your suggestions. They are a good starting point for approaching
this problem.

The problem is that in case the GUID is really unique (count(*) from ... where
guid=<guid> returns exactly 1), we want to be able to say "this is the same
episode", even if the URL has changed.

So, what combinations do we have here?

(C=changing, N=not changing)

 URL | GUID
  C  |  C    -> both have changed - this is a new episode
  N  |  C    -> url stays the same, new guid - this is a new episode
  N  |  N    -> this is an old episode

When the GUID appears only once in the database (i.e. we assume the GUID
is really unique and identifies the episode):

 URL | GUID
  C  |  N    -> url has changed, but guid is the same - old episode

When the GUID occurs more than once in the database (i.e. the GUID does
NOT identify an episode uniquely):

 URL | GUID
  C  |  N    -> this is a new episode
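
The combinations above can be written down as a small decision function. This
is only a sketch to make the cases explicit: the function name and its
parameters are hypothetical and do not exist in gPodder.

```python
def is_new_episode(url_changed, guid_changed, guid_count):
    """Classify a feed item according to the URL/GUID cases listed above.

    url_changed, guid_changed: whether the value differs from the stored
        episode it is being compared with (hypothetical inputs).
    guid_count: number of rows in the database that carry this GUID.
    """
    if guid_changed:
        # C C and N C: a GUID we have not seen means a new episode.
        return True
    if not url_changed:
        # N N: nothing changed, this is an old episode.
        return False
    # C N: old episode only if the GUID really is unique in the database.
    return guid_count > 1
```

Only the "C N" case needs the GUID count; the other three cases are decided
by the URL and GUID alone.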
So, we have to differentiate depending on how often the GUID appears in the
database. Based on this, I think the WHERE clause for selecting the episode
given the url and guid would look like this:

  WHERE (url=? AND guid=? AND (SELECT COUNT(*) FROM episodes WHERE guid=?)>1)
     OR (guid=? AND (SELECT COUNT(*) FROM episodes WHERE guid=?)=1)

If the result of this query has zero rows, the episode is new.
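
To try the proposed clause outside of gPodder, here is a minimal sketch using
Python's sqlite3 module and an in-memory database. The three-column table, the
helper names and the example URLs are assumptions for illustration; the real
episodes table and load_episode code look different.

```python
import sqlite3

# The proposed WHERE clause, with five placeholders: url, then guid four times.
LOOKUP_SQL = """
    SELECT id FROM episodes
    WHERE (url = ? AND guid = ?
           AND (SELECT COUNT(*) FROM episodes WHERE guid = ?) > 1)
       OR (guid = ?
           AND (SELECT COUNT(*) FROM episodes WHERE guid = ?) = 1)
"""


def find_episode(conn, url, guid):
    """Return matching rows; an empty list means the episode is new."""
    return conn.execute(LOOKUP_SQL, (url, guid, guid, guid, guid)).fetchall()


def make_demo_db():
    """Build a hypothetical, simplified episodes table for experimenting."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE episodes (id INTEGER PRIMARY KEY, url TEXT, guid TEXT)"
    )
    conn.executemany(
        "INSERT INTO episodes (url, guid) VALUES (?, ?)",
        [
            ("http://example.com/a.mp3", "unique-guid"),  # GUID used once
            ("http://example.com/b.mp3", "shared-guid"),  # GUID reused
            ("http://example.com/c.mp3", "shared-guid"),
        ],
    )
    return conn


conn = make_demo_db()
# Unique GUID, URL changed: still found, so it is an old episode.
moved = find_episode(conn, "http://example.com/a-moved.mp3", "unique-guid")
# Shared GUID, unknown URL: no rows, so it is a new episode.
fresh = find_episode(conn, "http://example.com/d.mp3", "shared-guid")
```

With this test data, a feed that reuses one GUID for every item falls back to
matching on the URL, while a feed with real GUIDs survives a URL change.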
Justin, Cory: What do you think? Does this sound valid to you?