[mythtv-users] Commercial Flagging Idea - Distributed Commercial Flagging (long)

Christopher David Petersen Thu, 14 Apr 2005 12:33:48 -0700

I'm starting this thread so I don't pollute the discussion of fingerprinting for commercial flagging (which is a brilliant but separate idea).

This thread is meant to discuss the idea of distributed commercial flagging (DCF) via existing algorithms to reduce load and increase accuracy.

In brief, here's the basic idea:

1) Collect commercial flagging information from participating users at a central server (hopefully, this isn't a DMCA violation).

2) Analyze the data to determine groups of users who have performed duplicate work.

3) Analyze the data to predict groups of users who will be performing duplicate work.

4) Distribute the future duplicate work among the users to reduce each user's individual load.
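
As a rough illustration of step 2, duplicate work could be detected by grouping submissions on the content stream and show. This is only a sketch: the (user, provider, show) tuple layout and the function name find_duplicate_work are assumptions made for this example, not part of any existing DCF code.

```python
from collections import defaultdict

def find_duplicate_work(submissions):
    """Group submissions by (provider, show); keep groups with 2+ distinct users."""
    groups = defaultdict(set)
    for user, provider, show in submissions:
        groups[(provider, show)].add(user)
    # Only groups with at least two users represent duplicated flagging work.
    return {key: sorted(users) for key, users in groups.items() if len(users) >= 2}

# Hypothetical sample data: anonymous user ids, one provider, two shows.
PROVIDER = "Comcast Analog Basic, Portland OR"
submissions = [
    ("u1", PROVIDER, "Lost"),
    ("u2", PROVIDER, "Lost"),
    ("u3", PROVIDER, "Alias"),
]
duplicates = find_duplicate_work(submissions)
```

Here only _Lost_ qualifies, because two users flagged it on the same content stream; _Alias_ has a single submitter and is left out.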

Here's an example using _Lost_ and _Alias_ (chosen for their short names):

Givens:

- 23 users with Comcast Analog Basic Cable service in Portland, OR record and flag new episodes of _Lost_ and _Alias_ each week.

- These users use a variety of commercial flagging methods.

- The machines have a variety of available CPU power.

Scenario:

- 23 users submit data to the DCF server via a secured and anonymous interface. This data includes which shows they flagged and the start and end times of each commercial segment. All times are synchronized to the DCF server's highly accurate clock (more on this later).

- After a client submits each show's data, the DCF server tells the client whether it can join a "partnership".

- "Partnerships" are created when the DCF server determines that 2 or more users are performing (and will perform) duplicate work (with similar output) for 1 or more shows.

- 10 users are invited to join the new partnership for new episodes of _Lost_ and _Alias_ on Comcast Analog Basic Cable, Portland, OR. These ten users are invited because their machines are of similar power (i.e. commercial flagging occurs after a similar delay and in a similar amount of time). These users are now "Partners" within the "Partnership".

- At first, none of the partners are "trusted" or have earned any "credits" within the partnership. As partners submit more data they earn more credits. The exact amount they earn per submission is weighted by how much they are trusted (their "fidelity" factor) and the accuracy of the submitted data (how similar it is to other data). Once partners have earned enough credits, they can "purchase" data from the partnership.

- After N weeks, only 7 users have earned enough credits to share data.

- 3 partners are selected to flag next week's episode of _Lost_.

- 4 partners are selected to flag next week's episode of _Alias_.

- Of the 3 partners selected to flag _Lost_, all do so and submit their data.

- Of the 4 partners selected to flag _Alias_, only 3 do so and submit their data. The 1 user who did not submit data has lessened his "fidelity" factor.

- The 6 partners who submitted data earn credits and increase their fidelity factor.

- The 3 partners who *do not* have _Lost_ flag data spend credits to receive this data (at a discounted cost, because of their increased fidelity).

- The 3 partners who *do not* have _Alias_ flag data spend credits to receive this data (again, at the discounted cost).

- The flag data is not perfect: clocks, settings, reception, etc. vary. The partners use the "purchased" flag data to limit their own commercial flagging to those suspect times within the shows (with perhaps a 1 minute margin before and after). The results of these "verification" flag jobs are submitted back to the server.
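
The "verification" step in the last item could be sketched as below: pad each purchased commercial segment by a margin, clamp it to the show, and merge any overlaps, so the local flagger only scans those windows. Times are assumed to be seconds from the start of the recording, and verification_windows is a hypothetical helper name, not an existing MythTV function.

```python
def verification_windows(segments, show_length, margin=60):
    """Pad each (start, end) segment by `margin` seconds, clamp to the show,
    and merge windows that overlap or touch."""
    padded = sorted(
        (max(0, start - margin), min(show_length, end + margin))
        for start, end in segments
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical purchased data for a one-hour recording.
windows = verification_windows(
    [(600, 780), (800, 900), (1500, 1680)], show_length=3600
)
seconds_to_scan = sum(end - start for start, end in windows)
```

With this sample data the partner would scan two windows covering 720 of the 3600 seconds, rather than the whole recording.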

Summary:

So, now 7 users have formed a partnership to share the load of flagging _Alias_ and _Lost_.

6 of them have significantly reduced their flagging load for these two shows.

1 partner needs to regain the trust of the partnership by submitting data in a timely manner.
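
The credit and fidelity bookkeeping from the scenario could look roughly like the sketch below. The proposal only says that earnings are weighted by fidelity and accuracy, that a missed submission lowers fidelity, and that higher fidelity earns a discount; every constant and formula here (BASE_REWARD, BASE_PRICE, FIDELITY_STEP, the 0.0 to 1.0 scales) is an assumption chosen purely for illustration.

```python
from dataclasses import dataclass

BASE_REWARD = 10.0    # assumed credits for a perfectly accurate submission
BASE_PRICE = 10.0     # assumed undiscounted price of one show's flag data
FIDELITY_STEP = 0.25  # assumed change in fidelity per kept or missed duty

@dataclass
class Partner:
    credits: float = 0.0
    fidelity: float = 0.0  # assumed scale: 0.0 (untrusted) to 1.0 (fully trusted)

def record_submission(partner, accuracy):
    """Reward a submission; accuracy is assumed to be in the range 0.0 to 1.0."""
    partner.credits += BASE_REWARD * (1.0 + partner.fidelity) * accuracy
    partner.fidelity = min(1.0, partner.fidelity + FIDELITY_STEP)

def record_missed(partner):
    """Lower the fidelity of a partner who did not submit assigned data."""
    partner.fidelity = max(0.0, partner.fidelity - FIDELITY_STEP)

def purchase(partner):
    """Spend credits on flag data; higher fidelity gives a larger discount."""
    price = BASE_PRICE * (1.0 - 0.5 * partner.fidelity)
    if partner.credits < price:
        return False
    partner.credits -= price
    return True
```

A new partner starts with no credits and no trust, so purchase() fails until enough submissions have been recorded.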

One can easily imagine a greatly expanded model, where a particular user could belong to dozens of partnerships. Each partnership could have hundreds of users and dozens of shows. As a result of participating in partnerships, the user may only be required to flag a few shows (in their entirety) each week.

Benefits:

- Reduced commercial flagging for individual partners.

- Increased accuracy of commercial flagging (via consensus).

- "Leeching" is not allowed.

- Negative effects of poisoning are reduced through "fidelity" factors and credits.

- New methods of commercial flagging (either local or distributed) can be seamlessly incorporated.

- The available CPU power could be used for new extremely processor intensive flagging methods.
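
To illustrate the "consensus" benefit above, one possible approach is to keep only the spans that at least a quorum of partners flagged as commercial. This is a sketch under assumptions: times are seconds on the synchronized clock, each report is a list of (start, end) pairs, and consensus_segments and quorum are names invented for this example.

```python
def consensus_segments(reports, quorum):
    """Return (start, end) spans flagged as commercial by at least `quorum` reports."""
    events = []
    for segments in reports:
        for start, end in segments:
            events.append((start, 1))   # a report starts flagging here
            events.append((end, -1))    # a report stops flagging here
    # At equal times, process starts before ends so touching spans merge.
    events.sort(key=lambda event: (event[0], -event[1]))

    result, active, span_start = [], 0, None
    for time, delta in events:
        previous = active
        active += delta
        if previous < quorum <= active:
            span_start = time               # agreement just reached the quorum
        elif active < quorum <= previous:
            if time > span_start:           # drop zero-length spans
                result.append((span_start, time))
            span_start = None
    return result

# Hypothetical reports from three partners for the same commercial break.
reports = [[(600, 780)], [(605, 785)], [(590, 700)]]
agreed = consensus_segments(reports, quorum=2)
```

In this sample, two of the three partners agree from 600 to 780 seconds, so that is the span the partnership would publish; the third partner's early cutoff at 700 is outvoted.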

Drawbacks:

- Requires central server.

- Requires many participants.

- Requires frequent communications with the server (albeit not much data is transferred).

- Requires changing commercial flagging to acquire partnership data.

- Requires changing commercial flagging to allow for flagging just parts of the show.

- Requires interface changes to alert users when they are about to "fail in their partnership duties" by not recording and flagging a show.

- The central DCF server stores recording habits of users. It's anonymous, but still concerning.

- Requires similar "content streams". Anecdotal experience (hearing the same commercials over the phone with friends) makes me suspect that commercial *times* don't vary within the same Service Provider. Analysis of submitted data will be the acid test. If the server never finds suitable partnerships, then everybody's content streams must be different, and the whole project is a failure.

- If the project is successful, content providers will further vary the content streams.

Progress:

- I have built a local database to store the DCF data.

- I am building an SQL script to populate the DCF database from mythconverg.

- I will be collecting data (via emailed output of the sql script) from other users.

- I have outlined a solution for time synchronization. Basically, partners submit the machine's local time with every transaction.

- I am defining a secure and anonymous interface for the DCF server.

- I am defining factors which I believe should affect the "fidelity" of data submitted.
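
The time synchronization item above might work along these lines: since each transaction carries the client's local time, the server can compare it with its own clock and shift the submitted segment times by the difference. This sketch assumes network latency is negligible, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def clock_offset(client_now, server_now):
    """Offset to add to client timestamps to express them in server time."""
    return server_now - client_now

def to_server_time(client_timestamp, offset):
    """Convert a client-side timestamp to the server's clock."""
    return client_timestamp + offset

# Hypothetical transaction: the client's clock runs 12 seconds behind the server.
client_now = datetime(2005, 4, 14, 20, 0, 0)
server_now = datetime(2005, 4, 14, 20, 0, 12)
offset = clock_offset(client_now, server_now)

# A commercial start time as recorded on the client, normalized for comparison.
segment_start = to_server_time(datetime(2005, 4, 14, 20, 15, 30), offset)
```

Normalizing every submission this way would let segment times from different machines be compared directly, without requiring the partners to run synchronized clocks themselves.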

Ideas, questions, comments, criticisms are welcome.

Christopher David Petersen

Member of PoORMUG http://poormug.bitbucket.com/

_______________________________________________
mythtv-users mailing list
[email protected]
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
