I will be glad to help out with this very daring pilot program.
On 4/14/05, Christopher David Petersen <[EMAIL PROTECTED]> wrote:
> I'm starting this thread so I don't pollute the discussion of fingerprinting
> for commercial flagging (which is a brilliant, but separate idea).
>
> This thread is meant to discuss the idea of distributed commercial flagging
> (DCF) via existing algorithms to reduce load and increase accuracy.
>
> In brief, here's the basic idea:
>
> 1) Collect commercial flagging information from participating users at a
> central server (hopefully, this isn't a DMCA violation).
> 2) Analyze the data to determine groups of users who have performed
> duplicate work.
> 3) Analyze the data to predict groups of users who will be performing
> duplicate work.
> 4) Distribute the future duplicate work among the users to reduce each
> user's individual load.
>
> Here's an example using _Lost_ and _Alias_ (chosen for their short names):
>
> Givens:
> - 23 users with Comcast Analog Basic Cable service in Portland, OR record
> and flag new episodes of _Lost_ and _Alias_ each week.
> - These users use a variety of commercial flagging methods.
> - The machines have a variety of available CPU power.
>
> Scenario:
> - 23 users submit data to the DCF server via a secured and anonymous
> interface. This data includes which shows they flagged and the start and
> end times of each commercial segment. All times are synchronized to the DCF
> server's highly-accurate clock (more on this later).
>
> - After a client submits each show's data, the DCF server indicates to the
> client whether the client can join a "partnership".
>
> - "Partnerships" are created when the DCF server determines that 2 or more
> users are performing (and will perform) duplicate work (with similar output)
> for 1 or more shows.
>
> - 10 users are invited to join the new partnership for new episodes of
> _Lost_ and _Alias_ on Comcast Analog Basic Cable, Portland, OR. These ten
> users are invited because their machines are of similar power (i.e.
> commercial flagging occurs after a similar delay and in a similar amount of
> time). These users are now "Partners" within the "Partnership".
>
> - At first, none of the partners are "trusted" or have earned any "credits"
> within the partnership. As partners submit more data they earn more credits.
> The exact amount they earn per submission is weighted by how much they are
> trusted (their "fidelity" factor) and the accuracy of the submitted data
> (how similar it is to other data). Once partners have earned enough credits,
> they can "purchase" data from the partnership.
>
> - After N weeks, only 7 users have earned enough credits to share data.
>
> - 3 partners are selected to flag next week's episode of _Lost_.
>
> - 4 partners are selected to flag next week's episode of _Alias_.
>
> - Of the 3 partners selected to flag _Lost_, all do so and submit their
> data.
>
> - Of the 4 partners selected to flag _Alias_, only 3 do so and submit their
> data. The 1 user who did not submit data has lessened his "fidelity" factor.
>
> - The 6 partners who submitted data earn credits and increase their
> fidelity factor.
>
> - The 3 partners who *do not* have _Lost_ flag data spend credits to receive
> this data (at a discounted cost, because of their increased fidelity).
>
> - The 3 partners who *do not* have _Alias_ flag data spend credits to
> receive this data (again, at the discounted cost).
>
> - The flag data is not perfect: clocks, settings, reception, etc. vary. The
> partners use the "purchased" flag data to limit their own commercial
> flagging to those suspect times within the shows (with perhaps a 1 minute
> margin before and after). The results of these "verification" flag jobs are
> submitted back to the server.
>
> Summary:
> So, now 7 users have formed a partnership to share the load of flagging
> _Alias_ and _Lost_.
> 6 of them have significantly reduced their flagging load for these two shows.
> 1 partner needs to regain the trust of the partnership by submitting data in
> a timely manner.
>
> One can easily imagine a greatly expanded model, where a particular user
> could belong to dozens of partnerships. Each partnership could have hundreds
> of users, and dozens of shows. As a result of participating in partnerships,
> the user may only be required to flag a few shows (in their entirety) each
> week.
>
> Benefits:
> - Reduced commercial flagging for individual partners.
> - Increased accuracy of commercial flagging (via consensus).
> - "Leeching" is not allowed.
> - Negative effects of poisoning are reduced through "fidelity" factors and
> credits.
> - New methods of commercial flagging (either local or distributed) can be
> seamlessly incorporated.
> - The available CPU power could be used for new extremely processor
> intensive flagging methods.
>
> Drawbacks:
> - Requires central server.
> - Requires many participants.
> - Requires frequent communications with the server (albeit, not much data is
> transferred).
> - Requires changing commercial flagging to acquire partnership data.
> - Requires changing commercial flagging to allow for flagging just parts of
> the show.
> - Requires interface changes to alert users when they are about to "fail in
> their partnership duties" by not recording and flagging a show.
> - The central DCF server stores recording habits of users. It's anonymous,
> but still concerning.
> - Requires similar "content streams". Anecdotal experience (hearing the same
> commercials over the phone with friends) makes me suspect that commercial
> *times* don't vary within the same Service Provider. Analysis of submitted
> data will be the acid test. If the server never finds suitable
> partnerships, then everybody's content streams must be different, and the
> whole project is a failure.
>
> - If the project is successful, content providers will further vary the
> content streams.
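
To make the "verification" step in the scenario above a bit more concrete, here is a rough Python sketch of how a client might turn purchased flag data into the time windows it still has to scan locally. None of this comes from the proposal itself: the function name `verification_windows`, the choice of seconds from the start of the recording as the time unit, and the sample numbers are all my assumptions. The 60-second default is only my reading of the "perhaps a 1 minute margin" remark.

```python
def verification_windows(segments, show_length, margin=60):
    """Pad each purchased commercial segment by `margin` seconds, clamp it to
    the recording, and merge any windows that overlap or touch."""
    padded = sorted(
        (max(0, start - margin), min(show_length, end + margin))
        for start, end in segments
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical purchased data for a 3600-second recording.
purchased = [(600, 780), (820, 900), (3500, 3590)]
windows = verification_windows(purchased, show_length=3600)
scanned = sum(end - start for start, end in windows)
```

With these made-up numbers the first two segments collapse into one window, and the client would only need to run its flagger over 580 of the 3600 seconds.
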
>
> Progress:
> - I have built a local database to store the DCF data.
> - I am building a SQL script to populate the DCF database from mythconverg.
> - I will be collecting data (via emailed output of the SQL script) from
> other users.
>
> - I have outlined a solution for time synchronization. Basically, partners
> submit the machine's local time with every transaction.
> - I am defining a secure and anonymous interface for the DCF server.
> - I am defining factors which I believe should affect the "fidelity" of data
> submitted.
>
> Ideas, questions, comments, criticisms are welcome.
>
>
> --
> Christopher David Petersen
> Member of PoORMUG http://poormug.bitbucket.com/
>
> _______________________________________________
> mythtv-users mailing list
> [email protected]
> http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
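
On the time synchronization item: one way the "local time with every transaction" idea could work is for the server to compare each submitted client time against its own clock on receipt and keep a per-client offset. The sketch below is only my guess at that, not the outlined solution. The names `clock_offset` and `to_server_time` are hypothetical, times are assumed to be whole epoch seconds, and network latency is ignored.

```python
from statistics import median


def clock_offset(transactions):
    """Estimate (server clock - client clock) in seconds.

    `transactions` is a list of (client_local_time, server_receive_time)
    pairs. The median keeps one slow request from skewing the estimate.
    Network latency is ignored here (assumption).
    """
    return median(server - client for client, server in transactions)


def to_server_time(segments, offset):
    """Shift (start, end) timestamps from client time to server time."""
    return [(start + offset, end + offset) for start, end in segments]


# Hypothetical client whose clock runs about 12 seconds behind the server;
# the third transaction was delayed in transit.
history = [(1000, 1012), (2000, 2011), (3000, 3017)]
offset = clock_offset(history)
aligned = to_server_time([(5000, 5180)], offset)
```

Using the median rather than the latest sample is a design choice on my part: a single delayed submission (the third one here) then has no effect on the estimate.
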
