Brian,
>Let's say a member completes action 'abc'. We want to query the
>member_actions table for all members that have also completed action 'abc'.
>We then want to determine what the top 3 other actions are that were
>completed by members who have completed action 'abc', while making sure that
>only actions that have not already been completed by this member are
>considered.
Ordered groupwise quotas. For a subquery-free two-table example see http://www.artfulsoftware.com/queries.php#18.

PB

-----

Brian Erickson wrote:
We are looking for some help with queries that will accomplish a similar
feature to what Amazon does. When you purchase a product, Amazon looks at
all other people who have purchased that product, and then looks at all of
the OTHER products those people have purchased, and uses that data to
suggest related products to you. That's essentially what we are trying to
do.

We have 3 tables: members, actions, and member_actions. The 'members' table
tracks all of our customers, the 'actions' table tracks all of the different
actions each member can complete, and the 'member_actions' table is the weak
entity link that tracks which actions each member have completed.

Let's say a member completes action 'abc'. We want to query the
member_actions table for all members that have also completed action 'abc'.
We then want to determine what the top 3 other actions are that were
completed by members who have completed action 'abc', while making sure that
only actions that have not already been completed by this member are
considered.

We are using MySQL version 3.23. There are approximately 500 unique rows in
the 'actions' table and 2,000,000 rows in the member_actions table, with
3,000+ actions being recorded at any given time.

Is it possible to achieve this functionality with one/few queries? The
statistics above may be important because if a query takes too long to
execute, the server may not be physically capable of executing that query
3,000+ times simultaneously.

Another option we have considered is to create a separate table called
'correlation' with two fields: action and correlated_action. We would then
populate this table in a batch process following the pseudocode below.

SELECT DISTINCT(action) FROM member_actions Loop
    SELECT DISTINCT(member) FROM member_actions WHERE action = x
    Loop
        SELECT DISTINCT(action) FROM member_actions WHERE member = y AND
action <> x
        Loop
            INSERT INTO correlation (action, correlated_action) VALUES (x,
z)

Then we could easily query this table to find correlated actions like so:

SELECT DISTINCT(correlated_action), COUNT(*) AS count FROM correlation WHERE
action = x GROUP BY correlated_action ORDER BY count DESC

This would not solve the issue of only returning actions that the member had
not already completed, but that could probably be accomplished by simply
joining the correlation table back to the member_actions table.

So, our question is whether or not this is feasible with a one/few query
approach, or if this is something that should be accomplished with something
similar to the approach above? Can anyone provide a good start for us?

  

No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.385 / Virus Database: 268.3.2/294 - Release Date: 3/27/2006
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.3.2/294 - Release Date: 3/27/2006

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to