Re: A common request

Gregory Magarshak Thu, 31 Mar 2011 05:32:36 -0700

By the way, sorry ... I wanted to clarify one thing:

I am trying to FILTER by the unique index (fb_uid) in this case, butJOIN on the primary key (id) and I don't need to use any other fields. Iam guessing that the MySQL indexes map indexed fields (fb_uid) to theprimary key (id) so I wouldn't have to touch the disk. Am I right aboutthat?




On 3/31/11 8:29 AM, Gregory Magarshak wrote:

Thanks for your insight! But I'm still worried about the performanceof IN ( big list of values ). Can you tell me how it is implemented?
Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119,..., 387945)
1) If I put 200 values there, does it do 200 individual SELECTsinternally, and union them? Or does it notice that c has a "UNIQUE"index and thus at most one row can be returned per SELECT, and doesthem all at once?
2) If I want to get just the primary key, or join with another tablebased on just the primary key, does this query ever touch the disk(assuming the index is in memory, which I think it always is --correct me if I'm wrong about that).
The way I would recommend doing it (for BTREE indexes, anyway) is tosort the values in ascending order, and do the search in one passthrough the index. The index is already in memory, and it would bestraightforward to modify a binary search algorithm to find the rowscorresponding to monotonically ascending values of the primary key,all in one pass.
Even if the binary search algorithm is run 200 or 2000 times for alist, it would still be faster than hitting the disk. (Even though theCPU cache performance would be worse.)
Can you let me know the specifics of it, and especially how I canavoid hitting the I/O bottlenecks?
Thank you,
Greg

On 3/29/11 4:17 PM, Peter Brawley wrote:
> Why not optimize the IN ( ... ) to do the same type of thing?
If the argument to IN() is a list of values, it'll be OK. If it's aSELECT, in 5.0 it will be slower than molasses (see "The unbearableslowness of IN()" at http://www.artfulsoftware.com/queries.php.
> I always tried to avoid joins because I am planning to horizontallypartition my data.
A severe & unfortunate constraint. Can't help you there.

PB

-----

On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
Yes, this would be fine. But often, the list of friends is obtainedfrom a social network like facebook, and is not stored internally.Basically, I obtain the friend list in a request to facebook, andthen see which of those users have created things. So would I haveto create a temporary table and insert all those uids just to make ajoin? Why not optimize the IN ( ... ) to do the same type of thing?
There is also a second problem: I want to use MySQL Cluster, becauseI expect to have many users. Would it be efficient to use JOINbetween the friends table and the articles table? Both tables arepartitioned by user_id as the primary key, so the join would have tohit many different nodes. I always tried to avoid joins because I amplanning to horizontally partition my data. But if MySQL cluster canhandle this join transparently and split it up based on thepartition, then that's fine. Do you have any info on this?
Greg

On 3/29/11 2:10 PM, Peter Brawley wrote:
> How can I quickly find all the articles written by this user'sfriends, and not just random articles?
Taking the simplest possible case, with tablefriends(userID,friendID) where each friendID refers to a userID inanother row, the friends of userID u are ...
select friendID from user where userID=u;

so articles by those friends of u are ...
select a.* from article a join ( select friendID from user whereuserID=u ) f on a.userID=f.friendID;
PB

-----

On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
Hey there. My company writes a lot of social applications, andthere is one operation that is very common, but I don't know ifMySQL supports it in a good way. I thought I'd write to this listfor two reasons:
1) Maybe MySQL has a good way to do this, and I just don'tknow about it
2) Propose to MySQL developers a simple algorithm which wouldgreatly improve MySQL support for social networking apps.
Here is the situation. Let's say I have built a socialnetworking application where people create and edit some item(article, photo, music mix, whatever). Now, a typical user logsin, and this user has 3000 friends. How can I quickly find all thearticles written by this user's friends, and not just randomarticles?
    Ideally, I would want to write something like this:
SELECT * FROM article WHERE user_id IN (345789, 324875, 398,..., 349580)
basically, execute a query with a huge IN ( ... ). Maybe ifthis would exceed the buffer size for the MySQL wire protocol, Iwould break up the list into several lists, and execute severalqueries, and union the results together myself.
But my point is, this is very common for social networkingapps. Every app wants to show "the X created by your friends", or"friends of yours (given some list from a social network) who havetaken action X".
Here is how I would do it if I had raw access to the MySQLindex in memory:
    a) Sort the list of entries in the IN, in ascending order.
b) Do *ONE* binary search through the index (assuming it's aBTREE index) and get them all in one pass. If it's a HASH index orsomething, I would have to look up each one individually.
The benefits of this approach would be that this commonoperation would be done extremely quickly. If the index fitsentirely in memory, and I just want to get the primary keys (i.e.get the list of friends who did X), the disk isn't even touched.In addition, for BTREE indexes, I would just need ONE binarysearch, because the entries have been sorted in ascending order.
Does MySQL have something like this? And if not, perhaps youcan add it in the next version? It would really boost MySQL'ssupport for social networking apps tremendously. Alternative, howcan I add this to my MySQL? Any advice would be appreciated.
Sincerely,
Gregory Magarshak
Qbix



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[email protected]

Re: A common request

Reply via email to