Take a look at cacheismo. It supports the memcached protocol and provides a fully scriptable server-side runtime.
thanks,
rohitk

On Jul 22, 2013 3:19 PM, "Sunil Patil" <[email protected]> wrote:

> Hi,
>
> All the changes ("memcached code with support for doing data filtering on
> the server for multi-get queries", somewhat similar to executing a Lua
> script on a Redis server, but much more efficient) are now available at
> https://github.com/sunillp/sheep-memcached
>
> In addition, we have provided a sample filter library whose filtering
> functions are called to process/filter multi-get queries on the server, and
> a memcached client which measures performance (throughput and latency) for
> multi-get queries. This client can be used to see the improvements that can
> be achieved by doing data filtering on the server. Details of the usage and
> experiments are given in the README, under the section "BUILDING/TESTING",
> at https://github.com/sunillp/sheep-memcached
>
> We plan to support many more features using this filter-library framework,
> basically operations that can be performed on the server itself without the
> need to read the data back to the client and process it there. For example,
> pre-processing data before writing it into the memcached server on SET:
> this is like a read-modify-update operation, where today data is read from
> the server to the client, updated/modified on the client, and then written
> back and stored on the server. If we provide a mechanism for updating data
> in place on the server, this operation becomes fast and there is no extra
> network traffic/load.
>
> Let us know your feedback.
>
> Thanks,
> Sunil
>
> On Saturday, 18 May 2013 15:24:36 UTC+5:30, Sunil Patil wrote:
>>
>> Hi,
>>
>> We have made some changes in memcached for doing "data filtering at the
>> server". We would like to open source this and contribute it to memcached,
>> and we can provide the patch for review. We have also developed some tests
>> (which people can try out) that show the benefits of data filtering at the
>> server.
>>
>> Please let me know your thoughts.
>>
>> Thanks,
>> Sunil
>>
>> About the changes:
>> - With these changes we can do data filtering at the server. This is good
>> for multi-get queries, e.g. queries issued in social-networking
>> applications where the data of all friends of a user is read and processed,
>> and the filtered data is returned to the user. The filtered data is often a
>> very small subset of the data that was actually read.
>> - On a side note not related to the memcached server, we also plan to
>> implement data colocation on the memcached client (all of a user's friends'
>> data will be stored on a single server, or very few servers), so that very
>> few servers are contacted during query processing. This would further
>> complement data filtering.
>>
>> Changes:
>> 1. Added two new options to the memcached server (-x and -y):
>>    # ./memcached -h
>>    ...
>>    -x <num> -y <filter library path>
>>          Enable data filtering at the server - helps in multi-get operations
>>          <num> = 1 - data filtering at the server enabled (no deserialized
>>                      data); data is deserialized at the time of query
>>                      processing
>>          <num> = 2 - data filtering at the server enabled (with deserialized
>>                      data); uses more memory but gives better performance,
>>                      since it avoids data deserialization at the time of
>>                      query processing and saves CPU cycles
>>          <filter library path> - path of the filter library 'libfilter.so';
>>                      this library implements the filtering functions and the
>>                      data serialization/deserialization functions
>>
>> 2. On enabling filtering, on a "get" query we read the data of all keys and
>> pass this data to a filtering function implemented in the user-provided
>> library "libfilter.so". The dlopen/dlsym framework is used for opening the
>> user-provided library and calling the user-provided functions. The user has
>> to define only three functions: "deserialize()", "free_msg()" and
>> "readfilter()" (a rough sketch of such a library is included after this
>> message). We plan to introduce a new command, "fget" (filter get), for this
>> functionality, wherein the client could additionally pass arguments to the
>> filter function and could have multiple filtering functions (i.e. work with
>> multiple filter libraries).
>>
>> Currently the changes are implemented for the Linux platform (tested on
>> RHEL 5.6). The changes were made on memcached version "memcached-1.4.13",
>> for the ASCII protocol only (not the binary protocol), with no impact on
>> the "gets" (get with CAS) functionality.
>>
>> Performance enhancement:
>> For multi-get queries with the characteristics mentioned above, some of the
>> advantages are:
>> - Better throughput and latency under normal query-load conditions, which
>> can allow client consolidation.
>> - Since most data is filtered at the server, much less data flows over the
>> network from server to client. This avoids the network congestion (and the
>> resulting latencies/delays) that can occur under high query load with
>> normal memcached.
>> - Performance with these changes (for multi-get queries with the
>> characteristics mentioned above) is 3x to 7x better than normal memcached,
>> as shown below.
>>
>> Tests performed:
>> - Setup details:
>> 1 memcached server: RHEL 6.1, 64-bit, 16 cores, 24 GB RAM, 1 Gb Ethernet card
>> 1 memcached client: RHEL 6.1, 64-bit, 16 cores, 24 GB RAM, 1 Gb Ethernet card
>> - Test details:
>> There are one million users, each represented by a unique key. Each user
>> has 100 friends, and each user has 30 records of type (userId, articleId,
>> timestamp) stored as its value. On a READ query for a user, all records
>> associated with all friends of that user are read, sorted in increasing
>> order of timestamp, and the top/latest 10 records across all friends are
>> returned as output. So on a READ query, 100 keys (100*30 = 3000 records)
>> are read, the 3000 records are sorted, and the top 10 records are returned.
>>
>> - With normal memcached, all these operations (reading 100 keys, sorting
>> 3000 records, and finding the top 10 records) are done on the client.
>> - With our changes, where the filtering (sorting) happens on the server,
>> the 100 keys are read on the server, the 3000 records are sorted locally by
>> the filtering function (implemented in the user-provided library; similar
>> processing is done on the server as would otherwise be done on the client),
>> and only 10 records are sent to the client.
>>
>> We created a multithreaded CLIENT application which issues READ queries
>> asynchronously (multiple threads are used for issuing and processing READ
>> queries). READ queries are issued for a varying number of users, from 1
>> user up to 30000 users. The time taken to complete these queries is used to
>> compute throughput and latency. See the attachments for the perf results.
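The message above names the three filter-library entry points but not their
signatures, which are defined by the sheep-memcached patch itself. Below is a
minimal, hypothetical sketch of what a "libfilter.so" for the friends/top-10
test case might look like; the record layout and every function signature here
are assumptions made for illustration, not the patch's actual interface.

/*
 * Hypothetical sketch of a filter library ("libfilter.so") for the
 * friends/top-10 test case described above. The entry-point names come
 * from the message; the signatures and the record layout are assumed.
 */
#include <stdlib.h>
#include <string.h>

/* One stored record: (userId, articleId, timestamp). */
typedef struct {
    unsigned int  user_id;
    unsigned int  article_id;
    unsigned long timestamp;
} record_t;

/* A deserialized value: the array of records stored under one key. */
typedef struct {
    record_t *records;
    size_t    count;
} friend_data_t;

/* Assumed: turn one raw stored value into an in-memory structure.
 * The value is taken to be a packed array of record_t. */
void *deserialize(const char *data, size_t len)
{
    friend_data_t *fd = malloc(sizeof(*fd));
    if (fd == NULL)
        return NULL;
    fd->count = len / sizeof(record_t);
    fd->records = malloc(fd->count * sizeof(record_t));
    if (fd->records == NULL) {
        free(fd);
        return NULL;
    }
    memcpy(fd->records, data, fd->count * sizeof(record_t));
    return fd;
}

/* Assumed: release whatever deserialize() allocated. */
void free_msg(void *msg)
{
    friend_data_t *fd = msg;
    if (fd != NULL) {
        free(fd->records);
        free(fd);
    }
}

/* Sort latest timestamp first. */
static int by_timestamp_desc(const void *a, const void *b)
{
    const record_t *ra = a, *rb = b;
    if (ra->timestamp < rb->timestamp) return 1;
    if (ra->timestamp > rb->timestamp) return -1;
    return 0;
}

/* Assumed: given the deserialized values of all keys in the multi-get
 * (100 friends x 30 records in the test), merge them, sort by timestamp
 * and return only the latest 10 records; *out_len gets the result size. */
void *readfilter(void **values, size_t nvalues, size_t *out_len)
{
    size_t total = 0, keep, pos = 0, i;
    record_t *all, *result;

    for (i = 0; i < nvalues; i++)
        total += ((friend_data_t *)values[i])->count;

    all = malloc(total * sizeof(record_t));
    if (all == NULL)
        return NULL;
    for (i = 0; i < nvalues; i++) {
        friend_data_t *fd = values[i];
        memcpy(all + pos, fd->records, fd->count * sizeof(record_t));
        pos += fd->count;
    }

    qsort(all, total, sizeof(record_t), by_timestamp_desc);

    keep = total < 10 ? total : 10;
    *out_len = keep * sizeof(record_t);
    result = malloc(*out_len);
    if (result != NULL)
        memcpy(result, all, *out_len);
    free(all);
    return result;
}

Such a library would be built with something like
"cc -shared -fPIC -o libfilter.so filter.c" and, going by the -h output quoted
above, passed to the server as "./memcached -x 1 -y ./libfilter.so".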
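On the server side, the dlopen/dlsym step that the message describes would
look roughly as follows. Again, only the symbol names are taken from the
message; the typedefs mirror the assumed signatures above, and the loader
structure is an illustration, not the patch's actual code.

/* Rough sketch of loading the filter library given via
 * "-y <filter library path>" and resolving the three user-provided
 * entry points with dlopen/dlsym. Link with -ldl. */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

typedef void *(*deserialize_fn)(const char *data, size_t len);
typedef void  (*free_msg_fn)(void *msg);
typedef void *(*readfilter_fn)(void **values, size_t nvalues, size_t *out_len);

static deserialize_fn filter_deserialize;
static free_msg_fn    filter_free_msg;
static readfilter_fn  filter_readfilter;

static int load_filter_library(const char *path)
{
    /* Keep the handle open for the life of the process so the
     * resolved function pointers remain valid. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return -1;
    }

    filter_deserialize = (deserialize_fn)dlsym(handle, "deserialize");
    filter_free_msg    = (free_msg_fn)dlsym(handle, "free_msg");
    filter_readfilter  = (readfilter_fn)dlsym(handle, "readfilter");

    if (filter_deserialize == NULL || filter_free_msg == NULL ||
        filter_readfilter == NULL) {
        fprintf(stderr, "missing filter symbol: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    return 0;
}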
