Beware that the native Thrift PHP bindings have a bug which might change the types of provided arguments. Check out the bug report I filed: https://issues.apache.org/jira/browse/THRIFT-796
- Garo

On Fri, Aug 20, 2010 at 10:35 AM, sasha <sasha2...@gmail.com> wrote:
> Julian Simon <jsimon <at> jules.com.au> writes:
>
>> Hi,
>>
>> I've been trying to benchmark Cassandra for our use case and have been
>> seeing poor performance on writes and extremely poor performance on
>> reads.
>>
>> Using Cassandra 0.51 stable & thrift-0.2.0.
>>
>> It turns out all the CPU time is going to the PHP client process; the
>> JVM running the Cassandra server isn't breaking much of a sweat.
>>
>> For reads, the latency is often up to 1 second to fetch a row
>> containing ~2000 columns, or around 300 ms to fetch a 500-column-wide
>> row. This is with get_slice() and a predicate specifying the start and
>> finish of the range.
>>
>> Using cachegrind and inspecting the code inside the Thrift bindings
>> makes it pretty clear why the performance is so bad, particularly on
>> reads. The biggest culprit is the translation code that casts data
>> back and forth into binary representations for sending over the wire
>> to the Cassandra server.
>>
>> There seems to be some 32-bit-specific code that iterates heavily,
>> apparently due to a limitation in PHP's implementation of longs.
>>
>> However, testing on a 64-bit host doesn't yield any performance
>> improvement.
>>
>> More surprisingly, if I compile and enable the PHP native Thrift
>> bindings (following this guide:
>> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP),
>> read performance actually degrades by another 50%. I have verified
>> that the Thrift code is recognizing and using the native PHP functions
>> provided by the library.
>>
>> I've tested all of this on both 32-bit and 64-bit installations of
>> both PHP 5.1 and 5.2. Results are the same in all cases.
>>
>> My environment is vanilla CentOS 5.4 server installations inside
>> VMware on a 4-core 64-bit host with plenty of RAM and fast disks.
>>
>> Has anyone been able to get decent performance with PHP and
>> Cassandra? If so, how have you done it?
>>
>> Thanks,
>> Jules
>
> I had exactly the same problem: without the native Thrift bindings the
> performance was low and PHP used too much CPU. But when I compiled and
> enabled the native Thrift bindings (following this guide:
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP), the
> performance became even worse: it degraded SEVERAL TIMES (although CPU
> usage decreased too).
>
> After several random tries I discovered that the buffer size matters. I
> mean the second and third arguments of
> "new TBufferedTransport($socket, X, Y)". But the most surprising fact is
> that it matters much more when using the native Thrift bindings than
> when not using them.
>
> I.e.:
> - get_range_slices without native Thrift bindings (small or large
>   buffer size): ~1 sec.
> - get_range_slices with native Thrift bindings and a small buffer size
>   (1024): ~5 sec!
> - get_range_slices with native Thrift bindings and a large buffer size
>   (40960): ~0.1 sec.
>
> I don't know why!!
>
> P.S.: Cassandra 0.6.3.
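One mechanical effect of the TBufferedTransport buffer size can be made concrete. As far as we understand the Thrift PHP library of that era, the second and third constructor arguments are the read and write buffer sizes, and a read refills the buffer from the socket only when the buffer is empty. The sketch below is a self-contained simulation, not Thrift code. `CountingSource`, `SimpleBufferedReader` and `countSourceReads` are hypothetical stand-ins, and the 200 KiB response decoded in 8-byte fields is an arbitrary assumption. It only shows how the read-buffer size changes the number of underlying socket reads. It does not claim to explain the 5 sec vs 0.1 sec gap measured above with the native bindings. Requires PHP 7.4+ for typed properties.

```php
<?php
// Hypothetical stand-in for a socket: serves a fixed payload and counts
// how many times read() is called on it.
class CountingSource
{
    public int $reads = 0;
    private string $data;
    private int $pos = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    public function read(int $len): string
    {
        $this->reads++;
        $chunk = (string) substr($this->data, $this->pos, $len);
        $this->pos += strlen($chunk);
        return $chunk;
    }
}

// Simplified read buffer (an assumption modelled loosely on a buffered
// transport): when its buffer is empty it refills with ONE source read of
// $bufSize bytes, then serves callers from memory.
class SimpleBufferedReader
{
    private CountingSource $src;
    private int $bufSize;
    private string $buf = '';

    public function __construct(CountingSource $src, int $bufSize)
    {
        $this->src = $src;
        $this->bufSize = $bufSize;
    }

    public function read(int $len): string
    {
        if ($this->buf === '') {
            $this->buf = $this->src->read($this->bufSize);
        }
        // May return fewer than $len bytes if the buffer runs short.
        $out = (string) substr($this->buf, 0, $len);
        $this->buf = (string) substr($this->buf, $len);
        return $out;
    }
}

// Decode a $payloadSize-byte response in $fieldSize-byte pieces and
// report how many reads hit the underlying source.
function countSourceReads(int $payloadSize, int $bufSize, int $fieldSize): int
{
    $src = new CountingSource(str_repeat('x', $payloadSize));
    $reader = new SimpleBufferedReader($src, $bufSize);
    $consumed = 0;
    while ($consumed < $payloadSize) {
        $consumed += strlen($reader->read($fieldSize));
    }
    return $src->reads;
}

foreach ([1024, 40960] as $bufSize) {
    printf("buffer %6d bytes -> %d source reads\n",
        $bufSize, countSourceReads(204800, $bufSize, 8));
}
```

With these assumed numbers the 1024-byte buffer needs 200 source reads and the 40960-byte buffer needs 5. In this model the count is simply the response size divided by the buffer size, rounded up. Whatever per-read overhead the real native extension adds would be multiplied by that count.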