Re: Thrift Perl API Timeout Issues

Jake Luciani Thu, 15 Oct 2009 09:20:13 -0700

What happens if you set it to 100000?



On Oct 15, 2009, at 11:48 AM, Eric Lubow <[email protected]> wrote:

My connection section of the script is here:
 # Connect to the database
 my $socket = new Thrift::Socket('localhost',9160);
 $socket->setSendTimeout(2500);
 $socket->setRecvTimeout(7500);
 my $transport = new Thrift::BufferedTransport($socket,2048,2048);
 my $protocol = new Thrift::BinaryProtocol($transport);
 my $client = Cassandra::CassandraClient->new($protocol);
I even tried it with combinations of 1024 as the size and 1000 as the SendTimeout and 5000 as the RecvTimeout.
-e
On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani <[email protected]> wrote:
I think it's 100ms. I need to increase it to match python I guess.

Sent from my iPhone
On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <[email protected]> wrote:
What is the default?
On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <[email protected]> wrote:
You need to call
$socket->setRecvTimeout()
with a higher number in ms.


On Oct 15, 2009, at 11:26 AM, Eric Lubow <[email protected]> wrote:

Using the Thrift Perl API into Cassandra, I am running into what is
endearingly referred to as the 4 bytes of doom:
 TSocket: timed out reading 4 bytes from localhost:9160
The script I am using is fairly simple. I have a text file that has about
3.6 million lines that are formatted like:  [email protected]  1234
The Cassandra dataset is a single column family called Users in the Mailings
keyspace with a data layout of:
Users = {
   '[email protected]': {
       email: '[email protected]',
       person_id: '123456',
       send_dates_2009-09-30: '2245',
       send_dates_2009-10-01: '2247',
   },
}
There are about 3.5 million rows in the Users column family and each row has
no more than 4 columns (listed above).  Some only have 3 (one of the
send_dates_YYYY-MM-DD isn't there).
The script parses it and then connects to Cassandra and does a get_slice and
counts the return values adding that to a hash:
    my ($value) = $client->get_slice(
        'Mailings',
        $email,
        Cassandra::ColumnParent->new({
                column_family => 'Users',
            }),
        Cassandra::SlicePredicate->new({
                slice_range => Cassandra::SliceRange->new({
                        start => 'send_dates_2009-09-29',
                        finish => 'send_dates_2009-10-30',
                    }),
            }),
        Cassandra::ConsistencyLevel::ONE
    );
    $counter{($#{$value} + 1)}++;
For the most part, this script times out after 1 minute or so. Replacing the
get_slice with a get_count, I can get it to about 2 million queries before I
get the timeout. Replacing the get_slice with a get, I make it to about 2.5
million before I get the timeout. The only way I could get it to run all
the way through was to add a 1/100 of a second sleep during every iteration.
I was able to get the script to complete when I shut down everything else
on the machine (and it took 177m to complete).  But since this is a
semi-production machine, I had to turn everything back on afterwards.
So for poops and laughs (at the recommendation of jbellis), I rewrote the
script in Python and it has since run (using get_slice) 3 times fully
without timing out (approximately 130m in Python) with everything else
running on the machine.
My question is, having seen this same thing in the PHP API and it is my
understanding that the Perl API was based on the PHP API,
could http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here
too? Is anyone else seeing this issue? If so, have you gotten around it?
Thanks.
-e
