My connection section of the script is here:
# Connect to the database
my $socket = new Thrift::Socket('localhost',9160);
$socket->setSendTimeout(2500);
$socket->setRecvTimeout(7500);
my $transport = new Thrift::BufferedTransport($socket,2048,2048);
my $protocol = new Thrift::BinaryProtocol($transport);
my $client = Cassandra::CassandraClient->new($protocol);
I even tried it with a buffer size of 1024, a SendTimeout of 1000, and a
RecvTimeout of 5000.
-e
On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani wrote:
> I think it's 100ms. I need to increase it to match python I guess.
>
>
> On Oct 15, 2009, at 11:40 AM, Jonathan Ellis wrote:
>
>> What is the default?
>>
>> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani wrote:
>>
>>> You need to call
>>> $socket->setRecvTimeout()
>>> With a higher number in ms.
>>>
>>>
>>> On Oct 15, 2009, at 11:26 AM, Eric Lubow wrote:
>>>
>>> Using the Thrift Perl API into Cassandra, I am running into what is
>>> endearingly referred to as the 4 bytes of doom:
>>> TSocket: timed out reading 4 bytes from localhost:9160
>>> The script I am using is fairly simple. I have a text file with about
>>> 3.6 million lines that are formatted like: [email protected] 1234
>>> The Cassandra dataset is a single column family called Users in the
>>> Mailings keyspace with a data layout of:
>>> Users = {
>>>     '[email protected]': {
>>>         email: '[email protected]',
>>>         person_id: '123456',
>>>         send_dates_2009-09-30: '2245',
>>>         send_dates_2009-10-01: '2247',
>>>     },
>>> }
>>> There are about 3.5 million rows in the Users column family, and each
>>> row has no more than 4 columns (listed above). Some have only 3 (one of
>>> the send_dates_YYYY-MM-DD columns isn't there).
>>> The script parses the file, connects to Cassandra, does a get_slice for
>>> each line, and counts the returned columns, tallying that count in a hash:
>>> my ($value) = $client->get_slice(
>>>     'Mailings',
>>>     $email,
>>>     Cassandra::ColumnParent->new({
>>>         column_family => 'Users',
>>>     }),
>>>     Cassandra::SlicePredicate->new({
>>>         slice_range => Cassandra::SliceRange->new({
>>>             start  => 'send_dates_2009-09-29',
>>>             finish => 'send_dates_2009-10-30',
>>>         }),
>>>     }),
>>>     Cassandra::ConsistencyLevel::ONE
>>> );
>>> $counter{($#{$value} + 1)}++;
>>> For the most part, this script times out after 1 minute or so. Replacing
>>> the get_slice with a get_count, I can get it to about 2 million queries
>>> before I get the timeout. Replacing the get_slice with a get, I make it
>>> to about 2.5 million before I get the timeout. The only way I could get
>>> it to run all the way through was to add a 1/100 of a second sleep
>>> during every iteration.
>>> I was able to get the script to complete when I shut down everything
>>> else on the machine (and it took 177m to complete). But since this is a
>>> semi-production machine, I had to turn everything back on afterwards.
>>> So for poops and laughs (at the recommendation of jbellis), I rewrote the
>>> script in Python and it has since run (using get_slice) 3 times fully
>>> without timing out (approximately 130m in Python) with everything else
>>> running on the machine.
>>> My question is: having seen this same thing in the PHP API, and given my
>>> understanding that the Perl API was based on the PHP API, could
>>> http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?
>>> Is anyone else seeing this issue? If so, have you gotten around it?
>>> Thanks.
>>> -e
>>>
>>