What is the default?
On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <jak...@gmail.com> wrote:
> You need to call
> $socket->setRecvTimeout()
> With a higher number in ms.
>
>
> On Oct 15, 2009, at 11:26 AM, Eric Lubow <eric.lu...@gmail.com> wrote:
>
> Using the Thrift Perl API into Cassandra, I am running into what is
> endearingly referred to as the 4 bytes of doom:
> TSocket: timed out reading 4 bytes from localhost:9160
> The script I am using is fairly simple. I have a text file that has about
> 3.6 million lines that are formatted like: ...@bar.com 1234
> The Cassandra dataset is a single column family called Users in the Mailings
> keyspace with a data layout of:
> Users = {
>     'f...@example.com': {
>         email: 'f...@example.com',
>         person_id: '123456',
>         send_dates_2009-09-30: '2245',
>         send_dates_2009-10-01: '2247',
>     },
> }
> There are about 3.5 million rows in the Users column family and each row has
> no more than 4 columns (listed above). Some only have 3 (one of the
> send_dates_YYYY-MM-DD isn't there).
> The script parses it and then connects to Cassandra and does a get_slice and
> counts the return values adding that to a hash:
> my ($value) = $client->get_slice(
>     'Mailings',
>     $email,
>     Cassandra::ColumnParent->new({
>         column_family => 'Users',
>     }),
>     Cassandra::SlicePredicate->new({
>         slice_range => Cassandra::SliceRange->new({
>             start => 'send_dates_2009-09-29',
>             finish => 'send_dates_2009-10-30',
>         }),
>     }),
>     Cassandra::ConsistencyLevel::ONE
> );
> $counter{($#{$value} + 1)}++;
> For the most part, this script times out after 1 minute or so. Replacing the
> get_slice with a get_count, I can get it to about 2 million queries before I
> get the timeout. Replacing the get_slice with a get, I make it to about 2.5
> million before I get the timeout. The only way I could get it to run all
> the way through was to add a 1/100 of a second sleep during every iteration.
> I was able to get the script to complete when I shut down everything else
> on the machine (and it took 177m to complete). But since this is a
> semi-production machine, I had to turn everything back on afterwards.
> So for poops and laughs (at the recommendation of jbellis), I rewrote the
> script in Python and it has since run (using get_slice) 3 times fully
> without timing out (approximately 130m in Python) with everything else
> running on the machine.
> My question is, having seen this same thing in the PHP API and it is my
> understanding that the Perl API was based on the PHP API,
> could http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here
> too? Is anyone else seeing this issue? If so, have you gotten around it?
> Thanks.
> -e
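
For reference, the setRecvTimeout() suggestion quoted above might be applied roughly as follows. This is only a sketch: set_socket_timeouts is a hypothetical helper (it is not part of Thrift), the 30000 ms value is an arbitrary illustration rather than a recommended setting, and it assumes the socket object exposes setRecvTimeout() and setSendTimeout() taking milliseconds, as the Thrift Perl socket class is understood to do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Thrift): raise the timeouts on a socket
# object before it is opened. It relies only on the object providing
# setRecvTimeout() and setSendTimeout(), both taking milliseconds.
sub set_socket_timeouts {
    my ($socket, $recv_ms, $send_ms) = @_;

    die "receive timeout must be a positive number of ms\n"
        unless defined $recv_ms && $recv_ms > 0;

    $socket->setRecvTimeout($recv_ms);

    # The send timeout is optional; leave the library's value alone if
    # the caller did not pass one.
    $socket->setSendTimeout($send_ms) if defined $send_ms;

    return $socket;
}

# Intended use with the Thrift Perl library (assumes Thrift::Socket is
# installed; host and port are the ones from the error message above):
#
#   use Thrift::Socket;
#   my $socket = Thrift::Socket->new('localhost', 9160);
#   set_socket_timeouts($socket, 30_000);   # 30 s receive timeout
```

The timeouts would need to be set before the transport is opened and before the first get_slice call. Whether a longer receive timeout alone is enough for the 3.6 million line run is not something this sketch can confirm.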