[ https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-2905: ------------------------------------ Description: One thing that would improve the built-in ColumnFamilyRecordReader is some retry logic if it times out on hasNext. It could help in addition to setting the rpc_timeout_in_ms, so that timeouts happen less frequently so there are fewer blacklisted task trackers (which are the result of an error, including the timeout). {quote} java.lang.RuntimeException: TimedOutException() at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135) at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: TimedOutException() at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104) at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732) at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242) ... 17 more {quote} was: One thing that would improve the built-in ColumnFamilyRecordReader is some retry logic if it times out on hasNext. It could help in addition to setting the rpc_timeout_in_ms, so that timeouts happen less frequently so there are fewer blacklisted task trackers (which are the result of an error, including the timeout). {code} java.lang.RuntimeException: TimedOutException() at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135) at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: TimedOutException() at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104) at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732) at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242) ... 17 more {code} > Add retry logic to ColumnFamilyRecordReader > ------------------------------------------- > > Key: CASSANDRA-2905 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2905 > Project: Cassandra > Issue Type: Improvement > Reporter: Jeremy Hanna > Assignee: Jeremy Hanna > Labels: hadoop > > One thing that would improve the built-in ColumnFamilyRecordReader is some > retry logic if it times out on hasNext. It could help in addition to setting > the rpc_timeout_in_ms, so that timeouts happen less frequently so there are > fewer blacklisted task trackers (which are the result of an error, including > the timeout). > {quote} > java.lang.RuntimeException: TimedOutException() at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264) > at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279) > at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) > at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135) > at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455) > at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at > org.apache.hadoop.mapred.Child$4.run(Child.java:268) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) > at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: > TimedOutException() at > org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104) > at > org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732) > at > org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704) > at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242) > ... 17 more > {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira