bert Passek created CASSANDRA-4229:
--------------------------------------

             Summary: Infinite MapReduce Task while reading via ColumnFamilyInputFormat
                 Key: CASSANDRA-4229
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4229
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 1.1.0
         Environment: Debian Squeeze
            Reporter: bert Passek
         Attachments: screenshot.jpg

Hi,

we recently upgraded Cassandra from version 1.0.9 to 1.1.0. Since then we cannot run any Hadoop job that reads data from Cassandra via ColumnFamilyInputFormat. A map task is created and then runs indefinitely.

We are reading from a super column family with roughly 1000 row keys. This is the output from the job interface, where the job has already reached more than 17 million map input records:

Counter                                       Map         Reduce          Total
Map input records                      17.273.127              0     17.273.127
Reduce shuffle bytes                            0            391            391
Spilled Records                             3.288              0          3.288
Map output bytes                      639.849.351              0    639.849.351
CPU time spent (ms)                       792.750          7.600        800.350
Total committed heap usage (bytes)    354.680.832     48.955.392    403.636.224
Combine input records                  17.039.783              0     17.039.783
SPLIT_RAW_BYTES                               212              0            212
Reduce input records                            0              0              0
Reduce input groups                             0              0              0
Combine output records                      3.288              0          3.288
Physical memory (bytes) snapshot      510.275.584     96.370.688    606.646.272
Reduce output records                           0              0              0
Virtual memory (bytes) snapshot     1.826.496.512    934.473.728  2.760.970.240
Map output records                     17.273.126              0     17.273.126

We have to kill the job and go back to version 1.0.9, because 1.1.0 is not usable for reading from Cassandra.

Best regards

Bert Passek
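
For a rough sense of scale, the counters above can be compared with the expected number of rows. The following is only an illustrative sketch and was not part of the job: the helper names `parse_counter` and `replay_factor` are hypothetical, the value 1000 is the approximate row-key count mentioned above, and the counter string is copied from the job interface output, which prints `.` as the thousands separator.

```python
def parse_counter(text: str) -> int:
    """Parse a counter value printed with '.' as the thousands separator."""
    return int(text.replace(".", ""))


def replay_factor(map_input_records: int, expected_rows: int) -> float:
    """Return how many input records were read per expected row."""
    return map_input_records / expected_rows


# "Map input records" (Map column) as shown in the job interface.
map_input = parse_counter("17.273.127")

# Assumption: "more or less 1000 row keys", taken as exactly 1000 here.
expected_rows = 1000

factor = replay_factor(map_input, expected_rows)
print(f"map input records: {map_input}")
print(f"records read per expected row: {factor:.0f}")
```

Under these assumptions the map task has read each expected row thousands of times over, which fits the description of a task that never finishes rather than one that is merely slow.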