[
https://issues.apache.org/jira/browse/CASSANDRA-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis resolved CASSANDRA-4229.
---------------------------------------
Resolution: Not A Problem
bq. If you are using the row-key, you must duplicate the bytebuffer, otherwise
the RowIterator in ColumnFamilyRecordReader does not finish correctly.
Yes, it's basic hygiene not to modify the keys out from under the mapper, or
more generally, "Don't mutate method parameters since the caller may still be
using them."
We don't defensively duplicate everything just in case you do something you
shouldn't, since the performance hit is too high. So if you need to mutate the
key, you need to make the copy yourself. Better: use the positional (absolute)
get methods of ByteBuffer instead of the relative ones, which advance the
buffer's position.
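To illustrate the difference, here is a minimal sketch using only java.nio.
The class and method names (KeyReadDemo, readMutating, readWithDuplicate,
readPositional) are made up for this example and are not part of Cassandra or
Hadoop. It assumes the row key is UTF-8 text and only shows how each way of
reading affects the position of the caller's buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Cassandra code.
final class KeyReadDemo {

    // Relative get(byte[]) advances the position of the buffer it is called on,
    // so the caller's key is left "consumed" afterwards.
    static String readMutating(ByteBuffer key) {
        byte[] bytes = new byte[key.remaining()];
        key.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // duplicate() shares the underlying bytes but has its own position and limit,
    // so reading from the duplicate leaves the caller's key untouched.
    static String readWithDuplicate(ByteBuffer key) {
        ByteBuffer copy = key.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Absolute get(int index) reads a byte without changing the position at all.
    static String readPositional(ByteBuffer key) {
        byte[] bytes = new byte[key.remaining()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = key.get(key.position() + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap("row-1".getBytes(StandardCharsets.UTF_8));
        System.out.println(readPositional(key) + " remaining=" + key.remaining());
        System.out.println(readWithDuplicate(key) + " remaining=" + key.remaining());
        System.out.println(readMutating(key) + " remaining=" + key.remaining());
    }
}
```

In a mapper, either of the first two non-mutating variants should leave the
key as the record reader handed it over; only the relative read changes the
buffer the caller is still holding.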
> Infinite MapReduce Task while reading via ColumnFamilyInputFormat
> -----------------------------------------------------------------
>
> Key: CASSANDRA-4229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4229
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 1.1.0
> Environment: Debian Squeeze
> Reporter: bert Passek
> Attachments: screenshot.jpg
>
>
> Hi,
> we recently upgraded Cassandra from version 1.0.9 to 1.1.0. Since then we
> cannot run any Hadoop job that reads data from Cassandra via
> ColumnFamilyInputFormat.
> A map task is created and then runs indefinitely. We are trying to read from
> a super column family with roughly 1000 row keys.
> This is the output from the job interface, where we already have 17 million
> map input records:
> Counter                             Map            Reduce       Total
> Map input records                   17.273.127     0            17.273.127
> Reduce shuffle bytes                0              391          391
> Spilled Records                     3.288          0            3.288
> Map output bytes                    639.849.351    0            639.849.351
> CPU time spent (ms)                 792.750        7.600        800.350
> Total committed heap usage (bytes)  354.680.832    48.955.392   403.636.224
> Combine input records               17.039.783     0            17.039.783
> SPLIT_RAW_BYTES                     212            0            212
> Reduce input records                0              0            0
> Reduce input groups                 0              0            0
> Combine output records              3.288          0            3.288
> Physical memory (bytes) snapshot    510.275.584    96.370.688   606.646.272
> Reduce output records               0              0            0
> Virtual memory (bytes) snapshot     1.826.496.512  934.473.728  2.760.970.240
> Map output records                  17.273.126     0            17.273.126
> We have to kill the job and go back to version 1.0.9, because 1.1.0 is
> not usable for reading from Cassandra.
> Best regards
> Bert Passek