[jira] [Resolved] (CASSANDRA-4229) Infinite MapReduce Task while reading via ColumnFamilyInputFormat

Jonathan Ellis (JIRA) Thu, 09 Aug 2012 08:30:21 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jonathan Ellis resolved CASSANDRA-4229.
---------------------------------------

    Resolution: Not A Problem

bq. If you are using the row-key, you must duplicate the bytebuffer, otherwise 
the RowIterator in ColumnFamilyRecordReader does not finish correctly.
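A minimal sketch of the hazard described above, using only `java.nio`. The class and the `decodeConsuming` helper are hypothetical and are not part of Cassandra or Hadoop. Reading the key with relative gets advances the caller's buffer to its limit. Reading from `duplicate()` shares the bytes but gives the reader its own position and limit, so the caller's buffer is left unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyDuplicationDemo {
    // Hypothetical helper: decodes a key with a relative bulk get,
    // which advances the buffer's position to its limit.
    static String decodeConsuming(ByteBuffer key) {
        byte[] bytes = new byte[key.remaining()];
        key.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap("row-42".getBytes(StandardCharsets.UTF_8));

        // Unsafe: decoding the caller's buffer directly leaves it with nothing remaining.
        String a = decodeConsuming(key);
        System.out.println(a + " remaining=" + key.remaining()); // row-42 remaining=0

        key.rewind();
        // Safe: duplicate() shares content but has an independent position/limit/mark.
        String b = decodeConsuming(key.duplicate());
        System.out.println(b + " remaining=" + key.remaining()); // row-42 remaining=6
    }
}
```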

Yes, it's basic hygiene for the mapper not to modify the keys it is handed, or 
more generally, "Don't mutate method parameters, since the caller may still be 
using them."

We don't duplicate everything defensively just in case you do something you 
shouldn't, because the performance hit is too high. So if you need to mutate 
the key, you need to make the copy yourself. Better: use the absolute 
(positional) get methods of ByteBuffer, such as get(int), instead of the 
relative ones, which advance the buffer's position.
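A minimal sketch of both options, using only `java.nio`. The class and method names are hypothetical illustrations, not Cassandra APIs. `toHex` reads the key with absolute `get(int)`, so the caller's position never moves. `copyOf` makes an explicit private copy for the case where the mapper really does need to mutate the bytes.

```java
import java.nio.ByteBuffer;

public class PositionalReadDemo {
    // Reads the readable bytes with absolute get(int); position and limit are untouched.
    static String toHex(ByteBuffer key) {
        StringBuilder sb = new StringBuilder(2 * key.remaining());
        for (int i = key.position(); i < key.limit(); i++) {
            sb.append(String.format("%02x", key.get(i) & 0xFF));
        }
        return sb.toString();
    }

    // Copies the readable bytes into a new buffer the caller can mutate freely.
    // Reading from a duplicate keeps the original buffer's position intact.
    static ByteBuffer copyOf(ByteBuffer key) {
        ByteBuffer copy = ByteBuffer.allocate(key.remaining());
        copy.put(key.duplicate());
        copy.flip(); // prepare the copy for reading: position 0, limit = length
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap(new byte[] {0x0a, (byte) 0xff, 0x10});
        System.out.println(toHex(key) + " position=" + key.position()); // 0aff10 position=0

        ByteBuffer mine = copyOf(key);
        mine.put(0, (byte) 0x01); // mutating the copy does not affect the original key
        System.out.println(key.get(0) == 0x0a); // true
    }
}
```

The absolute-get approach avoids any allocation. The copy costs one allocation per key, which is the price the reader would otherwise pay on every record if it duplicated defensively.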
                
> Infinite MapReduce Task while reading via ColumnFamilyInputFormat
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-4229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4229
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 1.1.0
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>         Attachments: screenshot.jpg
>
>
> Hi,
> We recently upgraded Cassandra from version 1.0.9 to 1.1.0. Since then we 
> cannot execute any Hadoop jobs that read data from Cassandra via 
> ColumnFamilyInputFormat.
> A map task is created that runs infinitely. We are trying to read from a 
> super column family with roughly 1000 row keys.
> This is the output from the job interface, where we already have 17 million 
> map input records!
> Counter                              Map            Reduce        Total
> Map input records                    17.273.127     0             17.273.127
> Reduce shuffle bytes                 0              391           391
> Spilled Records                      3.288          0             3.288
> Map output bytes                     639.849.351    0             639.849.351
> CPU time spent (ms)                  792.750        7.600         800.350
> Total committed heap usage (bytes)   354.680.832    48.955.392    403.636.224
> Combine input records                17.039.783     0             17.039.783
> SPLIT_RAW_BYTES                      212            0             212
> Reduce input records                 0              0             0
> Reduce input groups                  0              0             0
> Combine output records               3.288          0             3.288
> Physical memory (bytes) snapshot     510.275.584    96.370.688    606.646.272
> Reduce output records                0              0             0
> Virtual memory (bytes) snapshot      1.826.496.512  934.473.728   2.760.970.240
> Map output records                   17.273.126     0             17.273.126
> We have to kill the job and go back to version 1.0.9, because 1.1.0 is not 
> usable for reading from Cassandra.
> Best regards 
> Bert Passek

--
This message is automatically generated by JIRA.
