[ https://issues.apache.org/jira/browse/CRUNCH-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre resolved CRUNCH-604.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.15.0

Thanks for the validation and +1. Change has been pushed to master.

> Avoid expensive Writables.reloadWritableComparableCodes where possible
> ----------------------------------------------------------------------
>
>                 Key: CRUNCH-604
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-604
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.13.0
>            Reporter: Steven Ruppert
>            Assignee: Micah Whitacre
>             Fix For: 0.15.0
>
>         Attachments: 0001-TupleWritable-only-reload-codes-once-on-setConf.patch,
>                      0001-Writables-cache-reloadWritableComparables-when-it-ha.patch,
>                      CRUNCH-604.patch
>
>
> Every time `setConf` is called on TupleWritable,
> `Writables.reloadWritableComparableCodes(conf)` is called. Unfortunately,
> `SequenceFile$Reader.readValue` calls `setConf` every single time. This burns
> a regrettable amount of CPU time.
> Attached is a patch that prevents a given TupleWritable instance from
> reloading the code more than once, as well as a patch to cache
> (hashCode-wise) reading from the actual hadoop config, which has to run
> regexes and stuff. I can construe situations where this would break (somehow,
> you modify the configuration in between reading two values), but nothing
> actually sane comes to mind.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
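
For readers following the issue, the two ideas in the description (a per-instance "reload only once" guard, plus a hashCode-keyed cache in front of the expensive config parse) can be sketched roughly as below. This is not the content of the attached patches. `CodeRegistry` and `TupleLike` are hypothetical stand-ins for `Writables` and `TupleWritable`, and a plain `Map` stands in for the Hadoop configuration so the sketch runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadOnceSketch {

    // Hypothetical stand-in for the shared state behind
    // Writables.reloadWritableComparableCodes(conf).
    static class CodeRegistry {
        int parseCount;            // how often the costly parse really ran
        private Integer lastHash;  // hashCode of the last config parsed

        synchronized void reload(Map<String, String> conf) {
            int h = conf.hashCode();
            if (lastHash != null && lastHash == h) {
                return; // same config hash as last time: skip the parse
            }
            parseCount++; // real code would read and parse config entries here
            lastHash = h;
        }
    }

    // Hypothetical stand-in for TupleWritable.
    static class TupleLike {
        private final CodeRegistry registry;
        private boolean codesLoaded; // per-instance guard
        int reloadCalls;             // how often this instance asked for a reload

        TupleLike(CodeRegistry registry) {
            this.registry = registry;
        }

        void setConf(Map<String, String> conf) {
            if (conf != null && !codesLoaded) {
                reloadCalls++;
                registry.reload(conf);
                codesLoaded = true;
            }
        }
    }

    public static void main(String[] args) {
        CodeRegistry registry = new CodeRegistry();
        Map<String, String> conf = new HashMap<>();
        conf.put("example.key", "example.value");

        // Mimics a reader that calls setConf before every value it reads.
        TupleLike first = new TupleLike(registry);
        for (int i = 0; i < 1000; i++) {
            first.setConf(conf);
        }

        // A second instance passes its own guard but hits the hash cache.
        TupleLike second = new TupleLike(registry);
        second.setConf(conf);

        System.out.println(first.reloadCalls + " " + second.reloadCalls
                + " " + registry.parseCount); // prints "1 1 1"
    }
}
```

The sketch also shows the caveat raised in the description: an instance that has already loaded its codes will not notice a configuration that is modified between two reads.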