Steven Ruppert created CRUNCH-603:
-------------------------------------
Summary: Cache constituent Writables inside TupleWritable
`readField` call
Key: CRUNCH-603
URL: https://issues.apache.org/jira/browse/CRUNCH-603
Project: Crunch
Issue Type: Improvement
Components: Core
Affects Versions: 0.13.0
Reporter: Steven Ruppert
Assignee: Josh Wills
Priority: Minor
Currently, `TupleWritable.readFields` will, for every field in the tuple,
create a new Writable of that field type using reflection
(`WritableFactories.newInstance`), through `TupleWritable.getWritable`, in
order to deserialize that field. This burns up an unfortunate amount of CPU
time.
I've got a patch for this that caches the Writables for reuse (just as the
TupleWritable itself is reused throughout Hadoop). It appears to work, at least
for our cases. I think it will break if you ever have heterogeneous tuple
types, but that seems like a bad idea, if not already proscribed in the
documentation somewhere.
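
For illustration only, the caching idea can be sketched roughly as follows. This is not the attached patch and not Crunch's actual `TupleWritable` code: `Writable` here is a simplified stand-in for `org.apache.hadoop.io.Writable`, and `IntField`, `CachingTuple`, and `demo` are hypothetical names invented for the example. The sketch assumes the type at each tuple position never changes between records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class CachedTupleSketch {

    // Simplified stand-in for org.apache.hadoop.io.Writable (read side only).
    interface Writable {
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical field type that counts how many instances get constructed.
    static final class IntField implements Writable {
        static int instancesCreated = 0;
        int value;

        IntField() { instancesCreated++; }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    // Hypothetical tuple that keeps one Writable per position across reads.
    static final class CachingTuple implements Writable {
        private final List<Class<? extends Writable>> types;
        private final Writable[] cache;

        CachingTuple(List<Class<? extends Writable>> types) {
            this.types = types;
            this.cache = new Writable[types.size()];
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            for (int i = 0; i < cache.length; i++) {
                if (cache[i] == null) {
                    // Reflection happens only the first time a position is read.
                    cache[i] = newInstance(types.get(i));
                }
                // Later reads overwrite the cached instance in place.
                cache[i].readFields(in);
            }
        }

        Writable get(int i) { return cache[i]; }

        private static Writable newInstance(Class<? extends Writable> type) {
            try {
                return type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Reads two records, (1, 2) then (3, 4), through the same tuple.
    // Returns {first field, second field, number of IntField instances created}.
    static int[] demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : new int[] {1, 2, 3, 4}) {
                out.writeInt(v);
            }
            out.flush();

            IntField.instancesCreated = 0;
            List<Class<? extends Writable>> types = List.of(IntField.class, IntField.class);
            CachingTuple tuple = new CachingTuple(types);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            tuple.readFields(in);
            tuple.readFields(in);
            return new int[] {
                ((IntField) tuple.get(0)).value,
                ((IntField) tuple.get(1)).value,
                IntField.instancesCreated
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

In this sketch the second `readFields` call constructs no new field objects, which is the saving described above. The same reuse is also why a tuple whose field type at a given position differs between records would break it, unless the cached instance's class is checked before each read.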
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)