Re: HashMap which can spill to disk for Hadoop?

2007-12-26 Thread Ted Dunning
Sounds much better to me.
On 12/26/07 7:53 AM, Eric Baldeschwieler [EMAIL PROTECTED] wrote:
> With a secondary sort on the values during the shuffle, nothing would need to be kept in memory, since it could all be counted in a single scan. Right? Wouldn't that be a much more efficient
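To illustrate the single-scan idea in the quoted message: if the values for a key arrive already sorted, equal values are adjacent, so each one can be counted with a single running counter instead of a HashMap. The following is only a minimal sketch in plain Java, not Hadoop code. The class name SortedRunCounter and the method countRuns are hypothetical, and the result is collected into a list purely so it can be inspected; a real reducer would emit each count as soon as a run ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: counts runs in a sorted stream.
public class SortedRunCounter {

    // Assumes sortedValues yields values in sorted order, so equal values are adjacent.
    // Only the current value and its count are held while scanning.
    public static List<String> countRuns(Iterator<String> sortedValues) {
        List<String> out = new ArrayList<>();   // stand-in for emitting to the output collector
        String current = null;
        long count = 0;
        while (sortedValues.hasNext()) {
            String v = sortedValues.next();
            if (current != null && !v.equals(current)) {
                out.add(current + "\t" + count); // the run ended, so its count is final
                count = 0;
            }
            current = v;
            count++;
        }
        if (current != null) {
            out.add(current + "\t" + count);     // flush the last run
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> values = Arrays.asList("a", "a", "b", "c", "c", "c").iterator();
        for (String line : countRuns(values)) {
            System.out.println(line);
        }
    }
}
```

The sketch relies entirely on the input being sorted; with unsorted input the same value would be reported more than once.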

RE: HashMap which can spill to disk for Hadoop?

2007-12-19 Thread Jim Kellerman
Have you looked at hadoop.io.MapWritable?
---
Jim Kellerman, Senior Engineer; Powerset
-----Original Message-----
From: C G [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 19, 2007 11:59 AM
To: hadoop-user@lucene.apache.org
Subject: HashMap which can spill to disk for Hadoop?
Hi All:

Re: HashMap which can spill to disk for Hadoop?

2007-12-19 Thread Ted Dunning
You should also be able to get quite a bit of mileage out of special-purpose HashMaps. In general, Java generic collections incur large to huge penalties for certain special cases. If you have one of these special cases or can put up with one, then you may be able to get 1+ order of magnitude
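As one possible reading of "special-purpose HashMap", here is a minimal sketch of an int-to-int counting map built on primitive arrays with open addressing and linear probing. It avoids the boxed Integer objects and per-entry nodes that a HashMap<Integer, Integer> allocates. The class name IntCountMap and all of its methods are hypothetical, it is not code from this thread or from Hadoop, and it makes no claim about the size of any speedup.

```java
// Hypothetical special-purpose map: int keys to int counts, no boxing.
public class IntCountMap {
    private int[] keys = new int[16];        // capacity is always a power of two
    private int[] counts = new int[16];
    private boolean[] used = new boolean[16];
    private int size = 0;

    // Multiplicative hash to spread the key bits, then mask down to a slot index.
    private static int slot(int key, int mask) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & mask;
    }

    public void increment(int key) {
        add(key, 1);
    }

    public void add(int key, int delta) {
        if (2 * (size + 1) > keys.length) {
            grow();                          // keep the load factor at or below 0.5
        }
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;              // linear probing
        }
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        counts[i] += delta;
    }

    // Returns 0 for keys that were never added.
    public int get(int key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) {
                return counts[i];
            }
            i = (i + 1) & mask;
        }
        return 0;
    }

    public int size() {
        return size;
    }

    // Doubles the tables and reinserts every existing entry.
    private void grow() {
        int[] oldKeys = keys;
        int[] oldCounts = counts;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        counts = new int[keys.length];
        used = new boolean[keys.length];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                add(oldKeys[j], oldCounts[j]);
            }
        }
    }
}
```

The trade-off is the "special case" itself: this sketch only works for int keys and int counts, has no removal, and does not spill to disk.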