Serialization itself has no effect on accuracy; doubles are encoded exactly as they are in memory. That's not to say there can't be an accuracy issue in how some computation proceeds, but it is not a function of serialization.
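
As a quick sanity check, something like the sketch below round-trips a double bit-for-bit through plain java.io (no Hadoop dependency here, but DoubleWritable serializes through the same writeDouble/readDouble calls as far as I know):

    import java.io.*;

    public class DoubleRoundTrip {
        public static void main(String[] args) throws IOException {
            double original = 0.1 + 0.2;  // a value with no exact decimal representation

            // Serialize: writeDouble emits the 8 IEEE 754 bytes of the double
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeDouble(original);

            // Deserialize from the same bytes
            double restored = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())).readDouble();

            // Bit-for-bit identical: serialization introduced no rounding
            System.out.println(Double.doubleToRawLongBits(original)
                    == Double.doubleToRawLongBits(restored));  // prints true
        }
    }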
On Thu, Jul 14, 2011 at 2:54 PM, Dhruv Kumar <[email protected]> wrote:
> What are the algorithms and codecs used in Hadoop to compress data and pass
> it around between mappers and reducers? I'm curious to understand the
> effects it has (if any) on double precision values.
>
> So far my trainer (MAHOUT-627) uses unscaled EM training and I'm soon
> starting the work on using log-scaled values for improved accuracy and
> minimizing underflow. It will be interesting to compare the accuracy of the
> unscaled and log scaled variants so I'm curious.
>
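
On the log-scaled variant: the usual way to keep sums of log-probabilities from underflowing is log-sum-exp, i.e. factoring out the larger term before exponentiating. A small illustrative sketch (the method name is just for illustration, not anything in MAHOUT-627):

    // logSumExp(a, b) = log(exp(a) + exp(b)), computed without underflow
    // by factoring out the larger argument before exponentiating.
    static double logSumExp(double a, double b) {
        double m = Math.max(a, b);
        if (m == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY;  // both terms are log(0)
        }
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    // Example: Math.exp(-800) already underflows to 0.0 in double precision,
    // but logSumExp(-800, -800) correctly returns -800 + Math.log(2).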
