Serialization itself has no effect on accuracy; doubles are encoded exactly as they are in memory. That's not to say there can't be an accuracy issue in how some computation proceeds, but it is not a function of serialization.
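
As a quick sanity check, something like the sketch below round-trips a double bit-for-bit through plain java.io (no Hadoop dependency here, but DoubleWritable serializes through the same writeDouble/readDouble calls as far as I know):

    import java.io.*;

    public class DoubleRoundTrip {
        public static void main(String[] args) throws IOException {
            double original = 0.1 + 0.2;  // a value with no exact decimal representation

            // Serialize: writeDouble emits the 8 IEEE 754 bytes of the double
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeDouble(original);

            // Deserialize from the same bytes
            double restored = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())).readDouble();

            // Bit-for-bit identical: serialization introduced no rounding
            System.out.println(Double.doubleToRawLongBits(original)
                    == Double.doubleToRawLongBits(restored));  // prints true
        }
    }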
On Thu, Jul 14, 2011 at 2:54 PM, Dhruv Kumar <[email protected]> wrote:
> What are the algorithms and codecs used in Hadoop to compress data and pass
> it around between mappers and reducers? I'm curious to understand the
> effects it has (if any) on double precision values.
>
> So far my trainer (MAHOUT-627) uses unscaled EM training and I'm soon
> starting the work on using log-scaled values for improved accuracy and
> minimizing underflow. It will be interesting to compare the accuracy of the
> unscaled and log scaled variants so I'm curious.
>
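
On the log-scaled variant: the usual way to keep sums of log-probabilities from underflowing is log-sum-exp, i.e. factoring out the larger term before exponentiating. A small illustrative sketch (the method name is just for illustration, not anything in MAHOUT-627):

    // logSumExp(a, b) = log(exp(a) + exp(b)), computed without underflow
    // by factoring out the larger argument before exponentiating.
    static double logSumExp(double a, double b) {
        double m = Math.max(a, b);
        if (m == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY;  // both terms are log(0)
        }
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    // Example: Math.exp(-800) already underflows to 0.0 in double precision,
    // but logSumExp(-800, -800) correctly returns -800 + Math.log(2).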
