On Oct 9, 2006, at 2:54 AM, 张茂森 wrote:
Hi all:
I’m trying to use Hadoop to process logs. I’ve written a routine to count
how many times each IP address logs in. However, because the characters in
my logs are in mixed encodings (ASCII, Unicode, UTF-8, etc.), the
TextInputFormat class in Hadoop throws an error. Is there a good way to
solve this problem?
In Hadoop 0.7.0, we disabled the exception that was thrown when invalid
UTF-8 is given to the Text object. In the longer term, we will re-enable
validation but add support for newline-separated binary data, which is
what you have. *smile*
-- Owen
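
For anyone who needs to handle this outside of Hadoop's own classes, here is a minimal sketch of the "newline-separated binary data" idea in plain Java. It is not Hadoop code: the class name LenientLines and its methods are hypothetical, and it uses only java.nio.charset from the JDK. It assumes the records are split on the 0x0A byte, which is reasonable for ASCII-compatible encodings such as UTF-8 but not for UTF-16, where 0x0A can appear inside a code unit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class LenientLines {

    /** Splits raw bytes on '\n' (0x0A) without decoding them first. */
    public static List<byte[]> splitLines(byte[] data) {
        List<byte[]> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                lines.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        // Keep a trailing record that has no terminating newline.
        if (start < data.length) {
            lines.add(Arrays.copyOfRange(data, start, data.length));
        }
        return lines;
    }

    /** Strict check: returns false if the bytes are not well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] line) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(line));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lenient decode: malformed bytes become U+FFFD instead of raising an error. */
    public static String decodeLenient(byte[] line) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(line)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Second record contains 0xFF, which is never valid in UTF-8.
        byte[] data = {'a', '\n', (byte) 0xFF, 'b', '\n', 'c'};
        for (byte[] line : splitLines(data)) {
            System.out.println(isValidUtf8(line) + " " + decodeLenient(line));
        }
    }
}
```

The strict check and the lenient decode are kept separate so that a job can either skip bad records or keep them with replacement characters. Which is appropriate depends on whether the IP address field in your logs is always plain ASCII, which I am assuming here but cannot confirm.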