On Oct 9, 2006, at 2:54 AM, 张茂森 wrote:

Hi all:

I’m trying to use Hadoop to process logs. I’ve written a routine to count the number of logins from each IP address. However, because my logs contain mixed character encodings (ASCII, Unicode, UTF-8, etc.), Hadoop’s TextInputFormat class throws an error. Is there a good way to solve this problem?

In Hadoop 0.7.0, we disabled the exception that was thrown when bad UTF-8 is given to the Text object. In the longer term, we will re-enable validation but add support for newline-separated binary data, which is what you have. *smile*

-- Owen
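
For illustration, the idea of treating such a log as newline-separated binary data might look roughly like the sketch below. It is not Hadoop code and does not use any Hadoop API; it is plain Python that splits the input on newline bytes and pulls the IP address out of each record before any decoding, so bytes that are invalid as UTF-8 elsewhere on the line cannot cause an error. The function name `count_logins_by_ip`, the sample data, and the log layout (IP address as the first whitespace-separated field) are assumptions made for the example. It also assumes ASCII-compatible encodings; UTF-16 data, where a newline is two bytes, would need different handling.

```python
from collections import Counter


def count_logins_by_ip(raw: bytes) -> Counter:
    """Count records per IP in newline-separated binary log data.

    Hypothetical log layout: the IP address is the first
    whitespace-separated field of each line.
    """
    counts = Counter()
    for line in raw.split(b"\n"):
        # Split on ASCII whitespace while still working with bytes,
        # so the rest of the line is never decoded or validated.
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Only the IP field is decoded; stray bytes become U+FFFD
        # instead of raising an exception.
        ip = fields[0].decode("ascii", errors="replace")
        counts[ip] += 1
    return counts


# Sample data (made up) mixing encodings on different lines.
sample = (
    b"10.0.0.1 login alice\n"
    b"10.0.0.2 login caf\xe9\n"        # Latin-1 byte, invalid as UTF-8
    b"10.0.0.1 login \xe5\xbc\xa0\n"   # valid UTF-8 multibyte sequence
)
counts = count_logins_by_ip(sample)
```

Splitting before decoding keeps the count independent of whatever encoding the rest of each line happens to use.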
