Re: error about character set(ASCII, UTF-8, Unicode) using TextInputFormat

Owen O'Malley Mon, 09 Oct 2006 09:38:20 -0700


On Oct 9, 2006, at 2:54 AM, 张茂森 wrote:

Hi all:
I’m trying to use Hadoop to process logs. I’ve written a routine to count the number of logins from each IP address. However, because my logs contain a mix of character encodings (ASCII, Unicode, UTF-8, etc.), the TextInputFormat class in Hadoop
throws an error. Is there a good way to solve this problem?

In Hadoop 0.7.0, we disabled the exception when bad UTF8 is given to the Text object. In the longer term we will re-enable validation but have support for new-line separated binary data, which is what you have. *smile*
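
For anyone who wants to treat such logs as new-line separated binary data in their own code, here is a minimal sketch using only the standard java.nio.charset classes. It is not Hadoop's implementation and does not touch the Text or TextInputFormat APIs; the class name LenientLineDecoder and its methods are hypothetical, made up for illustration. It assumes lines end with a single '\n' byte, which holds for ASCII and UTF-8 but not for UTF-16, where the byte 0x0A can be half of an unrelated character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
public class LenientLineDecoder {

    // Split raw bytes on '\n' before any decoding, so a bad byte
    // sequence in one line cannot affect the lines around it.
    static List<byte[]> splitLines(byte[] data) {
        List<byte[]> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                lines.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        if (start < data.length) {
            // Final line without a trailing newline.
            lines.add(Arrays.copyOfRange(data, start, data.length));
        }
        return lines;
    }

    // Strict check: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] line) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(line));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient decode: malformed bytes become U+FFFD instead of an exception.
    static String decodeLenient(byte[] line) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(line)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Second line of the sample contains 0xFF, which is never valid in UTF-8.
        byte[] data = {'o', 'k', '\n', 'i', 'p', (byte) 0xFF, '\n'};
        for (byte[] line : splitLines(data)) {
            System.out.println(isValidUtf8(line) + "\t" + decodeLenient(line));
        }
    }
}
```

Splitting on bytes first and decoding second means one badly encoded line costs you at most that line. If the key you are counting on (such as the IP address) is plain ASCII, it survives the lenient decode unchanged even when the rest of the line does not.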


-- Owen
