Hi all,
It seems that Hive goes wrong when storing Unicode strings. Hive uses
byte comparison to delimit the fields of a record
(see LazyStruct.java:92, a parse method).
If we use GBK or UTF-8 encoding, where a character may need more than 1
byte (possibly 2-3 bytes), the field separator might by coincidence
equal one of the bytes of a GBK/UTF-8 encoded character. The record would
then be split in the middle of that character, and things go wrong.
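
To make the concern concrete, here is a minimal Python sketch (not Hive's code) of a byte-level split. It assumes a hypothetical '|' delimiter (byte 0x7C) and a GBK character built from the bytes 0x81 0x7C; the helper name split_on_byte is made up for illustration. For comparison it also splits the same fields encoded as UTF-8, where every byte of a multi-byte character is 0x80 or above.

```python
# Hypothetical single-byte field delimiter, chosen for illustration (0x7C).
DELIM = b"|"


def split_on_byte(record, delim=DELIM):
    """Split a raw record on a delimiter byte, ignoring character boundaries."""
    return record.split(delim)


# A two-byte GBK character whose second (trail) byte is 0x7C, the same
# byte value as the '|' delimiter.
gbk_char = bytes([0x81, 0x7C]).decode("gbk")

# Three logical fields; the middle one is the single GBK character.
fields = ["a", gbk_char, "b"]

gbk_record = "|".join(fields).encode("gbk")
utf8_record = "|".join(fields).encode("utf-8")

gbk_parts = split_on_byte(gbk_record)
utf8_parts = split_on_byte(utf8_record)

print(len(fields), len(gbk_parts), len(utf8_parts))
print(gbk_parts)
```

In this sketch the GBK record comes back with one field too many, because the trail byte of the character is taken for a separator, while the UTF-8 record keeps its three fields. With a delimiter byte below 0x40, which is outside the GBK trail-byte range, the GBK case would not collide either.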
Can Hive solve the problem above?

Thanks,
Min
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
