Hi all,

It seems that Hive can go wrong when storing multi-byte strings. Hive delimits the fields of a record by comparing raw bytes (see the parse method at LazyStruct.java:92). With an encoding such as GBK or UTF-8, a single character can take more than one byte (often 2-3), so one byte of a multi-byte character may, by coincidence, equal the field separator. The record is then split in the middle of a character and the fields come out wrong.

Can Hive solve this problem?
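To make the concern concrete, here is a minimal Java sketch of a byte-wise field count. It is not Hive's code: the class name `DelimiterCollisionDemo`, the helper `countFields`, and the choice of `|` as the separator are all assumptions made for illustration. It also assumes the JRE ships the GBK charset. The test character is built by decoding the GBK byte pair 0x81 0x7C, whose second byte has the same value as `|`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DelimiterCollisionDemo {

    /**
     * Counts fields the way a naive byte-wise splitter would:
     * every byte equal to the delimiter starts a new field.
     * (Hypothetical helper, not taken from Hive.)
     */
    public static int countFields(byte[] row, byte delimiter) {
        int fields = 1;
        for (byte b : row) {
            if (b == delimiter) {
                fields++;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this JRE.
        Charset gbk = Charset.forName("GBK");
        byte pipe = (byte) '|'; // 0x7C, the illustrative separator

        // 0x81 0x7C is a two-byte GBK sequence; its trail byte equals '|'.
        String ch = new String(new byte[] {(byte) 0x81, (byte) 0x7C}, gbk);

        // A logical row with exactly two fields: <ch> and "x".
        String row = ch + "|x";

        // In GBK the trail byte 0x7C is mistaken for a separator.
        System.out.println("GBK fields:   " + countFields(row.getBytes(gbk), pipe));

        // In UTF-8 every byte of a multi-byte character is >= 0x80,
        // so it cannot equal an ASCII separator.
        System.out.println("UTF-8 fields: "
                + countFields(row.getBytes(StandardCharsets.UTF_8), pipe));
    }
}
```

Under these assumptions the GBK row is counted as three fields and the UTF-8 row as two. So the collision is real for GBK when the separator falls in the GBK trail-byte range (0x40-0x7E). UTF-8 looks safe for any single-byte ASCII separator, because all bytes of its multi-byte sequences are 0x80 or above. A control character below 0x40, such as Ctrl-A (0x01), would not collide in either encoding.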
Thanks,
Min

--
My research interests are distributed systems, parallel computing, and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
