Text.toString violates its abstraction
--------------------------------------

                 Key: HADOOP-6883
                 URL: https://issues.apache.org/jira/browse/HADOOP-6883
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.1
         Environment: Linux
            Reporter: Gordon Sommers


I stumbled upon this when encoding a Google protocol buffer in base64 and storing it in a Text object for serialization. Compare the following two lines:

    byte[] decoded = b64.decode(val.getBytes());
    // This does not return the same bytes as the line below, and the result,
    // after decoding the base64 successfully, is a very mangled protocol buffer.

    byte[] decoded = b64.decode(val.toString().getBytes()); // YES, toString() FIXES IT

Elsewhere in my code I also have:

    Text curline = new Text(values.next().toString());
    byte[] raw = base64.decode(curline.getBytes()); // This does work.

It looks like the Text object must be toString'd (just once, somewhere, even if it's later repacked in a Text) before it will have the proper byte representation.

I would classify this as a leaky abstraction and ask that the reason please be isolated and the API fixed somehow, so that other developers don't have to spend 3 days figuring out why Text.getBytes() isn't returning the right bytes even though Text.toString() prints exactly the right string representation and Text.toString().getBytes() does return the right bytes.
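
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the Javadoc for Text.getBytes() says it returns the raw backing array and that only data up to getLength() is valid. If the framework reuses one Text instance across records, the array can still hold stale bytes from an earlier, longer value. The code below reproduces that effect using only the JDK. ReusableBuffer is a hypothetical stand-in written for this illustration, not Hadoop's Text, and reusedBuffer and validBytes are likewise made-up helper names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BackingArrayDemo {

    /** Hypothetical stand-in for a reusable Writable; NOT Hadoop's Text. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies data in, growing the backing array when needed but never shrinking it. */
        void set(byte[] data) {
            if (data.length > bytes.length) {
                bytes = new byte[data.length];
            }
            System.arraycopy(data, 0, bytes, 0, data.length);
            length = data.length;
        }

        /** Returns the whole backing array, which may be longer than getLength(). */
        byte[] getBytes() { return bytes; }

        int getLength() { return length; }

        /** Decodes only the valid prefix, which is why the toString() route looks correct. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Simulates reuse: a long record is stored first, then the shorter base64 payload. */
    static ReusableBuffer reusedBuffer(byte[] payload) {
        ReusableBuffer val = new ReusableBuffer();
        byte[] earlier = "a much longer earlier record".getBytes(StandardCharsets.UTF_8);
        val.set(Base64.getEncoder().encode(earlier));
        val.set(Base64.getEncoder().encode(payload));
        return val;
    }

    /** Copies only the valid prefix of the backing array. */
    static byte[] validBytes(ReusableBuffer b) {
        return Arrays.copyOf(b.getBytes(), b.getLength());
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        ReusableBuffer val = reusedBuffer(payload);

        // Decodes the stale tail as well, so the result differs from the payload.
        byte[] mangled = Base64.getDecoder().decode(val.getBytes());
        // Both of these look only at the valid prefix.
        byte[] viaString = Base64.getDecoder().decode(val.toString().getBytes(StandardCharsets.UTF_8));
        byte[] viaLength = Base64.getDecoder().decode(validBytes(val));

        System.out.println("raw getBytes() matches:     " + Arrays.equals(mangled, payload));
        System.out.println("toString() route matches:   " + Arrays.equals(viaString, payload));
        System.out.println("length-aware copy matches:  " + Arrays.equals(viaLength, payload));
    }
}
```

If this is indeed the cause, then with a real Text the equivalent workaround would be to pass only the first getLength() bytes of getBytes() to the decoder, which avoids the round trip through a String.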