RCFile issues
-------------
Key: HIVE-2065
URL: https://issues.apache.org/jira/browse/HIVE-2065
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Priority: Minor
Some potential issues with RCFile:
1. Remove the unnecessary synchronized modifiers on RCFile's methods. According to
Yongqiang He, the class is not meant to be thread-safe (and it is not), so we might
as well get rid of the confusing and performance-impacting lock acquisitions.
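2. The record length is overstated for compressed files. IIUC, the key is compressed
only after the record length has been written, so the record length is computed from
the uncompressed key length rather than from the number of key bytes actually written: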
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength); // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For SequenceFile compatibility, the field immediately following the record length
should be the compressed key length, not the uncompressed key length.
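To make issues 2 and 3 concrete, here is a minimal Java sketch of the ordering they imply: compress the key into a buffer first, then derive both the record length and the key length from the compressed size. This is not the actual RCFile code or fix. The class and method names (RecordLengthSketch, writeRecord, toBytes, readHeader) are hypothetical. The layout [record length][key length][key bytes][value bytes] is a simplification assumed for illustration; it omits the extra key-length int and RCFile's per-column value encoding. It also uses java.util.zip.DeflaterOutputStream in place of RCFile's codec-based keyDeflateFilter.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

/**
 * Hypothetical sketch, not RCFile's actual code or on-disk format.
 * Assumed layout: [record length][key length][key bytes][value bytes],
 * where both length fields count the bytes actually written (the
 * compressed key bytes when compression is on).
 */
public class RecordLengthSketch {

  /** Deflates data fully into memory; closing the stream finishes it. */
  static byte[] deflate(byte[] data) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DeflaterOutputStream dos = new DeflaterOutputStream(buf)) {
      dos.write(data);
    }
    return buf.toByteArray();
  }

  /** Compresses the key first, then writes lengths derived from the stored bytes. */
  static void writeRecord(DataOutputStream out, byte[] key, byte[] value,
      boolean compress) throws IOException {
    byte[] storedKey = compress ? deflate(key) : key;
    out.writeInt(storedKey.length + value.length); // total record length (issue 2)
    out.writeInt(storedKey.length);                // stored key length comes next (issue 3)
    out.write(storedKey);
    out.write(value);
    out.flush();
  }

  /** Serializes one record to a byte array. */
  static byte[] toBytes(byte[] key, byte[] value, boolean compress) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    writeRecord(new DataOutputStream(sink), key, value, compress);
    return sink.toByteArray();
  }

  /** Reads back {record length, key length} from a serialized record. */
  static int[] readHeader(byte[] record) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
    return new int[] {in.readInt(), in.readInt()};
  }

  public static void main(String[] args) throws IOException {
    byte[] key = new byte[1000]; // all zeros, so it deflates well
    byte[] value = {1, 2, 3};
    byte[] record = toBytes(key, value, true);
    int[] header = readHeader(record);
    // In this layout, exactly "record length" bytes follow the 8-byte header.
    System.out.println("record length = " + header[0]
        + ", key length = " + header[1]
        + ", bytes after header = " + (record.length - 8));
  }
}
{code}

Because both length fields are computed from the stored bytes, a reader that skips a record using its record length lands on the next record whether or not the key was compressed. In the snippet quoted in issue 2, the record length is instead computed from the uncompressed key size.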