[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008741#comment-13008741 ]
Krishna Kumar commented on HIVE-2065:
-------------------------------------

So should I go ahead and fix #2 and #3 as well? Note that these are non-compatible changes, so the version number will need to be bumped up.

My proposal:

Fix the issues in the new format
- up the version number to 7.
- compute and store record length as (compressed key length = 4 + compressed key contents length) + compressed value length
- store compressed key length as the next 4-byte field
- key contains 4-byte uncompressed key contents length + compressed key contents

Provide backward compatibility
- while reading version 6,
  - interpret fields as now but recalculate the record length from the next two fields (as record length = record length - uncompressed key length + compressed key length)

> RCFile issues
> -------------
>
>                 Key: HIVE-2065
>                 URL: https://issues.apache.org/jira/browse/HIVE-2065
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>         Attachments: Slide1.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per
> yongqiang he, the class is not meant to be thread-safe (and it is not). Might
> as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression
> happens after we have written the record length.
> {code}
>     int keyLength = key.getSize();
>     if (keyLength < 0) {
>       throw new IOException("negative length keys not allowed: " + key);
>     }
>     out.writeInt(keyLength + valueLength); // total record length
>     out.writeInt(keyLength); // key portion length
>     if (!isCompressed()) {
>       out.writeInt(keyLength);
>       key.write(out); // key
>     } else {
>       keyCompressionBuffer.reset();
>       keyDeflateFilter.resetState();
>       key.write(keyDeflateOut);
>       keyDeflateOut.flush();
>       keyDeflateFilter.finish();
>       int compressedKeyLen = keyCompressionBuffer.getLength();
>       out.writeInt(compressedKeyLen);
>       out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>     }
> {code}
> 3. For sequence file compatibility, the compressed key length should be the
> next field to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
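
For illustration only, the length arithmetic in the proposal above (the version 7 layout and the version 6 read-time correction) could be sketched roughly as follows. This is not the RCFile implementation or a patch: the class name RecordLengthSketch and all method names are hypothetical, the byte counts in main are made up, and ByteBuffer stands in for the real output stream (it is big-endian by default, like DataOutput.writeInt).

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hive: mirrors only the arithmetic in the proposal.
final class RecordLengthSketch {

    // Proposed version 7: the compressed key length counts the 4-byte
    // uncompressed-length prefix plus the compressed key contents.
    static int v7CompressedKeyLength(int compressedKeyContentsLength) {
        return 4 + compressedKeyContentsLength;
    }

    // Proposed version 7: record length = compressed key length + compressed value length.
    static int v7RecordLength(int compressedKeyContentsLength, int compressedValueLength) {
        return v7CompressedKeyLength(compressedKeyContentsLength) + compressedValueLength;
    }

    // Version 6 read path, using the formula exactly as stated in the comment:
    // record length = record length - uncompressed key length + compressed key length.
    static int correctedV6RecordLength(int storedRecordLength,
                                       int uncompressedKeyLength,
                                       int compressedKeyLength) {
        return storedRecordLength - uncompressedKeyLength + compressedKeyLength;
    }

    // Lays out the fields that precede the value in the proposed version 7 order.
    static byte[] encodeV7KeySection(int uncompressedKeyContentsLength,
                                     byte[] compressedKeyContents,
                                     int compressedValueLength) {
        int n = compressedKeyContents.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + n);
        buf.putInt(v7RecordLength(n, compressedValueLength)); // record length
        buf.putInt(v7CompressedKeyLength(n));                 // compressed key length
        buf.putInt(uncompressedKeyContentsLength);            // key: uncompressed contents length
        buf.put(compressedKeyContents);                       // key: compressed contents
        return buf.array();
    }

    public static void main(String[] args) {
        // Made-up sizes: 100-byte key compressing to 40 bytes, 500-byte compressed value.
        byte[] fakeCompressedKey = new byte[40];
        ByteBuffer in = ByteBuffer.wrap(encodeV7KeySection(100, fakeCompressedKey, 500));
        System.out.println("v7 record length         = " + in.getInt());
        System.out.println("v7 compressed key length = " + in.getInt());
        System.out.println("v7 uncompressed key len  = " + in.getInt());
        System.out.println("v6 corrected length      = "
                + correctedV6RecordLength(100 + 500, 100, 40));
    }
}
```

Keeping the arithmetic in small pure functions is just a choice for this sketch, so the same formulas can be checked on their own; how the real writer and reader would be structured is left open by the comment.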