[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704617#action_12704617 ]
Zheng Shao commented on HIVE-352:
---------------------------------

hive-352-2009-4-30-4.patch: Thanks Yongqiang. I tried it and it works now.

* exec/Utilities.java
createRCFileWriter: parameter fs is never used.

* RCFile.java
Can you add some comments in the header to explain how you determine a "row split"? Also, what does this mean?
<li>A sync-marker every few <code>100</code> bytes or so.</li>
KeyBuffer.write(DataInput in, int length): length is never used. Is this method ever used? If not, remove it.
ColumnBuffer.getLength(): Let's remove the synchronized keyword for the sake of performance.
ColumnBuffer.ColumnBuffer(Class valClass): Let's remove valClass since it's never used.
Writer: how do you pass the column number from Hive to the configuration, and then to the RCFile.Writer?
{code}
+ this.columnNumber = conf.getInt(COLUMN_NUMBER_CONF_STR, 0);
{code}
Writer: Per Java convention, COLUMNS_BUFFER_SIZE should be columnsBufferSize.
{code}
+ this.COLUMNS_BUFFER_SIZE = conf.getInt(COLUMNS_BUFFER_SIZE_CONF_STR,
+     4 * 1024 * 1024);
{code}
Reader: add javadoc for "public synchronized boolean nextColumnsBatch()".

* RCFileRecordReader.java
Remove "synchronized" from all methods for the sake of performance:
{code}
+ protected synchronized boolean next(LongWritable key) throws IOException {
{code}

* TestRCFile.java
You might want to use Text.encode() instead of "String".getBytes(). At least we should use getBytes("UTF-8").

* ql/src/test/queries/clientpositive/columnarserde_create_shortcut.q
For all tests, please drop the created tables again at the end of the .q file. This helps make "show tables" in other .q files return a deterministic result.

* BytesRefArrayWritable.java
What's the difference between get() and unCheckedGet()? The code is the same.
resetValid: Let's call it resize.
set(..): the condition should be valid <= index:
{code}
+ if (valid < index)
+   valid = index + 1;
{code}

* BytesRefWritable.java
Shall we rename getBytes() to getBytesCopy()?
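To illustrate the off-by-one in set(..): with "valid < index", setting an element at index == valid never grows the valid count. Below is a hedged, minimal sketch of that logic — a hypothetical RefArray class, not Hive's actual BytesRefArrayWritable — showing why the comparison must be "valid <= index":

```java
import java.util.Arrays;

// Hypothetical minimal model of a "valid count" array, for illustration only.
public class RefArray {
    private byte[][] refs = new byte[4][];
    private int valid = 0; // number of valid entries, i.e. logical size

    public void set(int index, byte[] value) {
        if (index >= refs.length) {
            // Grow the backing array when the index is out of range.
            refs = Arrays.copyOf(refs, Math.max(index + 1, refs.length * 2));
        }
        refs[index] = value;
        // The review's fix: "valid < index" would skip the common case of
        // appending at index == valid, leaving the logical size too small.
        if (valid <= index) {
            valid = index + 1;
        }
    }

    public int size() {
        return valid;
    }
}
```

With the buggy "valid < index" comparison, appending at index 0 to an empty array would leave size() == 0.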
* ColumnarStruct.java
init(...): Instead of clearing out the object, we can do set(null, 0, 0), which has the same effect but doesn't need to create a new LazyObject for the next row.

Current:
{code}
+ if (field.length > 0) {
+   if (fields[fieldIndex] == null)
+     fields[fieldIndex] = LazyFactory.createLazyObject(fieldTypeInfos
+         .get(fieldIndex));
+   fields[fieldIndex].init(cachedByteArrayRef[fieldIndex], field
+       .getStart(), field.getLength());
+ } else if (fields[fieldIndex] != null
+     && fields[fieldIndex].getObject() != null) {
+   fields[fieldIndex] = null;
+ }
{code}
Let's change it to (please add a test):
{code}
+ if (fields[fieldIndex] == null)
+   fields[fieldIndex] = LazyFactory.createLazyObject(fieldTypeInfos
+       .get(fieldIndex));
+ if (field.length > 0) {
+   fields[fieldIndex].init(cachedByteArrayRef[fieldIndex], field
+       .getStart(), field.getLength());
+ } else {
+   fields[fieldIndex].init(null, 0, 0);
+ }
{code}

* Can you add one more test for adding/deleting columns? The value of the newly added columns should be NULL. RCFile should also be able to ignore the extra columns in the data.
{code}
CREATE TABLE xxx (a string, b string) STORED AS RCFILE;
INSERT OVERWRITE TABLE xxx ...;
ALTER TABLE xxx ADD COLUMNS (c string);
SELECT * FROM xxx;
ALTER TABLE xxx REPLACE COLUMNS (a string);
SELECT * FROM xxx;
{code}

* General comments:
1. We don't need to put "this." in the code unless there is a local variable with the same name.
2. Please add more javadoc. The general rule for Hive is that every public class/method should have javadoc, except getters and setters.
** Very important: Eclipse automatically generates a lot of empty javadocs for variables etc. Please remove them if you don't have a comment for that variable; otherwise "ant -Dhadoop.version=0.17.0 javadoc" will produce warnings. Please make sure to remove all warnings.
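As a hedged illustration of javadoc that satisfies these rules (a hypothetical ByteRef class, not the actual BytesRefWritable): member references in @see and @link need a '#' prefix, @return tags need descriptive text, and @param names must match the actual parameter names.

```java
// Hypothetical example class showing warning-free javadoc style.
public class ByteRef {
    private byte[] data;
    private int start;
    private int len;

    /**
     * Points this reference at a range of {@code newData}.
     *
     * @param newData the backing byte array
     * @param offset  start of the referenced range
     * @param length  number of bytes in the range
     * @see #getLength()
     */
    public void set(byte[] newData, int offset, int length) {
        this.data = newData;
        this.start = offset;
        this.len = length;
    }

    /**
     * @return the number of bytes in the referenced range
     */
    public int getLength() {
        return len;
    }
}
```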
{code}
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/BytesRefWritable.java:95: warning - @return tag has no arguments.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/BytesRefWritable.java:122: warning - Tag @see: missing '#': "set(byte[] newData, int offset, int length)"
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/BytesRefWritable.java:105: warning - Tag @see: missing '#': "readFields(DataInput in)"
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java:38: warning - Tag @link: missing '#': "init(BytesRefArrayWritable cols)"
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:522: warning - @param argument "keyClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:522: warning - @param argument "valClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:149: warning - Tag @see: malformed: "CompressionCodec, SequenceFile"
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:149: warning - Tag @see: reference not found: CompressionCodec, SequenceFile
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1214: warning - @return tag has no arguments.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1214: warning - Tag @link: missing '#': "nextColumnBatch()"
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1214: warning - Tag @link: can't find nextColumnBatch() in org.apache.hadoop.hive.ql.io.RCFile.Reader
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1288: warning - @return tag has no arguments.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1288: warning - @return tag cannot be used in method with void return type.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:543: warning - @param argument "keyClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:543: warning - @param argument "valClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:564: warning - @param argument "keyClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:564: warning - @param argument "valClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:591: warning - @param argument "keyClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:591: warning - @param argument "valClass" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java:105: warning - @return tag has no arguments.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java:105: warning - @param argument "tableInfo" is not a parameter name.
[javadoc] /data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:745: warning - @param argument "dest" is not a parameter name.
{code}

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch, hive-352-2009-4-23.patch, hive-352-2009-4-27.patch, hive-352-2009-4-30-2.patch, hive-352-2009-4-30-3.patch, hive-352-2009-4-30-4.patch, HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS. I think it will need some review and refactoring to port it to Hive.
> Any thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.