[
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359940#comment-15359940
]
Nemon Lou commented on HIVE-14143:
----------------------------------
[~pxiong] Thanks for your attention.
RawDataSize for RCFile is the sum of the serialized sizes of the selected columns.
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java#L229
{code}
public long getRawDataSerializedSize() {
  long serializedSize = 0;
  for (int i = 0; i < fieldInfoList.length; ++i) {
    serializedSize += fieldInfoList[i].getSerializedSize();
  }
  return serializedSize;
}
{code}
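To make the consequence concrete, here is a minimal standalone sketch (not Hive code; the class `SelectedColumnSizeSketch` and its `rawDataSize` helper are hypothetical names used only for illustration). It assumes, as a simplification, that only the selected column indexes contribute to the total, so an empty selection gives a raw data size of zero.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SelectedColumnSizeSketch {
    // Hypothetical stand-in for getRawDataSerializedSize(): only columns whose
    // index appears in selectedIds contribute to the total.
    static long rawDataSize(long[] columnSizes, List<Integer> selectedIds) {
        long total = 0;
        for (int id : selectedIds) {
            total += columnSizes[id];
        }
        return total;
    }

    public static void main(String[] args) {
        long[] sizes = {10L, 20L, 30L};
        // Columns 0 and 2 selected: prints 40
        System.out.println(rawDataSize(sizes, Arrays.asList(0, 2)));
        // No columns selected: prints 0
        System.out.println(rawDataSize(sizes, Collections.<Integer>emptyList()));
    }
}
```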
During projection pushdown, READ_ALL_COLUMNS is always set to false, regardless of
whether the specified columns are empty.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L656
{code}
for (String alias : aliases) {
  Operator<? extends OperatorDesc> op = this.mrwork.getAliasToWork().get(alias);
  if (op instanceof TableScanOperator) {
    TableScanOperator ts = (TableScanOperator) op;
    // push down projections.
    ColumnProjectionUtils.appendReadColumns(
        jobConf, ts.getNeededColumnIDs(), ts.getNeededColumns());
    // push down filters
    pushFilters(jobConf, ts);
    AcidUtils.setTransactionalTableScan(job, ts.getConf().isAcidTable());
  }
}
{code}
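The side effect described above can be sketched in isolation. This is only an illustration under stated assumptions: `ProjectionConf` is a hypothetical stand-in for the job configuration, and the `appendReadColumns` here is a simplified imitation of the behavior I describe, not the real ColumnProjectionUtils method.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownFlagSketch {
    // Hypothetical stand-in for the two job settings involved.
    static final class ProjectionConf {
        boolean readAllColumns = true;
        final List<Integer> readColumnIds = new ArrayList<>();
    }

    // Assumed behavior: the read-all flag is cleared unconditionally,
    // even when the list of ids to append is empty.
    static void appendReadColumns(ProjectionConf conf, List<Integer> ids) {
        conf.readColumnIds.addAll(ids);
        conf.readAllColumns = false;
    }

    public static void main(String[] args) {
        ProjectionConf conf = new ProjectionConf();
        // analyze passes no column ids
        appendReadColumns(conf, new ArrayList<Integer>());
        // prints: false []
        System.out.println(conf.readAllColumns + " " + conf.readColumnIds);
    }
}
```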
For analyze, the specified column IDs are empty, which is supposed to mean "read all columns".
But since READ_ALL_COLUMNS is false, the empty ID list is used as-is, and in the end no column is read:
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java#L104
{code}
List<Integer> notSkipIDs = new ArrayList<Integer>();
if (conf == null || ColumnProjectionUtils.isReadAllColumns(conf)) {
  for (int i = 0; i < size; i++) {
    notSkipIDs.add(i);
  }
} else {
  notSkipIDs = ColumnProjectionUtils.getReadColumnIDs(conf);
}
cachedLazyStruct = new ColumnarStruct(
    cachedObjectInspector, notSkipIDs, serdeParams.getNullSequence());
{code}
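The branch above can be exercised on its own with plain values. The following is a rough sketch, not the SerDe itself: `SerDeSelectionSketch` and `selectColumns` are hypothetical names, and the flag and ID list are passed in directly instead of being read from a Hadoop configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerDeSelectionSketch {
    // Mirrors the if/else in the snippet: read-all expands to every column
    // index, otherwise the configured ids are used as-is, even when empty.
    static List<Integer> selectColumns(boolean readAllColumns,
                                       List<Integer> readColumnIds, int size) {
        List<Integer> notSkipIds = new ArrayList<>();
        if (readAllColumns) {
            for (int i = 0; i < size; i++) {
                notSkipIds.add(i);
            }
        } else {
            notSkipIds = readColumnIds;
        }
        return notSkipIds;
    }

    public static void main(String[] args) {
        List<Integer> empty = new ArrayList<>();
        // Flag set: prints [0, 1, 2]
        System.out.println(selectColumns(true, empty, 3));
        // Flag cleared with no ids (the analyze case): prints []
        System.out.println(selectColumns(false, empty, 3));
        // Flag cleared with explicit ids: prints [1]
        System.out.println(selectColumns(false, Arrays.asList(1), 3));
    }
}
```

The second call is the analyze case: nothing is selected, so every column is skipped and the summed size comes out as zero.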
> RawDataSize of RCFile is zero after analyze
> --------------------------------------------
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 1.2.1, 2.1.0
> Reporter: Nemon Lou
> Assignee: Nemon Lou
> Priority: Minor
> Attachments: HIVE-14143.patch
>
>
> After running the following analyze command, rawDataSize becomes zero for
> RCFile tables.
> {noformat}
> analyze table RCFILE_TABLE compute statistics ;
> {noformat}