[
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027103#comment-16027103
]
Sergey Shelukhin commented on HIVE-16761:
-----------------------------------------
The failing call is in BatchToRowReader.next:
{noformat}
nextValue(batch.cols[i], rowInBatch, schema.get(i), getStructCol(value, i)))
{noformat}
The schema is created from the vrbCtx:
{noformat}
schema = Lists.<TypeInfo>newArrayList(vrbCtx.getRowColumnTypeInfos());
{noformat}
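To illustrate the failure mode: the reader trusts that TypeInfo list and downcasts each column vector accordingly, so if the ctx's schema disagrees with the actual vectors in the batch, the cast throws. A minimal sketch with stand-in classes (not Hive's real ColumnVector hierarchy, just the same shape):

```java
// Stand-ins for Hive's vector classes, to show the mechanism only.
class ColumnVector {}
class LongColumnVector extends ColumnVector { long[] vector = new long[1024]; }
class BytesColumnVector extends ColumnVector { byte[][] vector = new byte[1024][]; }

public class SchemaMismatchDemo {
  // Mimics BatchToRowReader.nextString: it believes the schema says "string"
  // and downcasts. If the batch column was actually built as a long vector
  // (wrong/misordered schema), this throws ClassCastException.
  static byte[] nextString(ColumnVector col, int row) {
    return ((BytesColumnVector) col).vector[row];
  }

  public static void main(String[] args) {
    ColumnVector col = new LongColumnVector(); // batch really holds longs
    try {
      nextString(col, 0); // schema wrongly claims a string column
      System.out.println("no exception");
    } catch (ClassCastException e) {
      System.out.println("ClassCastException, as in the reported stack trace");
    }
  }
}
```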
The ctx is the same one passed from the LlapReader... For non-vectorized map work, which I assume is the case here, it is created via {{LlapInputFormat.createFakeVrbCtx(mapWork)}}. I suspect the problem is that this method builds an incorrect ctx for this case.
{noformat}
static VectorizedRowBatchCtx createFakeVrbCtx(MapWork mapWork) throws HiveException {
  // This is based on Vectorizer code, minus the validation.
  // Add all non-virtual columns from the TableScan operator.
  RowSchema rowSchema = findTsOp(mapWork).getSchema();
  final List<String> colNames = new ArrayList<String>(rowSchema.getSignature().size());
  final List<TypeInfo> colTypes = new ArrayList<TypeInfo>(rowSchema.getSignature().size());
  for (ColumnInfo c : rowSchema.getSignature()) {
    String columnName = c.getInternalName();
    if (VirtualColumn.VIRTUAL_COLUMN_NAMES.contains(columnName)) continue;
    colNames.add(columnName);
    colTypes.add(TypeInfoUtils.getTypeInfoFromTypeString(c.getTypeName()));
  }
  // Determine the partition columns using the first partition descriptor.
  // Note - like vectorizer, this assumes partition columns go after data columns.
  int partitionColumnCount = 0;
  Iterator<Path> paths = mapWork.getPathToAliases().keySet().iterator();
  if (paths.hasNext()) {
    PartitionDesc partDesc = mapWork.getPathToPartitionInfo().get(paths.next());
    if (partDesc != null) {
      LinkedHashMap<String, String> partSpec = partDesc.getPartSpec();
      if (partSpec != null && partSpec.isEmpty()) {
        partitionColumnCount = partSpec.size();
      }
    }
  }
  return new VectorizedRowBatchCtx(colNames.toArray(new String[colNames.size()]),
      colTypes.toArray(new TypeInfo[colTypes.size()]), null, partitionColumnCount,
      new String[0]);
}
{noformat}
[~jdere] [~gopalv] does SMB join do something special wrt columns?
Also, I see a bug right there with the partition column count: the {{partSpec != null && partSpec.isEmpty()}} check only takes the size when the spec is empty, so partitionColumnCount always stays 0; presumably it should be {{!partSpec.isEmpty()}}. I wonder if that could be related...
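A sketch of the presumably-intended check, pulled out into a stand-in helper ({{countPartitionColumns}} is hypothetical, not Hive code):

```java
import java.util.LinkedHashMap;

public class PartSpecCheck {
  // The posted createFakeVrbCtx uses partSpec.isEmpty(), which only reads the
  // size when it is 0, so the count always stays 0 for partitioned tables.
  // Inverting the condition gives the presumably intended behavior.
  static int countPartitionColumns(LinkedHashMap<String, String> partSpec) {
    int partitionColumnCount = 0;
    if (partSpec != null && !partSpec.isEmpty()) {
      partitionColumnCount = partSpec.size();
    }
    return partitionColumnCount;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, String> spec = new LinkedHashMap<>();
    spec.put("year", "2017");
    spec.put("quarter", "Q2");
    System.out.println(countPartitionColumns(spec));                  // prints 2
    System.out.println(countPartitionColumns(new LinkedHashMap<>())); // prints 0
    System.out.println(countPartitionColumns(null));                  // prints 0
  }
}
```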
> LLAP IO: SMB joins fail elevator
> ---------------------------------
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year, quarter, count(*)
> from transactions_raw_orc_200 a
> join customer_accounts_orc_200 b on a.account_id = b.account_id
> group by year, quarter;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)