[
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027103#comment-16027103
]
Sergey Shelukhin commented on HIVE-16761:
-----------------------------------------
The failing call is in BatchToRowReader.next:
{noformat}
nextValue(batch.cols[i], rowInBatch, schema.get(i), getStructCol(value, i)))
{noformat}
The schema is created from the vrbCtx:
{noformat}
schema = Lists.<TypeInfo>newArrayList(vrbCtx.getRowColumnTypeInfos());
{noformat}
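To illustrate the failure mode: the reader trusts that TypeInfo list and downcasts each column vector accordingly, so if the ctx's schema disagrees with the actual vectors in the batch, the cast throws. A minimal sketch with stand-in classes (not Hive's real ColumnVector hierarchy, just the same shape):

```java
// Stand-ins for Hive's vector classes, to show the mechanism only.
class ColumnVector {}
class LongColumnVector extends ColumnVector { long[] vector = new long[1024]; }
class BytesColumnVector extends ColumnVector { byte[][] vector = new byte[1024][]; }

public class SchemaMismatchDemo {
  // Mimics BatchToRowReader.nextString: it believes the schema says "string"
  // and downcasts. If the batch column was actually built as a long vector
  // (wrong/misordered schema), this throws ClassCastException.
  static byte[] nextString(ColumnVector col, int row) {
    return ((BytesColumnVector) col).vector[row];
  }

  public static void main(String[] args) {
    ColumnVector col = new LongColumnVector(); // batch really holds longs
    try {
      nextString(col, 0); // schema wrongly claims a string column
      System.out.println("no exception");
    } catch (ClassCastException e) {
      System.out.println("ClassCastException, as in the reported stack trace");
    }
  }
}
```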
The ctx is the same one passed from the LlapReader... For non-vectorized map work, which I assume is the case here, it is created via {{LlapInputFormat.createFakeVrbCtx(mapWork)}}. I suspect the problem is that this method builds an incorrect ctx for this case.
{noformat}
static VectorizedRowBatchCtx createFakeVrbCtx(MapWork mapWork) throws HiveException {
  // This is based on Vectorizer code, minus the validation.
  // Add all non-virtual columns from the TableScan operator.
  RowSchema rowSchema = findTsOp(mapWork).getSchema();
  final List<String> colNames = new ArrayList<String>(rowSchema.getSignature().size());
  final List<TypeInfo> colTypes = new ArrayList<TypeInfo>(rowSchema.getSignature().size());
  for (ColumnInfo c : rowSchema.getSignature()) {
    String columnName = c.getInternalName();
    if (VirtualColumn.VIRTUAL_COLUMN_NAMES.contains(columnName)) continue;
    colNames.add(columnName);
    colTypes.add(TypeInfoUtils.getTypeInfoFromTypeString(c.getTypeName()));
  }
  // Determine the partition columns using the first partition descriptor.
  // Note - like vectorizer, this assumes partition columns go after data columns.
  int partitionColumnCount = 0;
  Iterator<Path> paths = mapWork.getPathToAliases().keySet().iterator();
  if (paths.hasNext()) {
    PartitionDesc partDesc = mapWork.getPathToPartitionInfo().get(paths.next());
    if (partDesc != null) {
      LinkedHashMap<String, String> partSpec = partDesc.getPartSpec();
      if (partSpec != null && partSpec.isEmpty()) {
        partitionColumnCount = partSpec.size();
      }
    }
  }
  return new VectorizedRowBatchCtx(colNames.toArray(new String[colNames.size()]),
      colTypes.toArray(new TypeInfo[colTypes.size()]), null, partitionColumnCount,
      new String[0]);
}
{noformat}
[~jdere] [~gopalv] does SMB join do something special wrt columns?
Also, I see a bug right there with the partition column count: the {{partSpec != null && partSpec.isEmpty()}} check only takes the size when the spec is empty, so partitionColumnCount always stays 0; presumably it should be {{!partSpec.isEmpty()}}. I wonder if that could be related...
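A sketch of the presumably-intended check, pulled out into a stand-in helper ({{countPartitionColumns}} is hypothetical, not Hive code):

```java
import java.util.LinkedHashMap;

public class PartSpecCheck {
  // The posted createFakeVrbCtx uses partSpec.isEmpty(), which only reads the
  // size when it is 0, so the count always stays 0 for partitioned tables.
  // Inverting the condition gives the presumably intended behavior.
  static int countPartitionColumns(LinkedHashMap<String, String> partSpec) {
    int partitionColumnCount = 0;
    if (partSpec != null && !partSpec.isEmpty()) {
      partitionColumnCount = partSpec.size();
    }
    return partitionColumnCount;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, String> spec = new LinkedHashMap<>();
    spec.put("year", "2017");
    spec.put("quarter", "Q2");
    System.out.println(countPartitionColumns(spec));                  // prints 2
    System.out.println(countPartitionColumns(new LinkedHashMap<>())); // prints 0
    System.out.println(countPartitionColumns(null));                  // prints 0
  }
}
```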
> LLAP IO: SMB joins fail elevator
> ---------------------------------
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year, quarter, count(*)
> from transactions_raw_orc_200 a
> join customer_accounts_orc_200 b on a.account_id = b.account_id
> group by year, quarter;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)