[ https://issues.apache.org/jira/browse/IMPALA-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384747#comment-17384747 ]

pengdou1990 commented on IMPALA-10785:
--------------------------------------

Yes, if the HDFS operand is the first operand, it can still be passed through,
but it would be better if both the Kudu operand and the HDFS operand could be
passed through.

In my solution, I traverse the union's child operands, and if any of them is a
Kudu operand, each string slot in the HDFS operand is padded by 4 bytes, and
each string slot in the union output is also padded by 4 bytes. As a result,
the memory layouts of both the Kudu and HDFS operands (including slot sizes and
slot offsets) are identical to that of the union output.
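The padding rule above can be sketched as a toy layout calculation. This is a simplified model, not Impala's planner code: `compute_layout`, `union_needs_padding` and the other names are hypothetical, slots are placed back to back in declaration order, and null-indicator bytes and any slot reordering are ignored. It only uses the sizes from this issue (12-byte strings in HDFS and union tuples, 16-byte strings in Kudu tuples) plus an 8-byte BIGINT.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of tuple layouts; not Impala's real planner.
BIGINT, STRING = "BIGINT", "STRING"
STRING_PAD = 4  # extra bytes per string slot: 12 B (HDFS/union) -> 16 B (Kudu)


@dataclass(frozen=True)
class Layout:
    offsets: tuple  # byte offset of each slot, in declaration order
    byte_size: int  # total bytes taken by the slots


def slot_size(col_type, pad_strings):
    if col_type == STRING:
        return 12 + (STRING_PAD if pad_strings else 0)
    return 8  # BIGINT


def compute_layout(col_types, pad_strings):
    """Place slots back to back (simplification: no reordering by size and
    no null-indicator bytes)."""
    offsets, offset = [], 0
    for t in col_types:
        offsets.append(offset)
        offset += slot_size(t, pad_strings)
    return Layout(tuple(offsets), offset)


def union_needs_padding(child_kinds):
    """Pad string slots in every operand and in the union output
    if any union child is a Kudu scan."""
    return any(kind == "kudu" for kind in child_kinds)


cols = [BIGINT, STRING, STRING]
pad = union_needs_padding(["kudu", "hdfs"])
kudu = compute_layout(cols, pad_strings=True)  # Kudu string slots are 16 B
hdfs = compute_layout(cols, pad_strings=pad)
union_out = compute_layout(cols, pad_strings=pad)
unpadded_hdfs = compute_layout(cols, pad_strings=False)
```

With a Kudu child present, the padded HDFS layout and the union output layout both equal the Kudu layout (offsets 0, 8, 24; 40 bytes), while the unpadded HDFS layout (offsets 0, 8, 20; 32 bytes) does not.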

I have also tried marking Kudu's primary key slots as nullable, and it seems to
be effective.
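Why NOT NULL key slots break the layout match, and why marking them nullable helps, can be illustrated with an equally simplified model. It assumes (hypothetically) that each nullable slot gets the next null-indicator bit in declaration order and that NOT NULL slots get no bit; Impala's actual assignment may differ in detail, and `null_bit_assignment` is a made-up helper.

```python
# Hypothetical model of null-indicator bit assignment; not Impala's code.
def null_bit_assignment(nullable_flags):
    """Give each nullable slot the next null-indicator bit; NOT NULL slots
    get no bit (None). Returns (per-slot bits, number of null bytes)."""
    bits, next_bit = [], 0
    for nullable in nullable_flags:
        if nullable:
            bits.append(next_bit)
            next_bit += 1
        else:
            bits.append(None)
    num_null_bytes = (next_bit + 7) // 8  # round up to whole bytes
    return bits, num_null_bytes


# Slots: [key, value1, value2]
kudu_strict = null_bit_assignment([False, True, True])   # key declared NOT NULL
hdfs = null_bit_assignment([True, True, True])           # nullability unknown
kudu_relaxed = null_bit_assignment([True, True, True])   # key marked nullable
```

With the key slot declared NOT NULL, the Kudu value slots take bits 0 and 1 while the HDFS tuple uses bits 0, 1 and 2, so the same slot would be tested at different bits; once the key is treated as nullable, both assignments are identical.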

Besides, I also tried rearranging each Kudu row's direct data without string
padding, as you suggested, but its performance is not as good.
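For comparison, the rearrangement alternative copies every string slot of every Kudu row into the narrower layout. The sketch assumes, without checking against the Impala or Kudu source, that a 16-byte slot holds an 8-byte pointer followed by an 8-byte length and that a 12-byte slot holds an 8-byte pointer followed by a 4-byte length; `repack_string_slots` is hypothetical. The per-row copy it performs is one plausible reason this approach was slower than padding.

```python
import struct

# Assumed slot encodings (little-endian, no alignment padding).
KUDU_STRING = struct.Struct("<QQ")    # 8-byte pointer + 8-byte length = 16 B
IMPALA_STRING = struct.Struct("<Qi")  # 8-byte pointer + 4-byte length = 12 B


def repack_string_slots(kudu_row, num_strings):
    """Copy each 16-byte string slot of one row into a 12-byte slot.
    This copy runs for every row, which padding would avoid."""
    out = bytearray()
    for i in range(num_strings):
        ptr, length = KUDU_STRING.unpack_from(kudu_row, i * KUDU_STRING.size)
        out += IMPALA_STRING.pack(ptr, length)
    return bytes(out)


# Usage: one row with two string slots (fake pointer values).
row = KUDU_STRING.pack(0x1000, 5) + KUDU_STRING.pack(0x2000, 7)
packed = repack_string_slots(row, 2)
```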

> when union kudu table and hdfs table, union passthrough does not take effect
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-10785
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10785
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: pengdou1990
>            Priority: Major
>
> IMPALA-3586 already supports union passthrough and brings great performance
> improvements to union, but there are still some problems with a union between
> an HDFS table and a Kudu table. Several points cause the problem:
>  # In the Kudu scan node's output TupleDescriptor, a string slot is 16B, while
> in the HDFS scan node's output TupleDescriptor, a string slot is 12B, causing
> a tuple memory layout mismatch.
>  # In the Kudu scan node's output TupleDescriptor, a string slot is 16B, while
> in the union's output TupleDescriptor, a string slot is 12B, causing a tuple
> memory layout mismatch.
>  # In the Kudu scan node, the row key slots are NOT NULL, while in the HDFS
> scan node, NOT NULL constraints can't be obtained from the metadata, causing
> a tuple memory layout mismatch.
> I have resolved the 1st and 2nd points; how should I handle the 3rd point?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
