[ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706089#comment-13706089
 ] 

Jitendra Nath Pandey commented on HIVE-4160:
--------------------------------------------

Dmitry, Vinod
  There is significant amount of vectorization work in expression evaluation 
for example, arithmetic expressions or logical expressions or aggregations etc. 
Many of these expressions are pretty generic and different systems are likely 
to have similar semantics for these. It should be possible to re-use this code 
with little change in pig or other systems. It will be required to use same 
vectorized representation of data in the processing engine to re-use these 
expressions, but that part of code is also generic and re-usable. I think that 
could be a good starting point.
  However, a bunch of the vectorization work is in operator code where we have 
vectorized version of the hive operators. These operators are closely tied with 
hive semantics and implementation. Therefore, it will need some restructuring 
in hive code base as well to generalize these operators for re-use in other 
projects. Also, at this point we should be thinking more generally about a 
common physical layer shared between pig and hive. These languages can continue 
to have different logical plans but it would be desirable that they share 
common physical plan structure because they both use same map-reduce runtime.
                
> Vectorized Query Execution in Hive
> ----------------------------------
>
>                 Key: HIVE-4160
>                 URL: https://issues.apache.org/jira/browse/HIVE-4160
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down the 
> processing. 
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to