[jira] [Commented] (CALCITE-3122) Convert Pig Latin scripts into Calcite logical plan

Khai Tran (JIRA) Mon, 15 Jul 2019 11:09:03 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885464#comment-16885464
 ]


Khai Tran commented on CALCITE-3122:
------------------------------------

Hi [~zabetak], thanks a lot for your suggestions.

In this feature, I get the Pig logical plan from the Pig parser, then traverse 
this plan and use RelBuilder to construct a Calcite logical plan; let's call 
it plan #1. After this step, I need an additional rule to optimize plan #1 
into plan #2. For example, a Pig group-by + aggregate is translated literally 
as a Calcite Aggregate with the COLLECT() aggregate function, followed by a 
Project that applies Pig aggregate UDFs (these work on a Pig DataBag, i.e. a 
SQL multiset, which is the result of COLLECT()). The rule then converts 
COLLECT() + Pig aggregate UDFs into a Calcite built-in aggregate operator.
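
To make the plan #1 / plan #2 distinction concrete, here is a toy sketch in 
plain Java. It is not Calcite or Pig code: the class, the Row type, and the 
method names are all made up for illustration, and ordinary collections stand 
in for RelNodes. It only shows why the rewrite is safe for a SUM-style UDF: 
collecting values into a bag and then summing the bag gives the same result as 
summing directly inside the aggregate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, no Calcite or Pig API involved.
class CollectRewriteSketch {

  /** One input row: a grouping key and a numeric value. */
  static final class Row {
    final String key;
    final long value;

    Row(String key, long value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Stand-in for a Pig aggregate UDF that consumes a whole bag (multiset). */
  static long pigSumUdf(List<Long> bag) {
    long total = 0;
    for (long v : bag) {
      total += v;
    }
    return total;
  }

  /** Plan #1: Aggregate with COLLECT(value), then a Project applying the UDF. */
  static Map<String, Long> planOne(List<Row> rows) {
    // Aggregate step: materialize one bag per group key.
    Map<String, List<Long>> collected = new TreeMap<>();
    for (Row r : rows) {
      collected.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
    }
    // Project step: run the bag-consuming UDF on each collected bag.
    Map<String, Long> projected = new TreeMap<>();
    for (Map.Entry<String, List<Long>> e : collected.entrySet()) {
      projected.put(e.getKey(), pigSumUdf(e.getValue()));
    }
    return projected;
  }

  /** Plan #2: a single Aggregate with a built-in SUM; no bag is materialized. */
  static Map<String, Long> planTwo(List<Row> rows) {
    Map<String, Long> sums = new TreeMap<>();
    for (Row r : rows) {
      sums.merge(r.key, r.value, Long::sum);
    }
    return sums;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(new Row("a", 1), new Row("b", 10), new Row("a", 2));
    System.out.println("plan #1: " + planOne(rows));
    System.out.println("plan #2: " + planTwo(rows));
  }
}
```

Both methods return the same map for the same input; plan #2 just avoids 
building the intermediate bag, which is the point of the rule.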

We also have other use cases that need to optimize a RelNode plan using a given 
set of rules, probably with a custom planner that sets costs in a way that 
enforces certain rules. Coupling RelNode with planners makes this hard to 
do.

I will look into HepPlanner further, but do you have an example of setting up 
a HepPlanner for use with RelBuilder?

> Convert Pig Latin scripts into Calcite logical plan 
> ----------------------------------------------------
>
>                 Key: CALCITE-3122
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3122
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core, piglet
>            Reporter: Khai Tran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.21.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We created an internal Calcite repo at LinkedIn and developed APIs to parse 
> any Pig Latin script into a Calcite logical plan. The code was tested on 
> nearly 1000 Pig scripts written at LinkedIn.
> Changes:
> 1. piglet: the main conversion code lives here, including:
>  * APIs to convert any Pig script into RelNode plans or SQL statements
>  * Use the Pig Grunt parser to parse Pig Latin scripts into Pig logical plans 
> (DAGs)
>  * Convert Pig schemas into RelDataType
>  * Traverse the Pig expression plan and convert Pig expressions into 
> RexNodes
>  * Map some basic Pig UDFs to Calcite SQL operators
>  * Build Calcite UDFs for any other Pig UDFs, including UDFs written in both 
> Java and Python
>  * Traverse (DFS) the Pig logical plan to convert each Pig logical node 
> to a RelNode
>  * Add an optimizer rule to optimize Pig group/cogroup into Aggregate 
> operators
> 2. core:
>  * Implement other RelNodes in Rel2Sql so that Pig can be translated into SQL
>  * Other minor changes in a few other classes to make Pig-to-Calcite work



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
