[ https://issues.apache.org/jira/browse/BEAM-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342456#comment-16342456 ]

Nikolay Sokolov commented on BEAM-995:
--------------------------------------

[~xumingming] I'm new to Beam, but I'm currently working on a data warehouse
project that relied heavily on Pig in the past. We are quite interested in the
possibility of running that legacy code on Dataflow via Beam without a major
overhaul, so here are a few humble comments on this topic:

> If we do pig-on-beam on beam-side, we will have something like `UDFAdapter`
> which will adapt all existing UDFs, so we can use them in the new pig-on-beam.

Pig may not be very popular nowadays, but on the other hand there is a huge
amount of legacy code across many organizations where full Pig compatibility
would be required. Existing code frequently depends on how Pig discovers
additional jars, on specific Loaders/Storers (possibly custom ones), and on the
command-line arguments of the pig command itself. For such legacy codebases,
pig-on-beam would be more beneficial.

> There is pipeline optimizer in BEAM, and also an optimizer in underlying
> engine(Spark, MapReduce)

I'm not sure about the Pig side of things, but Hive provides optimizations such
as map joins, sorted bucketed joins, and skewed joins at the logical plan level.
Some of these optimizations require knowledge of metadata (for example, in the
HCatalog case). Would optimizers on the Beam side cover those cases?

> Apache Pig DSL
> --------------
>
>                 Key: BEAM-995
>                 URL: https://issues.apache.org/jira/browse/BEAM-995
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>
> Apache Pig is still popular and the language is not so large.
> Providing a DSL using the Pig language would potentially allow more people to 
> use Beam (at least during a transition period).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
