Rohini Palaniswamy created PIG-4697:
---------------------------------------
Summary: Pig needs to serialize only part of the udfcontext for
each vertex
Key: PIG-4697
URL: https://issues.apache.org/jira/browse/PIG-4697
Project: Pig
Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
Fix For: 0.16.0
What HCatLoader/HCatStorer put in UDFContext is huge, and if a Pig script
uses several of them, both the data sent to the Tez AM and the data the Tez AM
sends to tasks become huge, causing either RPC limit exceeded errors or OOM
issues. If Pig serializes only the part of the UDFContext that each vertex
requires, it will save a lot. The HCat folks are also looking at cleaning up
what goes into the conf (it ends up serializing the whole job conf, not just
hive-site.xml) and moving the common part out so that it is shared by all HCat
loaders and storers.
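
A rough sketch of the per-vertex idea, in plain Java and not using Pig's
actual classes: assume the UDFContext can be viewed as a map from a UDF
signature to its Properties, and that the set of signatures used by the
operators in one vertex is known. The class name UdfContextSlicer, the method
sliceForVertex and the signature strings are all hypothetical and only
illustrate the filtering step; how the real patch identifies and serializes
the entries may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class UdfContextSlicer {

    // Hypothetical helper (not Pig's API): keep only the entries whose UDF
    // signature is referenced by the operators planned into one vertex.
    public static Map<String, Properties> sliceForVertex(
            Map<String, Properties> allUdfProps,
            Set<String> signaturesInVertex) {
        Map<String, Properties> slice = new LinkedHashMap<>();
        for (Map.Entry<String, Properties> e : allUdfProps.entrySet()) {
            if (signaturesInVertex.contains(e.getKey())) {
                slice.put(e.getKey(), e.getValue());
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        // Stand-in for the full UDFContext: one entry per loader/storer.
        Map<String, Properties> all = new HashMap<>();

        Properties loaderProps = new Properties();
        loaderProps.setProperty("example.key", "large serialized job info");
        all.put("loader-A", loaderProps);

        Properties storerProps = new Properties();
        storerProps.setProperty("example.key", "another large payload");
        all.put("storer-B", storerProps);

        // Assume this vertex only runs the loader, so only its entry is kept
        // and would be serialized into that vertex's payload.
        Map<String, Properties> slice =
                sliceForVertex(all, Collections.singleton("loader-A"));
        System.out.println(slice.keySet());
    }
}
```

With N loaders and storers spread over different vertices, each vertex
payload would then carry only its own entries instead of all N.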
We are also looking at other options for faster and more compact
serialization and will create separate JIRAs for that. PIG-4653 will be used to
clean up all Pig config other than the UDFContext.