[ https://issues.apache.org/jira/browse/PIG-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974783#comment-14974783 ]
Daniel Dai commented on PIG-4697:
---------------------------------
Makes sense. +1.
> Serialize relevant part of the udfcontext per vertex to reduce payload size
> ---------------------------------------------------------------------------
>
> Key: PIG-4697
> URL: https://issues.apache.org/jira/browse/PIG-4697
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4697-1.patch, PIG-4697-2.patch,
> PIG-4697-fixunittests.patch
>
>
> What HCatLoader/HCatStorer puts in UDFContext is huge. If a pig script uses
> several of them, the data sent to the Tez AM is large enough to exceed the
> RPC limit, and the data that the Tez AM sends to tasks is large enough to
> cause OOM issues. If Pig serializes only the part of the udfcontext that is
> required for each vertex, it will save a lot. HCat folks are also looking at
> cleaning up what goes into the conf (it ends up serializing the whole job
> conf, not just hive-site.xml) and moving out the common part to be shared by
> all hcat loaders and storers.
> Also looking at other options for faster and more compact serialization.
> Will create separate jiras for that. Will use PIG-4653 to clean up all other
> pig config other than udfcontext.
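
To illustrate the per-vertex idea described above, here is a minimal sketch in
plain Java. It is not the code from the attached patches and does not use
Pig's UDFContext API. The class name, the map-of-Properties layout, and the
notion of a "set of UDF keys used by a vertex" are all assumptions made for
illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for a UDF context: one Properties object per UDF key.
public class PerVertexUdfContext {

    // Keep only the entries whose keys are used by the given vertex.
    // Keys that are not present in the full context are silently skipped.
    static Map<String, Properties> relevantFor(Map<String, Properties> all,
                                               Set<String> udfKeysInVertex) {
        Map<String, Properties> subset = new HashMap<>();
        for (String key : udfKeysInVertex) {
            Properties props = all.get(key);
            if (props != null) {
                subset.put(key, props);
            }
        }
        return subset;
    }

    // Size in bytes of the map under plain Java serialization, used here
    // only to compare payload sizes (an assumption; Pig may serialize
    // differently).
    static int serializedSize(Map<String, Properties> context) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(context));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

With a context holding entries for several loaders and storers, a vertex that
runs only one loader would ship the output of relevantFor(...) for that
loader's key rather than the full map, so its payload grows with the number of
UDFs in the vertex and not with the number in the whole script.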