Rohini Palaniswamy commented on PIG-5283:

bq. My only question is that if we should only write those properties that are 
required for a PigSplit instead of writing the full jobConf (6-700 entries) for 

I would suggest trimming it down, and also checking whether it is possible to 
serialize only once. You are serializing the config with each split, which is 
not good: that is a lot of overhead and will impact performance. We had run 
into performance issues and OOMs with Tez on huge configs that were serialized 
multiple times, and had to trim them down.
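
The trimming idea can be sketched as follows. This is a minimal illustration, not Pig code: `java.util.Properties` stands in for Hadoop's `Configuration`, and the key list and helper name (`REQUIRED_KEYS`, `trimConf`) are hypothetical. The point is that only the handful of entries a split actually needs get copied out and serialized, instead of the full 6-700-entry jobConf per split.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch only: Properties stands in for Hadoop's Configuration here.
public class TrimConfDemo {
    // Illustrative list of keys a split might need; not Pig's real set.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "pig.split.index",
            "mapreduce.input.fileinputformat.inputdir");

    // Copy just the required entries so the serialized payload stays small
    // and can be written once, rather than once per split.
    static Properties trimConf(Properties full) {
        Properties trimmed = new Properties();
        for (String key : REQUIRED_KEYS) {
            String value = full.getProperty(key);
            if (value != null) {
                trimmed.setProperty(key, value);
            }
        }
        return trimmed;
    }

    public static void main(String[] args) {
        Properties full = new Properties();
        full.setProperty("pig.split.index", "3");
        full.setProperty("mapreduce.input.fileinputformat.inputdir", "/data/in");
        full.setProperty("some.unrelated.property", "x"); // one of the many extras
        Properties trimmed = trimConf(full);
        System.out.println(trimmed.size()); // only the required entries survive
    }
}
```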

> Configuration is not passed to SparkPigSplits on the backend
> ------------------------------------------------------------
>                 Key: PIG-5283
>                 URL: https://issues.apache.org/jira/browse/PIG-5283
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>         Attachments: PIG-5283.0.patch
> When a Hadoop ObjectWritable is created during a Spark job, the instantiated 
> PigSplit (wrapped into a SparkPigSplit) is given an empty Configuration 
> instance.
> This happens 
> [here|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SerializableWritable.scala#L44]

This message was sent by Atlassian JIRA