Adam Szita commented on PIG-5283:

Attached [^PIG-5283.1.patch] with the feature of only writing out the necessary
keys of the configuration.
Unfortunately I don't see any way to write the config only once (instead of per
split), because I need it to be ready at the very first stage of Spark task
execution: the deserialization of the task.
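The idea of writing out only the necessary keys can be sketched as follows. This is a hypothetical, self-contained illustration, not the patch itself: a plain Map stands in for Hadoop's Configuration, and the key prefixes are assumptions for the sake of the example — the actual key set kept by the patch may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfTrimSketch {
    // Assumed prefixes of keys needed at task deserialization time
    // (illustrative only; the real patch may select keys differently).
    static final String[] REQUIRED_PREFIXES = {"pig.", "mapreduce.input."};

    // Copy only the required entries into a minimal map, so that the
    // per-split serialized payload stays small.
    static Map<String, String> trim(Map<String, String> full) {
        Map<String, String> minimal = new HashMap<>();
        for (Map.Entry<String, String> e : full.entrySet()) {
            for (String p : REQUIRED_PREFIXES) {
                if (e.getKey().startsWith(p)) {
                    minimal.put(e.getKey(), e.getValue());
                    break;
                }
            }
        }
        return minimal;
    }

    public static void main(String[] args) {
        Map<String, String> full = new HashMap<>();
        full.put("pig.inputs", "serialized-input-spec");
        full.put("mapreduce.input.fileinputformat.inputdir", "/data");
        full.put("yarn.app.id", "application_1"); // not needed -> dropped
        System.out.println(trim(full).size()); // prints 2
    }
}
```

The trade-off is the same as in the comment above: the trimmed config still has to be written per split, because it must be available before the split itself is deserialized.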

[~kellyzly] yes, the PigInputFormatSpark#createRecordReader part comes much
later in the execution and will work just like before. It is largely unrelated
to the current issue: it does set the full configuration on each split, but
that happens too late for us, since we need the configuration already at task
deserialization time.

> Configuration is not passed to SparkPigSplits on the backend
> ------------------------------------------------------------
>                 Key: PIG-5283
>                 URL: https://issues.apache.org/jira/browse/PIG-5283
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>         Attachments: PIG-5283.0.patch, PIG-5283.1.patch
> When a Hadoop ObjectWritable is created during a Spark job, the instantiated 
> PigSplit (wrapped into a SparkPigSplit) is given an empty Configuration 
> instance.
> This happens 
> [here|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SerializableWritable.scala#L44]
