[
https://issues.apache.org/jira/browse/PIG-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118306#comment-16118306
]
Adam Szita commented on PIG-5283:
---------------------------------
Attached [^PIG-5283.1.patch], which adds the feature of writing out only the necessary
keys of the configuration.
Unfortunately I don't see any way to write the config only once (instead of once per
split), as I need it to be ready at the very first stage of Spark task
execution: the deserialization of the task.
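For reference, a minimal sketch of the idea (the class name and key whitelist below are hypothetical placeholders, not the exact contents of the patch): only a whitelist of configuration entries is serialized in write(), so the same entries can be restored in readFields() while Spark deserializes the task.

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical illustration: serialize only the config keys the backend needs,
// instead of the full (potentially huge) Configuration, for every split.
public class NecessaryConfKeys implements Writable {

    // Example whitelist -- illustrative only; the real set of keys is defined by the patch.
    private static final String[] NECESSARY_KEYS = {
        "pig.inpTargets", "pig.inpSignatures", "pig.pigContext"
    };

    private final Configuration conf;

    public NecessaryConfKeys(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Count and write only the whitelisted entries that are actually set.
        int count = 0;
        for (String key : NECESSARY_KEYS) {
            if (conf.get(key) != null) {
                count++;
            }
        }
        out.writeInt(count);
        for (String key : NECESSARY_KEYS) {
            String value = conf.get(key);
            if (value != null) {
                Text.writeString(out, key);
                Text.writeString(out, value);
            }
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Restore the entries into the (initially empty) Configuration that
        // Spark hands to the split during task deserialization.
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            conf.set(Text.readString(in), Text.readString(in));
        }
    }
}
{code}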
[~kellyzly] yes, the PigInputFormatSpark#createRecordReader part comes much
later in the execution and will work just like before. In a way it is
irrelevant to the current issue: it does set the full configuration on
each split, but that happens too late for us, since we need the configuration
already at task deserialization time.
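For completeness, a rough sketch of that later step, assuming PigSplit exposes the configuration via Hadoop's Configurable interface as in the MR backend (the helper class below is illustrative only): at createRecordReader time the TaskAttemptContext already carries the full job configuration, so it can simply be re-applied to the split.

{code:java}
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;

// Hypothetical helper illustrating the later createRecordReader step: the full
// job configuration is available on the TaskAttemptContext and can be set on
// the split -- but only after task deserialization has already happened.
final class SplitConfHelper {
    static void restoreFullConf(PigSplit pigSplit, TaskAttemptContext context) {
        pigSplit.setConf(context.getConfiguration());
    }
}
{code}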
> Configuration is not passed to SparkPigSplits on the backend
> ------------------------------------------------------------
>
> Key: PIG-5283
> URL: https://issues.apache.org/jira/browse/PIG-5283
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Adam Szita
> Assignee: Adam Szita
> Attachments: PIG-5283.0.patch, PIG-5283.1.patch
>
>
> When a Hadoop ObjectWritable is created during a Spark job, the instantiated
> PigSplit (wrapped into a SparkPigSplit) is given an empty Configuration
> instance.
> This happens
> [here|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SerializableWritable.scala#L44]
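For illustration, the deserialization path in Spark's SerializableWritable does roughly the equivalent of the following (a simplified Java rendering of the linked Scala code), which is where the empty Configuration comes from:

{code:java}
import java.io.IOException;
import java.io.ObjectInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.Writable;

// Simplified Java rendering of what Spark's SerializableWritable#readObject does:
// the wrapped Writable (here: the PigSplit inside a SparkPigSplit) is read back
// through an ObjectWritable that only carries an empty Configuration.
final class SerializableWritableSketch {
    Writable t;

    private void readObject(ObjectInputStream in) throws IOException {
        ObjectWritable ow = new ObjectWritable();
        ow.setConf(new Configuration(false)); // empty Configuration, no defaults loaded
        ow.readFields(in);                    // PigSplit#readFields sees only this empty conf
        t = (Writable) ow.get();
    }
}
{code}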