[
https://issues.apache.org/jira/browse/PIG-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432343#comment-15432343
]
liyunzhang_intel commented on PIG-4970:
---------------------------------------
[~kexianda]:
Yes, the previous code set "jobConf.set("pig.cachedbag.type","default")" in
PackageConverter.java for the join/group case. But when I searched the MR and
Tez code, neither sets this item explicitly, and no unit test fails even when I
remove it from the job configuration. So is there any reason we need to set
this item? If we still need it, we should add it to the job configuration in
both PackageConverter and JoinGroupSparkConverter, because in the previous way
this item was set on only one spark operator, whereas in the current way it is
set for the whole spark plan.
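To illustrate the point above, here is a minimal sketch of setting that item on the job configuration. It uses java.util.Properties in place of the Hadoop JobConf (which is not assumed here), and the class and method names (CachedBagConfSketch, configureCachedBag) are hypothetical, not names from the Pig codebase:

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for org.apache.hadoop.mapred.JobConf.
// In PackageConverter/JoinGroupSparkConverter the equivalent call would be
// jobConf.set("pig.cachedbag.type", "default").
public class CachedBagConfSketch {
    static void configureCachedBag(Properties jobConf) {
        // Only apply the default when the user has not overridden it,
        // so the same setting is seen consistently by the whole plan.
        if (jobConf.getProperty("pig.cachedbag.type") == null) {
            jobConf.setProperty("pig.cachedbag.type", "default");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        configureCachedBag(conf);
        System.out.println(conf.getProperty("pig.cachedbag.type"));
    }
}
```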
> Remove the deserialize and serialization of JobConf in code for spark mode
> --------------------------------------------------------------------------
>
> Key: PIG-4970
> URL: https://issues.apache.org/jira/browse/PIG-4970
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4970.patch
>
>
> Now we use KryoSerializer to serialize the jobConf in
> [SparkLauncher|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L191].
> Then we deserialize it in
> [ForEachConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java#L83] and
> [StreamConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java#L70].
> We serialize and deserialize the jobConf in order to make it
> available in the spark executor thread.
> We can refactor this in the following ways:
> 1. Let spark broadcast the jobConf in
> [sparkContext.newAPIHadoopRDD|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java#L102].
> Here we should not create a new jobConf and load properties from PigContext,
> but directly use the jobConf from SparkLauncher.
> 2. Get the jobConf in
> [org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java#L42]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)