[ https://issues.apache.org/jira/browse/PIG-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012579#comment-13012579 ]
Daniel Dai commented on PIG-1932:
---------------------------------
+1
> GFCross should allow the user to set the DEFAULT_PARALLELISM value
> ------------------------------------------------------------------
>
> Key: PIG-1932
> URL: https://issues.apache.org/jira/browse/PIG-1932
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-1932.patch, PIG-1932_2.patch
>
>
> The internal UDF GFCross uses a final static int DEFAULT_PARALLELISM to
> determine how wide to spread the records in a cross. It is currently
> hard-wired to 96. There are no comments in the code on how that value was settled
> on. Despite the name, this value is not necessarily related to the reduce
> parallelism controlled by the parallel clause. It controls how many
> artificial join key values are generated and how many times each record is
> duplicated before going through the join. The higher it is set, the more key
> values are generated (and thus the less likely the cross is to run out of
> memory), but also the more times each record is duplicated in the map phase
> before being sent to the reduce.
> We should leave the default value at 96 but allow a property to override this
> default and change the value.
> We cannot use a constructor argument here because the use of the UDF is not
> exposed to the user, so they have no opportunity to pass a constructor
> argument to it.
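
To make the trade-off in the description concrete, here is a minimal Java sketch. It is not the GFCross source and not the attached patch. It assumes a two-input cross whose artificial keys form a square grid with side ceil(sqrt(parallelism)); that layout is an assumption made for illustration. The class name CrossKeySketch, the property name "example.cross.parallelism", and the use of java.util.Properties in place of the job configuration are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Random;

// Illustrative sketch only; names and grid layout are assumptions.
final class CrossKeySketch {
    // Default taken from the issue description.
    static final int DEFAULT_PARALLELISM = 96;

    // Hypothetical property name, not necessarily the one used by Pig.
    static final String PARALLELISM_PROPERTY = "example.cross.parallelism";

    // Return the override if it is a positive integer, else the default.
    static int resolveParallelism(Properties props) {
        String raw = props.getProperty(PARALLELISM_PROPERTY);
        if (raw == null) {
            return DEFAULT_PARALLELISM;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_PARALLELISM;
        } catch (NumberFormatException e) {
            return DEFAULT_PARALLELISM;
        }
    }

    // Side of the assumed square key grid for a two-input cross.
    static int gridSide(int parallelism) {
        return (int) Math.ceil(Math.sqrt(parallelism));
    }

    // Artificial join keys for one record.
    // Input 0 picks a random row and is copied to every column of that row;
    // input 1 picks a random column and is copied to every row of that column.
    // Any row and any column share exactly one cell, so every pair of records
    // from the two inputs meets under exactly one key.
    static List<int[]> keysFor(int inputIndex, int side, Random rng) {
        int fixed = rng.nextInt(side);
        List<int[]> keys = new ArrayList<>(side);
        for (int i = 0; i < side; i++) {
            keys.add(inputIndex == 0 ? new int[] {fixed, i} : new int[] {i, fixed});
        }
        return keys;
    }
}
```

Under these assumptions the default of 96 gives a grid side of 10, so each record is emitted 10 times in the map phase and the join sees up to 100 distinct keys. Raising the value to 400 would double the duplication to 20 copies per record while quadrupling the number of keys.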