[
https://issues.apache.org/jira/browse/PIG-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-1932:
----------------------------
Attachment: PIG-1932.patch
Unit tests pass. Results of test-patch:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 545 release audit warnings (more than the trunk's current 544 warnings).
[exec]
The new release audit warning appears because I added a file.
> GFCross should allow the user to set the DEFAULT_PARALLELISM value
> ------------------------------------------------------------------
>
> Key: PIG-1932
> URL: https://issues.apache.org/jira/browse/PIG-1932
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Alan Gates
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-1932.patch
>
>
> The internal UDF GFCross uses a static final int, DEFAULT_PARALLELISM, to
> determine how wide to spread the records in a cross. It is currently
> hardwired to 96. There are no comments in the code explaining how that value
> was chosen. Despite the name, this value is not necessarily related to the
> reduce parallelism controlled by the parallel clause. It controls how many
> artificial join key values are generated and how many times each record is
> duplicated before going through the join. The higher it is set, the more key
> values there are (and thus the less likely the cross is to run out of
> memory), but also the more times each record is duplicated in the map phase
> before being sent to the reduce.
> We should leave the default value at 96 but allow a property to override
> this default.
> We cannot use a constructor argument here because the use of the UDF is not
> exposed to the user, so they have no opportunity to pass a constructor
> argument to it.
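
A minimal sketch of the kind of override described above, not the contents of PIG-1932.patch. The class name, the property name "example.cross.parallelism", and the fallback behavior for invalid values are all hypothetical; only the default of 96 comes from the issue text. It shows a lookup that keeps the default unless a property supplies a positive integer.

```java
import java.util.Properties;

public class CrossParallelismSketch {
    // Default taken from the issue description.
    static final int DEFAULT_PARALLELISM = 96;

    // Hypothetical property name, used only for illustration.
    static final String PARALLELISM_PROPERTY = "example.cross.parallelism";

    // Returns the configured parallelism, or the default when the property
    // is absent, not a number, or not positive (an assumed policy).
    static int resolveParallelism(Properties props) {
        String raw = props.getProperty(PARALLELISM_PROPERTY);
        if (raw == null) {
            return DEFAULT_PARALLELISM;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_PARALLELISM;
        } catch (NumberFormatException e) {
            return DEFAULT_PARALLELISM;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No property set: the default applies.
        System.out.println(resolveParallelism(props));

        // Property set: the override applies.
        props.setProperty(PARALLELISM_PROPERTY, "200");
        System.out.println(resolveParallelism(props));
    }
}
```

Reading the value from configuration rather than from a constructor argument matches the constraint in the description: the user never instantiates the UDF directly, so a property is the only place an override can be supplied.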