[ https://issues.apache.org/jira/browse/PIG-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1932:
----------------------------

    Attachment: PIG-1932.patch

Unit tests pass.  Results of test-patch:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 release audit.  The applied patch generated 545 release audit warnings (more than the trunk's current 544 warnings).
     [exec]

The new release audit warning is because I added a file.

> GFCross should allow the user to set the DEFAULT_PARALLELISM value
> ------------------------------------------------------------------
>
>                 Key: PIG-1932
>                 URL: https://issues.apache.org/jira/browse/PIG-1932
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Alan Gates
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1932.patch
>
>
> The internal UDF GFCross uses a final static int DEFAULT_PARALLELISM to 
> determine how wide to spread the records in a cross.  It is currently 
> hardwired to 96.  There are no comments in the code on how that value was 
> settled on.  Despite the name, this value is not necessarily related to the 
> reduce parallelism controlled by the parallel clause.  It controls how many 
> artificial join key values are generated and how many times each record is 
> duplicated before going through the join.  The higher it is set, the more 
> key values are generated (and thus the less likely the cross is to run out 
> of memory), but also the more times each record is duplicated in the map 
> phase before being sent to the reduce phase.  
> We should leave the default value at 96 but allow a property to override 
> this default and change the value.
> We cannot use a constructor argument here because the use of the UDF is not 
> exposed to the user, so the user has no opportunity to pass a constructor 
> argument to it.
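
For illustration only, here is a minimal sketch of the kind of override the
description asks for: keep 96 as the default, but let a property replace it.
The class name CrossParallelism and the property name
"example.cross.parallelism" are hypothetical and are not taken from the
patch. Falling back to the default for missing, non-numeric, or non-positive
values is also an assumption made for this sketch.

```java
import java.util.Properties;

// Hypothetical helper, not the code from PIG-1932.patch.
final class CrossParallelism {
    // Default value named in the issue description.
    static final int DEFAULT_PARALLELISM = 96;

    // Hypothetical property name, chosen only for this sketch.
    static final String PARALLELISM_KEY = "example.cross.parallelism";

    // Returns the configured value, or the default when the property is
    // missing, not an integer, or not positive.
    static int resolve(Properties props) {
        String raw = props.getProperty(PARALLELISM_KEY);
        if (raw == null) {
            return DEFAULT_PARALLELISM;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_PARALLELISM;
        } catch (NumberFormatException e) {
            return DEFAULT_PARALLELISM;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(props)); // property unset, default applies

        props.setProperty(PARALLELISM_KEY, "200");
        System.out.println(resolve(props)); // property set, override applies
    }
}
```

Reading the value from a property rather than a constructor argument matches
the constraint in the description: the user never writes the GFCross call, so
a job-level setting is the only place they could supply a value.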

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
