[ 
https://issues.apache.org/jira/browse/PIG-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734130#comment-13734130
 ] 

Suhas Satish commented on PIG-3409:
-----------------------------------

What's your suggested code fix for precomputing the hash?
                
> org.apache.pig.data.DefaultTuple hashcode perfomance issue
> ----------------------------------------------------------
>
>                 Key: PIG-3409
>                 URL: https://issues.apache.org/jira/browse/PIG-3409
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11
>            Reporter: Sergey
>            Priority: Critical
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> I've run into a serious performance issue.
> Please see the VisualVM screenshot.
> Here is hashCode implementation from the class:
> {code}
>     @Override
>     public int hashCode() {
>         int hash = 17;
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object o = it.next();
>             if (o != null) {
>                 hash = 31 * hash + o.hashCode();
>             }
>         }
>         return hash;
>     }
> {code}
> I don't see any reason here to iterate over the whole tuple, aggregate the
> hash value, and then return it.
> I can fix it, if it's possible for me to take part in the dev process. I'm new to it :(
> The idea for any join:
> If we have a plan, we know for sure which relations will be joined.
> That means we can precalculate the hashcode values.
> The difference is m+n hashcode calculations versus m*n (current implementation).
> I think it should bring a significant performance boost.
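
For illustration only, one way to read the "precalculate hashcode" idea above is to cache the hash inside the tuple and recompute it only after a change. The sketch below is not Pig code: CachedHashTuple, its set() method, and the hashComputations counter are hypothetical names, and it assumes every modification goes through set(). A tuple whose field objects can be mutated from outside would need extra invalidation logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple; this is NOT org.apache.pig.data.DefaultTuple.
final class CachedHashTuple {
    private final List<Object> fields;
    private int cachedHash;
    private boolean hashValid;

    // Counts full passes over the fields; present only to make the effect visible.
    static int hashComputations = 0;

    CachedHashTuple(Object... values) {
        this.fields = new ArrayList<>(Arrays.asList(values));
    }

    // Assumption: all modifications go through this method.
    void set(int index, Object value) {
        fields.set(index, value);
        hashValid = false; // any change invalidates the cached hash
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            // Same formula as the hashCode() quoted in the issue description.
            int hash = 17;
            for (Object o : fields) {
                if (o != null) {
                    hash = 31 * hash + o.hashCode();
                }
            }
            cachedHash = hash;
            hashValid = true;
            hashComputations++;
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashTuple
                && fields.equals(((CachedHashTuple) other).fields);
    }
}
```

With this sketch, repeated hashCode() calls on an unchanged tuple walk the fields once, so hashing m tuples from one relation and n from the other costs m+n passes however often each tuple is looked up. Whether this matches the hot path in the profile would still need to be checked against the screenshot.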

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
