[ https://issues.apache.org/jira/browse/PIG-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey updated PIG-3409:
------------------------
Description:
I've run into a serious performance issue.
Please see the VisualVM screenshot.
Here is hashCode implementation from the class:
{code}
@Override
public int hashCode() {
    int hash = 17;
    for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
        Object o = it.next();
        if (o != null) {
            hash = 31 * hash + o.hashCode();
        }
    }
    return hash;
}
{code}
I don't see any reason to iterate over the whole tuple and recompute the
aggregate hash value every time it is requested.
I can fix it if it's possible for me to take part in the dev process. I'm new to it :(
The idea for any join:
If we have a plan, we know for sure which relations will be joined.
That means we can precalculate the hashcode values.
The difference is m+n hashcode calculations instead of m*n (current implementation).
I think it should bring a significant performance boost.
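The counting argument above can be illustrated with a small sketch. This is not Pig code: CachedHashTuple, HashJoinSketch and the fullHashPasses counter are hypothetical names invented for the illustration, and the nested-loop comparison only stands in for "code that asks each tuple for its hash many times". It assumes the fields never change after construction; a mutable tuple would have to invalidate the cached value on every write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical immutable tuple that computes its hash once and then reuses it. */
final class CachedHashTuple {
    // Counts full passes over the fields; present only to make the saving visible.
    static int fullHashPasses = 0;

    private final List<Object> fields;
    private int cachedHash;
    private boolean hashComputed;

    CachedHashTuple(Object... fields) {
        // Defensive copy: the cache is valid only while the fields never change.
        this.fields = new ArrayList<>(Arrays.asList(fields));
    }

    @Override
    public int hashCode() {
        if (!hashComputed) {
            fullHashPasses++;
            // Same formula as the hashCode quoted in this issue.
            int hash = 17;
            for (Object o : fields) {
                if (o != null) {
                    hash = 31 * hash + o.hashCode();
                }
            }
            cachedHash = hash;
            hashComputed = true;
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashTuple
                && fields.equals(((CachedHashTuple) other).fields);
    }
}

/** Hypothetical stand-in for join code that requests hashes for every pair of tuples. */
final class HashJoinSketch {
    static int countMatches(List<CachedHashTuple> left, List<CachedHashTuple> right) {
        int matches = 0;
        for (CachedHashTuple l : left) {
            for (CachedHashTuple r : right) {
                // hashCode() is requested m*n times per side, but only the
                // first request per tuple walks the fields.
                if (l.hashCode() == r.hashCode() && l.equals(r)) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

With m = 3 tuples on one side and n = 4 on the other, the counter ends at m+n = 7 full passes, even though hashCode() is requested for every one of the m*n pairs.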
was:
I've run into a serious performance issue.
Please see the VisualVM screenshot.
Here is hashCode implementation from the class:
{code}
@Override
public int hashCode() {
    int hash = 17;
    for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
        Object o = it.next();
        if (o != null) {
            hash = 31 * hash + o.hashCode();
        }
    }
    return hash;
}
{code}
I don't see any reason to iterate over the whole tuple and recompute the
aggregate hash value every time it is requested.
I can fix it if it's possible for me to take part in the dev process. I'm new to it :(
> org.apache.pig.data.DefaultTuple hashcode performance issue
> -----------------------------------------------------------
>
> Key: PIG-3409
> URL: https://issues.apache.org/jira/browse/PIG-3409
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11
> Reporter: Sergey
> Priority: Critical
> Original Estimate: 3h
> Remaining Estimate: 3h
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira