Re: [classlib][luni][performance] Improvements in Collections

Nathan Beyer Fri, 18 Apr 2008 12:15:24 -0700

If all you're talking about is the capacity calculation then, yes. If you're
talking about the entire patch, then no.


I think the capacity calculation should be removed from the patch. It's NOT
the performance boost, correct?

On Fri, Apr 18, 2008 at 2:10 PM, Aleksey Shipilev <
[EMAIL PROTECTED]> wrote:

> Ok, Nathan, let's stop for a while.
> Am I right that the problem is different-style implementation of
> basically the same thing in all three HashMaps? Am I right that we
> haven't any disagreements on the performance impact of such the
> optimization?
>
> I can elaborate on these issues if there are still misunderstandings :)
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 11:06 PM, Nathan Beyer <[EMAIL 
> PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>>
> wrote:
> > On Fri, Apr 18, 2008 at 1:53 PM, Sergey Salishev <
> >
> > [EMAIL PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL 
> > PROTECTED]>>
> wrote:
> >
> >
> > > Nathan,
> >  >
> >  > The "load factor" does't impact the capacity growth scale. It only
> governs
> >  > the persentage of the free space reserved inside the current
> capacity. So
> >  > if
> >  > the initial capacity is X the capacity growth scale would be 2X 4X
> 8X...
> >  > with all load factor values.
> >
> >
> >  Correct, it determines when the capacity should increase.
> >
> >
> >
> >  >
> >  >
> >  > The changes proposed by Alexey for WeakHashMap are already applied to
> >  > HashMap almost a year ago.
> >
> >
> >  That is not what I understand. The majority of the patch has nothing to
> do
> >  with the capacity calculation. Additionally, it's not the same; the
> method
> >  names are different, the algorithm is slightly different, etc. If this
> >  capacity calculation is good for all three HashMaps, then put the exact
> same
> >  code in all three HashMaps. I'll vote for consistency and clarity
> >  (java.util.HashMap's 'calculateCapacity' method is much clearer).
> >
> >  -Nathan
> >
> >
> >  >
> >  >
> >  > Thanks.
> >  > Sergey.
> >
> > >
> >  > On Fri, Apr 18, 2008 at 10:41 PM, Nathan Beyer <[EMAIL 
> > PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> <https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>>
> >  > wrote:
> >  >
> >  > > On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
> >
> > > > [EMAIL PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL 
> > > > PROTECTED]>
> <https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]
> >>
> >  > wrote:
> >  > >
> >  > > > Nathan
> >  > > >
> >
> >
> > > > > I agree with  you. This can be a problem. In my practice the
> biggest
> >  > > > problems were with 1-2 sized hash maps taking 16 size arrays.
> Probably
> >  > > the
> >  > > > best place to implement % optimization would bi CPU:)
> >  > > > We can try the different approach. The HashMap growth is governed
> only
> >  > > by
> >  > > > initial capacity so in the HashMap constructor one can say is 2^k
> or
> >  > > not.
> >  > >
> >  > >
> >  > > What about 'load factor'? That governs the growth as well.
> >  > >
> >  > > -Nathan
> >  > >
> >  > >
> >  > > >
> >  > > >
> >  > > > We can write something like
> >  > > >
> >  > > > <code>
> >  > > > final boolean powOf2 = isPowOf2(capacity);
> >  > > >
> >  > > > final int mod(int index, int length) {
> >  > > >     return powOf2? index & (length - 1) : index % length;
> >  > > > }
> >  > > > <code>
> >  > > >
> >  > > > In theory the common path for the workload should be specialized
> by
> >  > JIT.
> >  > > >
> >  > > > Ooops. When looking into HashMap code I've noticed someone
> already did
> >  > > > this.
> >  > > > https://issues.apache.org/jira/browse/HARMONY-4064
> >  > > > HashMap effectively does the rounding to the neares 2^k.
> >  > > >
> >  > > > Thanks.
> >  > > > Sergey.
> >  > > > On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <[EMAIL 
> > PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> <https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> >  > <
> >  > > https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>>
> >  > > > wrote:
> >  > > >
> >  > > > > I know that people will notice; I have personal experience with
> >  > > systems
> >  > > > > that
> >  > > > > included custom Map implementations were written because
> HashMaps
> >  > grew
> >  > > > too
> >  > > > > large for small data sets (less than 2000, actually) and wasted
> a
> >  > lot
> >  > > of
> >  > > > > memory for the unnecessary capacity. Even the use of the
> capacity
> >  > and
> >  > > > load
> >  > > > > factor didn't provide enough compensation in these cases.
> >  > > > >
> >  > > > > -Nathan
> >  > > > >
> >  > > > > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
> >  > > > > [EMAIL PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL 
> > PROTECTED]>
> <https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> >  > <
> >  > >
> https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]
> >  > >>
> >  > >  > wrote:
> >  > > > >
> >  > > > > > Nathan,
> >  > > > > >
> >  > > > > > I don't think anyone will notice the hash map size rounding.
> It
> >  > can
> >  > > > lead
> >  > > > > > to
> >  > > > > > some memory overhead in very rare cases the user creates hash
> map
> >  > > with
> >  > > > > the
> >  > > > > > exact size. But in the most common case where the map is
> created
> >  > > with
> >  > > > > > default size the rounding will not change the behavior at all
> as
> >  > > it's
> >  > > > in
> >  > > > > > agreement with the standard 2x growth policy. On the other
> hand
> >  > size
> >  > > > > > rounding gives substantial performance boost on all gets.
> >  > > > > >
> >  > > > > > Thanks.
> >  > > > > > Sergey.
> >  > > > > >
> >  > > > > >
> >  > > > > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <
> [EMAIL PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> <https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> >  > <
> >  > > https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> >  > > > <
> >  > > > > https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]
> >>
> >  > > > > > wrote:
> >  > > > > >
> >  > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> >  > > > > > >
> >  > > > > > > Again, I don't agree with the capacity rounding in the
> patch
> >  > > > attached
> >  > > > > to
> >  > > > > > > this issue. I do like the change to the internal data
> structure;
> >  > > use
> >  > > > > two
> >  > > > > > > arrays for key/value instead a single array. It makes the
> code
> >  > > > easier
> >  > > > > to
> >  > > > > > > read.
> >  > > > > > >
> >  > > > > > > -Nathan
> >  > > > > > >
> >  > > > > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> >  > > > > > > [EMAIL 
> > PROTECTED]<https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> <https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> >  > <
> >  > >
> https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]>
> >  > > > <
> >  > > > >
> >  > >
> https://mail.google.com/mail?view=cm&tf=0&[EMAIL PROTECTED]
> >  > > > >>
> >  > > > >  > wrote:
> >  > > > > > >
> >  > > > > > >  > Colleagues,
> >  > > > > > > >
> >  > > > > > > > I had recently filed two JIRAs with improvements in
> >  > Collections,
> >  > > > > > > > giving up to +30-40% to serialization benchmarks.
> Presumably
> >  > > they
> >  > > > > will
> >  > > > > > > > boost the performance across the all users since the
> >  > > optimization
> >  > > > is
> >  > > > > > > > pretty general:
> >  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5761
> >  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> >  > > > > > > >
> >  > > > > > > > Would some classlib guru (Tim, Nathan, Tony?) review and
> >  > commit
> >  > > > > them?
> >  > > > > > > >
> >  > > > > > > > Thanks,
> >  > > > > > > > Aleksey.
> >  > > > > > > >
> >  > > > > > >
> >  > > > > >
> >  > > > >
> >  > > >
> >  > >
> >  >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Reply via email to