[
https://issues.apache.org/jira/browse/CALCITE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068252#comment-17068252
]
neoremind edited comment on CALCITE-3878 at 3/27/20, 5:13 AM:
--------------------------------------------------------------
[~julianhyde] Thanks for looking into the issue. I agree Java intrinsic is not
related in this case. Java intrinsic mainly works by JIT to make code run
faster, like some native JNI interfaces runs faster by inline technique.
I agree extra bytes waste is not a big deal, but I think resizing is a
performance killer. For example,
{code:java}
List<String> list = new ArrayList<>();
for (String s: strs) {
list.add(s);
}{code}
If strs size is less than 10, then it does no harm. If strs size is more than
10, or even worse, like 256, then the _list_ will grow and resize (involves new
array instance creating and copying from old) many times. I test under Java
1.8.0_161-b12, the resizing works as below:
_^int newCapacity = oldCapacity + (oldCapacity >> 1);^_
the size will grow from 10 (initial) to 15,22,33,49,73,109,163,244,366, and
there are 9 times array copying.
If we already know the size of strs, we can eliminate resizing and copying,
good for performance and GC.
It is good to use immutable data structure in guava, but the original list
creation process can be more efficient.
Moreover, in most cases, the list contains object references which are small.
If taking CPU caching like cache line into consideration, the array copying is
also not very friendly. Different capacity of lists are ok, as long as they
reside in memory, and move to CPU in the same manner, no matter how big they
are.
Overall, the performance improvement regarding to this update could be very
small, but I think it is a good way, at least better for performance, even it
is small.
was (Author: neoremind):
[~julianhyde] Thanks for looking into the issue. I agree Java intrinsic is not
related in this case. Java intrinsic mainly works by JIT to make code run
faster, like some native JNI interfaces runs faster by inline technique.
I agree extra bytes waste is not a big deal, but I think resizing is a
performance killer. For example,
{code:java}
List<String> list = new ArrayList<>();
for (String s: strs) {
list.add(s);
}{code}
If strs size is less than 10, then it does no harm. If strs size is more than
10, or even worse, like 256, then the _list_ will grow and resize (involves new
array instance creating and copying from old) many times. I test under Java
1.8.0_161-b12, the resized capacity is
_^int newCapacity = oldCapacity + (oldCapacity >> 1);^_
the size will grow from 10 (intial) to 15,22,33,49,73,109,163,244,366, and
there are 9 times array copying.
If we already know the size of strs, we can eliminate resizing and copying,
good for performance and GC.
It is good to use immutable data structure in guava, but the original list
creation process can be more efficient.
Moreover, in most cases, the list contains object references which are small.
If taking CPU caching like cache line into consideration, the array copying is
also not very friendly. Different capacity of lists are ok, as long as they
reside in memory, and move to CPU in the same manner, no matter how big they
are.
Overall, the performance improvement regarding to this update could be very
small, but I think it is a good way, at least better for performance, even it
is small.
> Make ArrayList creation with initial capacity when size is fixed
> ----------------------------------------------------------------
>
> Key: CALCITE-3878
> URL: https://issues.apache.org/jira/browse/CALCITE-3878
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.22.0
> Reporter: neoremind
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> I find many places in Calcite where _new ArrayList<>()_ is used, if the list
> is expected to be immutable or not resizing, it is always a good manner to
> create with initial capacity, better for memory usage and performance.
> I search all occurrences, focus on the core module, to make it safe, I only
> update local variables with fixed size and not working in recursive method.
> If the local variable reference goes out of scope, if resizing is needed,
> things will work normally as well, so no side effect, but for the "escaping"
> case, I am very conservative and do not change them.
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)