[ 
https://issues.apache.org/jira/browse/CALCITE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068252#comment-17068252
 ] 

neoremind edited comment on CALCITE-3878 at 3/27/20, 4:24 AM:
--------------------------------------------------------------

[~julianhyde] Thanks for looking into the issue. I agree Java intrinsic is not 
related in this case. Java intrinsic mainly works by JIT to make code run 
faster, like some native JNI interfaces runs faster by inline technique. 

I agree extra bytes waste is not a big deal, but I think resizing is a 
performance killer. For example,
{code:java}
List<String> list = new ArrayList<>();
for (String s: strs) {
  list.add(s);
}{code}
If strs size is less than 10, then it does no harm. If strs size is more than 
10, or even worse, like 256, then the _list_ will grow and resize (involves new 
array instance creating and copying from old) many times. I test under Java 
1.8.0_161-b12, the resized capacity is

_^int newCapacity = oldCapacity + (oldCapacity >> 1);^_

the size will grow like that 15,22,33,49,73,109,163,244 to 366, and there are 9 
times array copying.

If we already know the size of strs, we can eliminate resizing and copying, 
good for performance and GC.

It is good to use immutable data structure in guava, but the original list 
creation process can be more efficient.

Moreover, in most cases, the list contains object references which are small. 
If taking CPU caching like cache line into consideration, the array copying is 
also not very friendly. Different capacity of lists is ok, as long as they 
reside in memory, and move to CPU in the same manner, no matter how big they 
are.

Overall, the performance improvement regarding to this update could be very 
small, but I think it is a good way, at least better for performance, even it 
is small.

 

 


was (Author: neoremind):
[~julianhyde] Thanks for looking into the issue. I agree Java intrinsic is not 
related in this case. Java intrinsic mainly works by JIT to make code run 
faster, like some native JNI interfaces runs faster by inline technique. 

I agree extra bytes waste is not a big deal, but I think resizing is a 
performance killer. For example,
{code:java}
List<String> list = new ArrayList<>();
for (String s: strs) {
  list.add(s);
}{code}
If strs size is less than 10, then it does no harm. If strs size is more than 
10, or even worse, like 256, then the _list_ will grow and resize (involves new 
array instance creating and copying from old) many times. I test under Java 
1.8.0_161-b12, the resized capacity is

_^int newCapacity = oldCapacity + (oldCapacity >> 1);^_

the size will grow like that 15,22,33,49,73,109,163,244 to 366, and there are 9 
times array copying.

If we already know the size of strs, we can eliminate resizing and copying, 
good for performance and GC.

It is good to use immutable data structure in guava, but the original list 
creation process can be more efficient.

Moreover, in most cases, the list contains object references which are small. 
If taking CPU caching like cache line into consideration, the array copying is 
also not very friendly.

Overall, the performance improvement regarding to this update could be very 
small, but I think it is a good way, at least better for performance, even it 
is small.

 

 

> Make ArrayList creation with initial capacity when size is fixed
> ----------------------------------------------------------------
>
>                 Key: CALCITE-3878
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3878
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: neoremind
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> I find many places in Calcite where _new ArrayList<>()_ is used, if the list 
> is expected to be immutable or not resizing, it is always a good manner to 
> create with initial capacity, better for memory usage and performance.
> I search all occurrences, focus on the core module, to make it safe, I only 
> update local variables with fixed size and not working in recursive method. 
> If the local variable reference goes out of scope, if resizing is needed, 
> things will work normally as well, so no side effect, but for the "escaping" 
> case, I am very conservative and do not change them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to