[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011428#comment-16011428 ]
ASF GitHub Bot commented on FLINK-6589:
---------------------------------------

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/3912

    [FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to prevent growth.

    Several Table API / SQL operators hold records in a `MapState[Long, List[X]]` keyed on a timestamp. When a new record arrives, the corresponding list is fetched and the record is added to the list.
    Currently, the `ListSerializer` deserializes lists as `ArrayList` with a capacity exactly equal to the number of serialized elements. Hence, the `ArrayList` has to grow when a new element is added, which is an expensive operation.

    This PR changes the capacity of the deserialized `ArrayList` to #elements + 1 to avoid growing the list when a single element is added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink listSer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3912.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3912

----
commit 1fa86b4e282ab0594d8dc768840de4c798e29591
Author: Fabian Hueske <fhue...@apache.org>
Date:   2017-05-15T19:41:51Z

    [FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to prevent growth.

----


> ListSerializer should deserialize as ArrayList with size + 1
> ------------------------------------------------------------
>
>                 Key: FLINK-6589
>                 URL: https://issues.apache.org/jira/browse/FLINK-6589
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState<Long, List<X>>}} to
> store received elements in a list per timestamp. Hence, retrieving the list
> and adding one element to the list is a very common operation.
> Since the deserialized list has no room left for adding elements,
> the first insertion into the list will result in growing the {{ArrayList}},
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}}
> with a capacity of numberOfSerializedElements + 1. This will only marginally
> increase the size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
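
For illustration, the capacity change proposed in the issue might look roughly like the sketch below. It is not the actual Flink patch: `ListCapacitySketch`, its `serialize`/`deserialize` methods, and the simple count-then-elements wire format are hypothetical stand-ins for `ListSerializer`, limited to `Integer` elements so the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not Flink's ListSerializer.
class ListCapacitySketch {

    // Writes the element count followed by the elements (assumed wire format).
    static byte[] serialize(List<Integer> list) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(list.size());
            for (int value : list) {
                out.writeInt(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the list back, reserving one spare slot beyond the serialized size.
    static List<Integer> deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            // Capacity of size + 1: one later add() fits into the backing
            // array without a resize-and-copy.
            List<Integer> list = new ArrayList<>(size + 1);
            for (int i = 0; i < size; i++) {
                list.add(in.readInt());
            }
            return list;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> restored = deserialize(serialize(Arrays.asList(1, 2, 3)));
        restored.add(4); // uses the spare slot instead of growing the array
        System.out.println(restored);
    }
}
```

The spare slot covers only the first insertion; a second `add()` on the same deserialized list would still grow the backing array as usual.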