[
https://issues.apache.org/jira/browse/APEXMALHAR-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Yan resolved APEXMALHAR-2197.
-----------------------------------
Resolution: Fixed
Fix Version/s: 3.5.0
> TimeBasedPriorityQueue.removeLRU throws NoSuchElementException
> --------------------------------------------------------------
>
> Key: APEXMALHAR-2197
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2197
> Project: Apache Apex Malhar
> Issue Type: Bug
> Reporter: bright chen
> Assignee: bright chen
> Fix For: 3.5.0
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> When there are lots of keys and large throughput, the following exception
> will be throw.
> java.util.NoSuchElementException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1113)
> at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
> at
> org.apache.apex.malhar.lib.state.spillable.TimeBasedPriorityQueue.removeLRU(TimeBasedPriorityQueue.java:75)
> at
> org.apache.apex.malhar.lib.state.spillable.WindowBoundedMapCache.endWindow(WindowBoundedMapCache.java:119)
> at
> org.apache.apex.malhar.lib.state.spillable.SpillableByteArrayListMultimapImpl.endWindow(SpillableByteArrayListMultimapImpl.java:281)
> at
> com.datatorrent.benchmark.spillable.SpillableDSBenchmarkTest.testSpillableMutimap(SpillableDSBenchmarkTest.java:139)
> The reason is as following:
> In TimeBasedPriorityQueue.TimeWrapper<T>, when implement compareTo(), only
> compare by time. While in equals(), compare both time and key.
> Generally, "compareTo() == 0” means “equals() == true”. But in
> TimeBasedPriorityQueue, compareTo() was used by “sortedTimestamp” and it
> would be OK only compare by time.
> But it would cause problem in removeLRU() if lots of key written in same
> time. In this case, one time(even different key) is one entry of
> sortedTimestamp, but the counter and count was calculated by key.
> So the counter could be much larger than the number of entry of
> sortedTimestamp. And NoSuchElementException would throw when reach the end of
> the iterator.
> Solutions:
> Add key as the less important factor could solve this problem. But probably
> need to limit key to support compareTo()
> Another idea I think also work is use two maps, the benefit is don’t need to
> compare by key( only compare by time)
> - timeToKeys: multiple list map from time to keys
> - keyToRecentTime: key to most recent time
> implementation:
> - upSert(T key): get time from keyToRecentTime map; if don’t have time, put
> (key, time) to both keyToRecentTime and timeToKeys; if have time and time is
> different, remove key from timeToKeys, update keyToRecentTime
> - removeLRU(int count): get time sortedTimestamp, and then get and remove
> keys from timeToKeys, and remove entry from keyToRecentTime.
> Another quick solution is also add key as the less important factor of
> compareTo(), but instead of add a limitation to the key( key should
> comparable), use equals() to compare the key( we care about equality of key,
> but in fact don't care the order of the key).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)