[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bright chen updated APEXMALHAR-2197:
------------------------------------
    Description: 
When there are lots of keys and large throughput, the following exception will 
be throw.
java.util.NoSuchElementException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1113)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
        at 
org.apache.apex.malhar.lib.state.spillable.TimeBasedPriorityQueue.removeLRU(TimeBasedPriorityQueue.java:75)
        at 
org.apache.apex.malhar.lib.state.spillable.WindowBoundedMapCache.endWindow(WindowBoundedMapCache.java:119)
        at 
org.apache.apex.malhar.lib.state.spillable.SpillableByteArrayListMultimapImpl.endWindow(SpillableByteArrayListMultimapImpl.java:281)
        at 
com.datatorrent.benchmark.spillable.SpillableDSBenchmarkTest.testSpillableMutimap(SpillableDSBenchmarkTest.java:139)


The reason is as following:
In TimeBasedPriorityQueue.TimeWrapper<T>, when implement compareTo(), only 
compare by time. While in equals(), compare both time and key.

Generally, "compareTo() == 0” means “equals() == true”. But in 
TimeBasedPriorityQueue, compareTo() was used by “sortedTimestamp” and it would 
be OK only compare by time.
But it would cause problem in removeLRU() if lots of key written in same time. 
In this case, one time(even different key) is one entry of sortedTimestamp, but 
the counter and count was calculated by key.
So the counter could be much larger than the number of entry of 
sortedTimestamp. And NoSuchElementException would throw when reach the end of 
the iterator.


Solutions:
Add key as the less important factor could solve this problem. But probably 
need to limit key to support compareTo()

Another idea I think also work is use two maps, the benefit is don’t need to 
compare by key( only compare by time)
-  timeToKeys: multiple list map from time to keys
- keyToRecentTime: key to most recent time
implementation:
- upSert(T key): get time from keyToRecentTime map; if don’t have time, put 
(key, time) to both keyToRecentTime and  timeToKeys; if have time and time is 
different, remove key from timeToKeys, update keyToRecentTime
- removeLRU(int count): get time sortedTimestamp, and then get and remove keys 
from timeToKeys, and remove entry from keyToRecentTime.

Another quick solution is also add key as the less important factor of 
compareTo(), but instead of add a limitation to the key( key should 
comparable), use equals() to compare the key( we care about equality of key, 
but in fact don't care the order of the key). 


  was:
When there are lots of keys and large throughput, the following exception will 
be throw.
java.util.NoSuchElementException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1113)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
        at 
org.apache.apex.malhar.lib.state.spillable.TimeBasedPriorityQueue.removeLRU(TimeBasedPriorityQueue.java:75)
        at 
org.apache.apex.malhar.lib.state.spillable.WindowBoundedMapCache.endWindow(WindowBoundedMapCache.java:119)
        at 
org.apache.apex.malhar.lib.state.spillable.SpillableByteArrayListMultimapImpl.endWindow(SpillableByteArrayListMultimapImpl.java:281)
        at 
com.datatorrent.benchmark.spillable.SpillableDSBenchmarkTest.testSpillableMutimap(SpillableDSBenchmarkTest.java:139)


The reason is as following:
In TimeBasedPriorityQueue.TimeWrapper<T>, when implement compareTo(), only 
compare by time. While in equals(), compare both time and key.

Generally, "compareTo() == 0” means “equals() == true”. But in 
TimeBasedPriorityQueue, compareTo() was used by “sortedTimestamp” and it would 
be OK only compare by time.
But it would cause problem in removeLRU() if lots of key written in same time. 
In this case, one time(even different key) is one entry of sortedTimestamp, but 
the counter and count was calculated by key.
So the counter could be much larger than the number of entry of 
sortedTimestamp. And NoSuchElementException would throw when reach the end of 
the iterator.


Solutions:
Add key as the less important factor could solve this problem. But probably 
need to limit key to support compareTo()

Another idea I think also work is use two maps, the benefit is don’t need to 
compare by key( only compare by time)
-  timeToKeys: multiple list map from time to keys
- keyToRecentTime: key to most recent time
implementation:
- upSert(T key): get time from keyToRecentTime map; if don’t have time, put 
(key, time) to both keyToRecentTime and  timeToKeys; if have time and time is 
different, remove key from timeToKeys, update keyToRecentTime
- removeLRU(int count): get time sortedTimestamp, and then get and remove keys 
from timeToKeys, and remove entry from keyToRecentTime.



> TimeBasedPriorityQueue.removeLRU throws NoSuchElementException
> --------------------------------------------------------------
>
>                 Key: APEXMALHAR-2197
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2197
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> When there are lots of keys and large throughput, the following exception 
> will be throw.
> java.util.NoSuchElementException
>       at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1113)
>       at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
>       at 
> org.apache.apex.malhar.lib.state.spillable.TimeBasedPriorityQueue.removeLRU(TimeBasedPriorityQueue.java:75)
>       at 
> org.apache.apex.malhar.lib.state.spillable.WindowBoundedMapCache.endWindow(WindowBoundedMapCache.java:119)
>       at 
> org.apache.apex.malhar.lib.state.spillable.SpillableByteArrayListMultimapImpl.endWindow(SpillableByteArrayListMultimapImpl.java:281)
>       at 
> com.datatorrent.benchmark.spillable.SpillableDSBenchmarkTest.testSpillableMutimap(SpillableDSBenchmarkTest.java:139)
> The reason is as following:
> In TimeBasedPriorityQueue.TimeWrapper<T>, when implement compareTo(), only 
> compare by time. While in equals(), compare both time and key.
> Generally, "compareTo() == 0” means “equals() == true”. But in 
> TimeBasedPriorityQueue, compareTo() was used by “sortedTimestamp” and it 
> would be OK only compare by time.
> But it would cause problem in removeLRU() if lots of key written in same 
> time. In this case, one time(even different key) is one entry of 
> sortedTimestamp, but the counter and count was calculated by key.
> So the counter could be much larger than the number of entry of 
> sortedTimestamp. And NoSuchElementException would throw when reach the end of 
> the iterator.
> Solutions:
> Add key as the less important factor could solve this problem. But probably 
> need to limit key to support compareTo()
> Another idea I think also work is use two maps, the benefit is don’t need to 
> compare by key( only compare by time)
> -  timeToKeys: multiple list map from time to keys
> - keyToRecentTime: key to most recent time
> implementation:
> - upSert(T key): get time from keyToRecentTime map; if don’t have time, put 
> (key, time) to both keyToRecentTime and  timeToKeys; if have time and time is 
> different, remove key from timeToKeys, update keyToRecentTime
> - removeLRU(int count): get time sortedTimestamp, and then get and remove 
> keys from timeToKeys, and remove entry from keyToRecentTime.
> Another quick solution is also add key as the less important factor of 
> compareTo(), but instead of add a limitation to the key( key should 
> comparable), use equals() to compare the key( we care about equality of key, 
> but in fact don't care the order of the key). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to