Lewis John McGibbney created GORA-225:
-----------------------------------------
Summary: Various Issues with MemStore
Key: GORA-225
URL: https://issues.apache.org/jira/browse/GORA-225
Project: Apache Gora
Issue Type: Bug
Components: gora-core, testing
Affects Versions: 0.3
Environment: Nutch 2.x HEAD, gora-core 0.3
Reporter: Lewis John McGibbney
Fix For: 0.4
In Nutch we have numerous testing scenarios which simulate persistence of data
to Gora in some form or other. This has worked well until now.
Now that the gora-sql-0.1.1-incubating artifact is incompatible with gora-core
0.3, we need to address this situation in order to keep some degree of
integrity within the Nutch codebase.
Specifically, a number of tests [0][1][2][3] all extend a Util testing class
which uses functionality from the gora-sql artifact.
My initial solution was to switch to MemStore... which led me to log this
issue!
Test [0] fails with the following unhelpful output; I need to DEBUG this much
more thoroughly:
{code}
Testcase: testGenerateHighest took 1.845 sec
FAILED
expected:<2> but was:<0>
junit.framework.AssertionFailedError: expected:<2> but was:<0>
at org.apache.nutch.crawl.TestGenerator.testGenerateHighest(TestGenerator.java:78)
Testcase: testGenerateHostLimit took 1.207 sec
FAILED
expected:<1> but was:<0>
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at org.apache.nutch.crawl.TestGenerator.testGenerateHostLimit(TestGenerator.java:134)
Testcase: testGenerateDomainLimit took 1.175 sec
FAILED
expected:<1> but was:<0>
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at org.apache.nutch.crawl.TestGenerator.testGenerateDomainLimit(TestGenerator.java:185)
Testcase: testFilter took 2.31 sec
FAILED
expected:<3> but was:<0>
junit.framework.AssertionFailedError: expected:<3> but was:<0>
at org.apache.nutch.crawl.TestGenerator.testFilter(TestGenerator.java:239)
{code}
Tests [1] and [2] fail identically with the following stack trace:
{code}
Testcase: testInject took 1.931 sec
Caused an ERROR
null
java.util.NoSuchElementException
at java.util.TreeMap.key(TreeMap.java:1221)
at java.util.TreeMap.firstKey(TreeMap.java:285)
at org.apache.gora.memory.store.MemStore.execute(MemStore.java:122)
at org.apache.nutch.util.CrawlTestUtil.readContents(CrawlTestUtil.java:112)
at org.apache.nutch.crawl.TestInjector.readDb(TestInjector.java:104)
at org.apache.nutch.crawl.TestInjector.testInject(TestInjector.java:62)
{code}
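The trace shows TreeMap.firstKey() being called from MemStore.execute, and firstKey() is documented to throw NoSuchElementException when the map is empty, so it appears the backing map held no rows when the query was executed. The standalone sketch below is not MemStore's code: it only reproduces that JDK behaviour and shows one possible guard, where the class EmptyTreeMapDemo and the helper allEntries are hypothetical names, that checks isEmpty() before asking for the first key.

{code:java}
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

class EmptyTreeMapDemo {

    /** Returns true if firstKey() on the given map throws NoSuchElementException. */
    static boolean firstKeyThrows(TreeMap<String, String> map) {
        try {
            map.firstKey();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    /**
     * Hypothetical guarded lookup: returns a view of every entry, or an empty
     * map when nothing is stored, without calling firstKey() on an empty map.
     */
    static SortedMap<String, String> allEntries(TreeMap<String, String> map) {
        if (map.isEmpty()) {
            return new TreeMap<String, String>();
        }
        return map.tailMap(map.firstKey(), true);
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<String, String>();
        System.out.println(firstKeyThrows(store));    // prints true
        System.out.println(allEntries(store).size()); // prints 0

        store.put("http://example.com/", "row");
        System.out.println(firstKeyThrows(store));    // prints false
        System.out.println(allEntries(store).size()); // prints 1
    }
}
{code}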
Finally, a multithreaded test in [3] fails with the following exception:
{code}
java.util.ConcurrentModificationException
at java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(TreeMap.java:1594)
at java.util.TreeMap$NavigableSubMap$SubMapKeyIterator.next(TreeMap.java:1655)
at org.apache.gora.memory.store.MemStore$MemResult.nextInner(MemStore.java:81)
at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:74)
at org.apache.nutch.storage.TestGoraStorage.access$100(TestGoraStorage.java:41)
at org.apache.nutch.storage.TestGoraStorage$1.call(TestGoraStorage.java:107)
at org.apache.nutch.storage.TestGoraStorage$1.call(TestGoraStorage.java:102)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
{code}
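The frames SubMapIterator.nextEntry and MemStore$MemResult.nextInner indicate that a result was iterating the keys of a subMap view of MemStore's TreeMap when the map was structurally modified. TreeMap's iterators are fail-fast, so the same exception can be reproduced in a single thread; the sketch below (class and method names are illustrative) adds a key to the backing map mid-iteration. Under real concurrent access the fail-fast check is only best-effort, so the exception cannot be relied upon to signal every unsynchronized modification.

{code:java}
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

class SubMapFailFastDemo {

    /**
     * Iterates the keys of a subMap view and structurally modifies the backing
     * TreeMap mid-iteration. Returns true if the iterator detected the change
     * and threw ConcurrentModificationException.
     */
    static boolean modifyDuringSubMapIteration() {
        TreeMap<String, Integer> rows = new TreeMap<String, Integer>();
        rows.put("a", 1);
        rows.put("b", 2);
        rows.put("c", 3);

        SortedMap<String, Integer> view = rows.subMap("a", "z");
        Iterator<String> keys = view.keySet().iterator();
        try {
            while (keys.hasNext()) {
                String key = keys.next();
                if (key.equals("a")) {
                    // Adding a new key is a structural modification, as a
                    // concurrent writer thread would perform.
                    rows.put("d", 4);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringSubMapIteration()); // prints true
    }
}
{code}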
I believe that the final failure is due to the use of TreeMap [5] as a
private field in MemStore. TreeMap implementations are not synchronized. If
multiple threads access a map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or more
mappings; merely changing the value associated with an existing key is not a
structural modification.) This is typically accomplished by synchronizing on
some object that naturally encapsulates the map. If no such object exists, the
map should be "wrapped" using the Collections.synchronizedSortedMap method.
This is best done at creation time, to prevent accidental unsynchronized access
to the map e.g.
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
N.B. The note on TreeMap above comes directly from the Oracle JavaDoc [5].
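One caveat from the JavaDoc of Collections.synchronizedSortedMap: the wrapper only synchronizes individual calls, and code iterating the wrapper or any of its subMap, headMap or tailMap views must still hold the wrapper's lock for the whole loop. The MemResult trace fails during exactly such an iteration. The sketch below, with illustrative names, contrasts that approach with java.util.concurrent.ConcurrentSkipListMap, a different technique from the one proposed above: a sorted, thread-safe map whose iterators are weakly consistent and never throw ConcurrentModificationException. A writer thread inserts rows while the caller repeatedly iterates a subMap view and counts any ConcurrentModificationException.

{code:java}
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

class SafeSortedMapDemo {

    static final int ROWS = 2000;

    /**
     * Starts one writer thread that inserts ROWS keys while the calling thread
     * repeatedly iterates a subMap view. Returns how many
     * ConcurrentModificationExceptions the reader observed.
     */
    static int readWhileWriting(final SortedMap<String, Integer> map,
                                boolean lockWhileIterating) throws InterruptedException {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < ROWS; i++) {
                    map.put(String.format("row-%05d", i), i);
                }
            }
        });
        writer.start();
        int failures = 0;
        while (writer.isAlive()) {
            try {
                if (lockWhileIterating) {
                    // Required for Collections.synchronizedSortedMap: hold the
                    // wrapper's lock for the entire iteration of the view.
                    synchronized (map) {
                        countRows(map);
                    }
                } else {
                    countRows(map);
                }
            } catch (ConcurrentModificationException e) {
                failures++;
            }
        }
        writer.join();
        return failures;
    }

    /** Iterates the keys of a subMap view covering every "row-" key. */
    static int countRows(SortedMap<String, Integer> map) {
        int n = 0;
        for (String ignored : map.subMap("row-", "row-~").keySet()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        // Option 1: TreeMap wrapped at creation time, locked during iteration.
        SortedMap<String, Integer> wrapped =
                Collections.synchronizedSortedMap(new TreeMap<String, Integer>());
        System.out.println(readWhileWriting(wrapped, true));   // prints 0
        System.out.println(countRows(wrapped));                // prints 2000

        // Option 2: ConcurrentSkipListMap, no external lock needed.
        ConcurrentSkipListMap<String, Integer> skipList =
                new ConcurrentSkipListMap<String, Integer>();
        System.out.println(readWhileWriting(skipList, false)); // prints 0
        System.out.println(countRows(skipList));               // prints 2000
    }
}
{code}

Holding the lock for a whole iteration blocks writers for that time, which may be acceptable for a test-oriented store; ConcurrentSkipListMap avoids the lock, but its iterators may or may not reflect writes made during iteration.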
[0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestGenerator.java?view=markup
[1] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestInjector.java?view=markup
[2] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/fetcher/TestFetcher.java?view=markup
[3] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/storage/TestGoraStorage.java?view=markup
[4] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/util/AbstractNutchTest.java?view=markup
[5] http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html