[ https://issues.apache.org/jira/browse/GORA-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated GORA-225:
--------------------------------------
Description:
In Nutch we have numerous testing scenarios which simulate persistence of data
to Gora in some form or other. This has worked well until now.
Now that the gora-sql-0.1.1-incubating artifact is incompatible with gora-core
0.3, we need to address this situation in order to keep some
degree of integrity within the Nutch codebase.
Specifically, a number of tests [0][1][2][3] all extend a Util testing class
which uses functionality from the gora-sql artifact.
My initial solution was to switch to using MemStore... which brought me to
logging this issue!
I've logged sub-issues here to make a clear distinction between my observations.
was:
In Nutch we have numerous testing scenarios which simulate persistence of data
to Gora in some form or other. This has worked well until now.
Now that the gora-sql-0.1.1-incubating artifact is incompatible with gora-core
0.3, we need to address this situation in order to keep some
degree of integrity within the Nutch codebase.
Specifically, a number of tests [0][1][2][3] all extend a Util testing class
which uses functionality from the gora-sql artifact.
My initial solution was to switch to using MemStore... which brought me to
logging this issue!
Test [0] fails with the following uninformative logging... I need to debug this
much more thoroughly:
{code}
Testcase: testGenerateHighest took 1.845 sec
FAILED
expected:<2> but was:<0>
junit.framework.AssertionFailedError: expected:<2> but was:<0>
    at org.apache.nutch.crawl.TestGenerator.testGenerateHighest(TestGenerator.java:78)
Testcase: testGenerateHostLimit took 1.207 sec
FAILED
expected:<1> but was:<0>
junit.framework.AssertionFailedError: expected:<1> but was:<0>
    at org.apache.nutch.crawl.TestGenerator.testGenerateHostLimit(TestGenerator.java:134)
Testcase: testGenerateDomainLimit took 1.175 sec
FAILED
expected:<1> but was:<0>
junit.framework.AssertionFailedError: expected:<1> but was:<0>
    at org.apache.nutch.crawl.TestGenerator.testGenerateDomainLimit(TestGenerator.java:185)
Testcase: testFilter took 2.31 sec
FAILED
expected:<3> but was:<0>
junit.framework.AssertionFailedError: expected:<3> but was:<0>
    at org.apache.nutch.crawl.TestGenerator.testFilter(TestGenerator.java:239)
{code}
Tests [1][2] fail identically with the following stack trace:
{code}
Testcase: testInject took 1.931 sec
Caused an ERROR
null
java.util.NoSuchElementException
    at java.util.TreeMap.key(TreeMap.java:1221)
    at java.util.TreeMap.firstKey(TreeMap.java:285)
    at org.apache.gora.memory.store.MemStore.execute(MemStore.java:122)
    at org.apache.nutch.util.CrawlTestUtil.readContents(CrawlTestUtil.java:112)
    at org.apache.nutch.crawl.TestInjector.readDb(TestInjector.java:104)
    at org.apache.nutch.crawl.TestInjector.testInject(TestInjector.java:62)
{code}
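The trace above shows TreeMap.firstKey() being reached from MemStore.execute, and firstKey() throws NoSuchElementException when the map is empty. The following is a minimal sketch of that behaviour and of one possible guard. It is not the MemStore source: the class EmptyTreeMapDemo and its helpers firstKeyOrNull and firstKeyThrows are hypothetical names used only for illustration, and whether the map was really empty at that point is an assumption read from the trace.

```java
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical illustration only; this is not code from gora-core.
class EmptyTreeMapDemo {

    // Returns the lowest key, or null when the map is empty, instead of throwing.
    static <K, V> K firstKeyOrNull(TreeMap<K, V> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    // Reports whether firstKey() throws NoSuchElementException for the given map.
    static boolean firstKeyThrows(TreeMap<String, String> map) {
        try {
            map.firstKey();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<String, String>();
        System.out.println("empty map throws: " + firstKeyThrows(store));
        System.out.println("guarded lookup on empty map: " + firstKeyOrNull(store));

        store.put("http://example.org/", "page");
        System.out.println("guarded lookup after put: " + firstKeyOrNull(store));
    }
}
```

A guard of this kind only hides the symptom. If the test expects rows to be present, the more interesting question is why the store is empty when it is queried.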
Finally, a multithreaded test in [3] fails with the following:
{code}
java.util.ConcurrentModificationException
    at java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(TreeMap.java:1594)
    at java.util.TreeMap$NavigableSubMap$SubMapKeyIterator.next(TreeMap.java:1655)
    at org.apache.gora.memory.store.MemStore$MemResult.nextInner(MemStore.java:81)
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
    at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:74)
    at org.apache.nutch.storage.TestGoraStorage.access$100(TestGoraStorage.java:41)
    at org.apache.nutch.storage.TestGoraStorage$1.call(TestGoraStorage.java:107)
    at org.apache.nutch.storage.TestGoraStorage$1.call(TestGoraStorage.java:102)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
{code}
I believe that the final failure is due to the use of TreeMap [5] as a
private object in MemStore. TreeMap is not synchronized: if
multiple threads access a map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or more
mappings; merely changing the value associated with an existing key is not a
structural modification.) This is typically accomplished by synchronizing on
some object that naturally encapsulates the map. If no such object exists, the
map should be "wrapped" using the Collections.synchronizedSortedMap method.
This is best done at creation time, to prevent accidental unsynchronized access
to the map, e.g.
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
N.B. The note on TreeMap comes straight from the Oracle JavaDoc [5].
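The wrapping approach quoted above could be sketched roughly as follows, assuming a String-keyed map stands in for the private TreeMap in MemStore. The class SortedMapConcurrencyDemo and its methods snapshotKeys and fill are hypothetical and not part of Gora. One caveat from the Collections JavaDoc is relevant here because the failing frame is an iterator: even with a synchronized wrapper, iteration over its views must be done inside a synchronized block on the wrapper. ConcurrentSkipListMap is included only as an alternative swapped in for comparison; it is not something this issue proposes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; this is not code from gora-core.
class SortedMapConcurrencyDemo {

    // Copies the keys while holding the map's monitor. For a synchronized
    // wrapper this is the lock its own methods use, so writers are blocked
    // during the iteration. For ConcurrentSkipListMap the lock is unneeded
    // but harmless, since its iterators are weakly consistent.
    static List<String> snapshotKeys(SortedMap<String, String> map) {
        List<String> keys = new ArrayList<String>();
        synchronized (map) {
            for (String key : map.keySet()) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Starts writer threads that insert distinct keys, reads snapshots while
    // they run, waits for them to finish, and returns the final map size.
    static int fill(final SortedMap<String, String> map, int writers, final int perWriter) {
        List<Thread> threads = new ArrayList<Thread>();
        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < perWriter; i++) {
                        map.put("key-" + id + "-" + i, "value");
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (int i = 0; i < 100; i++) {
            snapshotKeys(map); // concurrent reads while writers are active
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        SortedMap<String, String> wrapped =
                Collections.synchronizedSortedMap(new TreeMap<String, String>());
        System.out.println("synchronized TreeMap size: " + fill(wrapped, 4, 1000));

        SortedMap<String, String> skipList = new ConcurrentSkipListMap<String, String>();
        System.out.println("ConcurrentSkipListMap size: " + fill(skipList, 4, 1000));
    }
}
```

Which option suits MemStore depends on how its query results iterate over the map. Wrapping alone would not protect an iterator that outlives the synchronized block.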
[0]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestGenerator.java?view=markup
[1]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestInjector.java?view=markup
[2]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/fetcher/TestFetcher.java?view=markup
[3]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/storage/TestGoraStorage.java?view=markup
[4]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/util/AbstractNutchTest.java?view=markup
[5] http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html
> Various Issues with MemStore
> -----------------------------
>
> Key: GORA-225
> URL: https://issues.apache.org/jira/browse/GORA-225
> Project: Apache Gora
> Issue Type: Bug
> Components: gora-core, testing
> Affects Versions: 0.3
> Environment: Nutch 2.x HEAD, gora-core 0.3
> Reporter: Lewis John McGibbney
> Fix For: 0.4
>
>
> In Nutch we have numerous testing scenarios which simulate persistence of
> data to Gora in some form or other. This has worked well until now.
> Now that the gora-sql-0.1.1-incubating artifact is incompatible with gora-core
> 0.3, we need to address this situation in order to keep some
> degree of integrity within the Nutch codebase.
> Specifically, a number of tests [0][1][2][3] all extend a Util testing class
> which uses functionality from the gora-sql artifact.
> My initial solution was to switch to using MemStore... which brought me to
> logging this issue!
> I've logged sub-issues here to make a clear distinction between my observations.