[
https://issues.apache.org/jira/browse/GORA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591851#comment-13591851
]
Roland commented on GORA-211:
-----------------------------
There are now two patches: one against trunk and one against 0.2.
From my point of view, the patch against trunk is correct and usable.
The patch against 0.2 demonstrates my idea of using
mutator.addInsertion() instead of mutator.insert(). It is a lot faster (at
least for the Nutch use case) because it sends only one mutation request per
key for the multiple columns in our buffer.
BUT: it will break all other code that uses CassandraClient directly, because
you now need to call executeMutator() to flush to Cassandra.
So this is only a proof of concept. It works here, and writes are a lot
faster.
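To illustrate why queuing insertions reduces the number of requests, here is a
minimal sketch. SketchMutator is a hypothetical stand-in that only counts
requests; it is not the Hector Mutator and not the patched CassandraClient.
Its method names merely mirror insert(), addInsertion() and execute() for
readability, and the String key/column types are an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mutator. It only counts simulated requests.
class SketchMutator {
    // Columns queued per row key, waiting for execute().
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private int requestsSent = 0;

    // Immediate style: every column costs one request.
    void insert(String key, String column) {
        requestsSent++;
    }

    // Batched style: only queue the column, send nothing yet.
    void addInsertion(String key, String column) {
        pending.computeIfAbsent(key, k -> new ArrayList<>()).add(column);
    }

    // Flush the queue: one request per key, regardless of column count.
    void execute() {
        requestsSent += pending.size();
        pending.clear();
    }

    int requestsSent() {
        return requestsSent;
    }
}

class MutatorSketchDemo {
    // Number of requests when each column is written immediately.
    static int immediateRequests(String key, List<String> columns) {
        SketchMutator mutator = new SketchMutator();
        for (String column : columns) {
            mutator.insert(key, column);
        }
        return mutator.requestsSent();
    }

    // Number of requests when columns are queued and flushed once.
    static int batchedRequests(String key, List<String> columns) {
        SketchMutator mutator = new SketchMutator();
        for (String column : columns) {
            mutator.addInsertion(key, column);
        }
        mutator.execute(); // forgetting this call means nothing is written
        return mutator.requestsSent();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("status", "content", "fetchTime");
        System.out.println("immediate: " + immediateRequests("row1", columns));
        System.out.println("batched:   " + batchedRequests("row1", columns));
    }
}
```

The sketch also shows the compatibility problem: a caller that only queues
insertions and never flushes sends no requests at all.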
> thread safety: java.lang.NullPointerException
> ---------------------------------------------
>
> Key: GORA-211
> URL: https://issues.apache.org/jira/browse/GORA-211
> Project: Apache Gora
> Issue Type: Bug
> Components: storage-cassandra
> Affects Versions: 0.2
> Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 /
> gora-core 0.2.1
> running fetch with parse=true
> fetcher.threads.per.queue=2
> nutch on a 16 core AMD Opteron 2GHz
> Cassandra on 8 core Intel Xeon 3.3 GHz
> Reporter: Roland
> Priority: Critical
> Attachments: GORA-211-0.2.patch, GORA-211-trunk.patch
>
>
> This is the result of debugging one of my issues described in NUTCH-1534.
> example trace:
> java.lang.NullPointerException
>   at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
>   at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:71)
>   at org.apache.gora.cassandra.store.CassandraClient.addColumn(CassandraClient.java:139)
>   at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:307)
>   at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:212)
>   at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>   at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>   at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:664)
>   at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:534)
> I suspect that CassandraStore.put() does not take enough precautions to copy
> all objects safely into its buffer.
> {code}
> switch(type) {
>   case RECORD:
>     Persistent persistent = (Persistent) fieldValue;
>     Persistent newRecord = persistent.newInstance(new StateManagerImpl());
>     for (Field member: fieldSchema.getFields()) {
>       newRecord.put(member.pos(), persistent.get(member.pos()));
>     }
>     fieldValue = newRecord;
>     break;
>   case MAP:
>     StatefulHashMap<?, ?> map = (StatefulHashMap<?, ?>) fieldValue;
>     StatefulHashMap<?, ?> newMap = new StatefulHashMap(map);
>     fieldValue = newMap;
>     break;
> }
> {code}
> Case RECORD: do we not need to duplicate the object returned by
> "persistent.get(member.pos())" in this line?
> newRecord.put(member.pos(), persistent.get(member.pos()))
> Case MAP: do we not need to duplicate all value objects of the map?
> I have not had time to write a patch or test this, so please comment :)
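The copying concern raised in the description can be shown in isolation. This
is a minimal sketch, assuming the map values are mutable byte arrays.
CopySketch and its methods are hypothetical names; the sketch does not use
Gora's StatefulHashMap or Persistent types, and it is not the fix from either
patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper contrasting a shallow and a deep copy of a map.
class CopySketch {
    // Copies only the map structure; the value objects are still shared.
    static Map<String, byte[]> shallowCopy(Map<String, byte[]> source) {
        return new HashMap<>(source);
    }

    // Copies the map structure and clones every value array.
    static Map<String, byte[]> deepCopy(Map<String, byte[]> source) {
        Map<String, byte[]> copy = new HashMap<>();
        for (Map.Entry<String, byte[]> entry : source.entrySet()) {
            copy.put(entry.getKey(), entry.getValue().clone());
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, byte[]> original = new HashMap<>();
        original.put("content", new byte[] {1});

        Map<String, byte[]> shallow = shallowCopy(original);
        Map<String, byte[]> deep = deepCopy(original);

        // Simulate another thread reusing the original value after the copy.
        original.get("content")[0] = 9;

        System.out.println("shallow sees: " + shallow.get("content")[0]);
        System.out.println("deep sees:    " + deep.get("content")[0]);
    }
}
```

With a shallow copy, a later change to a value object is visible through the
buffered map as well, which is the kind of sharing the questions above are
asking about.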