Hi Lewis,

My answers are inline.

On Mar 6, 2013 4:03 PM, "Lewis John Mcgibbney" <[email protected]> wrote:
>
> Hi All,
> I've patched my local copy of gora trunk with the Cassandra specific
> (GORA-206) version of Alfonso's patch in GORA_174.

GORA-206 is a complementary patch to GORA-174. At the moment it also contains GORA-174, but the two should be separated so we can handle them independently.

> I've documented my work here [0] with an accompanying log file [1] which
> should be consulted in parallel.

Thanks for taking the time to do this mate (:

> We are having problems here... pretty major problems.
> I would like to discuss my findings in this thread and I am sure that
> others will have questions once they've read [0]. I therefore think that
> this is the best way to take the discussion forward.
> Thank you very much.

After reading [0], I think you are pointing out several problems and possible improvements in Gora-Cassandra, which is awesome (: IMHO we should open separate JIRA issues to track all of these down.

Gora
-------
1. Various iterations of keyspace and field mapping. This is a performance improvement we've already talked about. We really have to rewrite some code in the CassandraMapping part.
2. Keyspace resetting from null to webpage. We should determine why the keyspace becomes null, and whether the cause is in Gora or in Nutch's Generator job.
3. The exception at GoraRecordWriter.class. This is going to be an interesting one to work on because our MapReduce support is rusty and needs to be reviewed before it can be improved. Did we have these problems before?

Could you please explain your comment "list p returns to rows from within cassandra-cli!" a little better?

Nutch
--------
1. GeneratorJob is working in a funny way. We should look into this on the Nutch lists.
2. ParserJob has to be checked in detail.

Just one last thing. Even though Nutch offers a great use case for Gora, let's try not to mix Nutch problems with Gora's. If there are problems with different Nutch jobs, then let's open JIRA issues over in NutchLand and attack them from there. That way we don't get things mixed up, because we don't actually know yet whether the root cause of these problems is in Nutch or in Gora.

I will run some specific tests with WebPage.avsc, but your logs, and the fact that baseUrl (which is a union data type) can be persisted, suggest that GORA-206 might be working to a certain degree. What do you think? What do others think?

Thanks again Lewis for your awesome work! (:

Renato M.

>
> Lewis
>
> [0] http://people.apache.org/~lewismc/gora/dev.txt
> [1] http://people.apache.org/~lewismc/gora/hadoop.log
>
> --
> *Lewis*

