Hi Lewis,

My answers are inline.


On Mar 6, 2013 4:03 PM, "Lewis John Mcgibbney" <[email protected]>
wrote:
>
> Hi All,
> I've patched my local copy of gora trunk with the Cassandra specific
> (GORA-206) version of Alfonso's patch in GORA_174.

GORA-206 is a complementary patch to GORA-174. At the moment it also contains
GORA-174, but the two should be separated so we can handle them independently.

> I've documented my work here [0] with an accompanying log file [1] which
> should be consulted in parallel.

Thanks for taking the time to do this mate (:

> We are having problems here... pretty major problems.
> I would like to discuss my findings in this thread and I am sure that
> others will have questions once they've read [0]. I therefore think that
> this is the best way to take the discussion forward.
> Thank you very much.

After reading [0], I think you are pointing out several problems and
possible improvements to Gora-Cassandra, which is awesome (: IMHO we
should open separate JIRA issues to track each of them down.

Gora
-------
1. Various iterations of keyspace and field mapping. This is a performance
improvement we've already talked about. We really have to rewrite some code
in the CassandraMapping part.
2. Keyspace resetting from null to webpage. We should determine why the
keyspace becomes null, and whether it is Gora's problem or Nutch's Generator job.
3. The exception at GoraRecordWriter.class. This is going to be an
interesting one to work on because our MapReduce support is rusty and needs
to be reviewed before it can be improved. Did we have these problems before?

Could you please elaborate a little on your comment "list p returns to rows
from within cassandra-cli!"?

Nutch
--------
1. GeneratorJob is behaving in a funny way. We should look into this on
the Nutch mailing lists.
2. ParserJob has to be checked in detail.

Just one last thing. Even though Nutch offers a great use case for Gora,
let's try not to mix Nutch problems with Gora's. If there are
problems with different Nutch jobs, then let's open JIRA issues over in NutchLand
and attack them from there. That way we don't get things mixed up, because we
actually don't know what the root cause of these problems might be (Nutch
or Gora).
I will run some specific tests with WebPage.avsc, but your logs, and the fact
that you were able to persist baseUrl (which is a union data type), suggest
that GORA-206 might be working to a certain degree. What do you think? What
do others think?
Thanks again Lewis for your awesome work! (:


Renato M.

>
> Lewis
>
> [0] http://people.apache.org/~lewismc/gora/dev.txt
> [1] http://people.apache.org/~lewismc/gora/hadoop.log
>
> --
> *Lewis*
