Hi Alfonso,

2013/3/8 Alfonso Nishikawa <[email protected]>:
> Hi Renato,
>
>> " . . . Gora should be low invasive: data schema is created and stored
>> out of the backend so ideally you could access your data without Gora.
>> We will see that this is hard to achieve at some extent (like in
>> nested records or several types unions)."
>>
>> You can access your data directly without using Gora e.g. your Nutch
>> data can be queried and retrieved by using HBase clients, Cassandra
>> clients, Hadoop, or anything. Is this what you meant?
>
> Not exactly.
> We may have to make data accessible without Gora. Generally speaking,
> in 0.2.1 this is possible except for some serialized things (usually
> records). After union types, this becomes worse, but we can make it
> accessible creating a new configuration option. In this case the access
> would be schema-less.

OK, I see what you mean, and yeah, there are cases in which serializers
just complicate things. But there are others where they behave all
right. I used Gora trunk as a JDBC driver for a project and I could see
my data without any problems (I was using Cassandra), but I do
understand your point now, and I agree with you as well.

> The other point of view is accessing full data from outside Gora but
> having the schema definition. Surely there are cases where this is
> possible. As I told, this is hard to achieve at some extent, and not
> all backends would support this.
>
>> About Nested records, oh man, your descriptions are really
>> interesting! (: and you are right about the possible approaches, would
>> you mind opening a JIRA issue to keep track of this? I mean for
>> complex nested data structures. Thanks!
>
> Sorry, I don't understand. Should I open an issue? What I wrote were
> only descriptions and thoughts :P

My bad. Yeah, I think you should open separate issues so we have them
on JIRA and can get them done in the future.
It's like the Pig adapter: it's been there for a while, but being on
JIRA helps us all be aware of what needs to be improved or done.

>> So when you are talking about implementing this on HBase, are you
>> still talking about handling null-one-type-unions (GORA-174)? or are
>> you talking about the nested features described before?
>
> Both. optional-singletype unions and multitypes unions.
>

OK, so I am the one not understanding here man, sorry ): When you talk
about multi-type unions, are you referring to nested records, or are
you treating them as separate things?

>> About Cassandra issues, the cloning process you are describing is a
>> problem that Roland was looking into, let's hope we can work that one
>> out soon. The way Gora-Cassandra serializes data is what you've
>> described in your first option, and I also think the second one is a
>> better option.
>
> I am thinking about something different than you.
>
> The way that Gora-cassandra serializes is what is shown in
> "Implementation details in Cassandra" excluding "Proposed
> implementations are two:"
> The way described in "Proposed implementations > First option" is not
> gora-cassandra, but the approach of HBase.
> The way described in "Proposed implementations > Second option", as you
> say, seems much better.

Man, I might be looking in the wrong place, but in [1] we always use a
ByteBufferSerializer to store our data. We do use specific serializers
to obtain ByteBuffers from the value in CassandraClient [2], but in the
end we store HColumns with ByteBuffers. And to retrieve the data, we
use the JSON schema to know which serializer to use, so we can get the
data back as it originally was.

>> Did you happen to see that email Lewis sent about
>> pluggable client architecture for Gora-Cassandra? the idea with this
>> would be to create these type of abstractions. Good to see we are all
>> in the same page (:
>
> I read something on the fly, but not analyzed it, so really I don't
> have opinion.
> At this moment I found gora quite pluggable: you put a
> .jar with your backend in class library and configure gora.properties
> to use your classes. I logically guess you are talking about further
> topics :) Will read the abstract ;)

Thanks mate (:

>> I will open different JIRA issues to track all these problems
>> separately in order to make smaller more digestible patches and start
>> committing them and getting 0.3 out!
>
> I find right how is now with a main issue for common code, and one per
> backend, but feel free.
>
>> Just one last question about GORA-174, do you remember that some tests
>> were not passing after applying the patch? Well after applying
>> GORA-174 + GORA-182 + GORA-206, there were not any under
>> gora-xxx/target/surefire-reports/, is this what is expected? Of course
>> after GORA-206 we've noticed that there are many other problems, but I
>> think we should start moving along.
>
> It is expected no errors under target/surefire-reports. Good if
> gora-cassandra has no one, but must be errors in other backend, and some
> errors related to core. I am preparing my patches, I solved some more
> bugs, and I found more bugs I have to solve.

My bad, I overlooked some files; there are still many, many errors )=
I will also be working on solving them (:

>> Thanks again, and keep up with the great work!
>
> Thank you with your feedback!

Thanks again man!

Renato M.

> Regards,
>
> Alfonso Nishikawa

[1] https://github.com/renato2099/gora/blob/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/HectorUtils.java
[2] https://github.com/renato2099/gora/blob/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraClient.java
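
To make the Cassandra storage path discussed above a bit more concrete
(values are turned into ByteBuffers on write, and the schema decides
which deserializer to use on read), here is a minimal sketch in plain
Java. It is only an illustration under assumptions: it uses JDK classes
rather than Hector or Gora, and the names SchemaDrivenRoundTrip,
FieldType, toBytes and fromBytes, as well as the "url" and "fetchTime"
columns, are hypothetical and do not come from the Gora code base.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: plain JDK types, no Hector or Gora classes.
public class SchemaDrivenRoundTrip {

    // Hypothetical stand-in for the field types declared in the JSON schema.
    enum FieldType { STRING, LONG, INT }

    // Write path: a type-specific step turns the value into a ByteBuffer,
    // so the stored column value is always raw bytes.
    static ByteBuffer toBytes(Object value) {
        if (value instanceof String) {
            return ByteBuffer.wrap(((String) value).getBytes(StandardCharsets.UTF_8));
        }
        if (value instanceof Long) {
            ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);
            buffer.putLong((Long) value);
            buffer.flip(); // make the written bytes readable
            return buffer;
        }
        if (value instanceof Integer) {
            ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
            buffer.putInt((Integer) value);
            buffer.flip();
            return buffer;
        }
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    // Read path: the bytes carry no type information, so the schema's
    // field type selects how to decode them.
    static Object fromBytes(FieldType type, ByteBuffer bytes) {
        ByteBuffer copy = bytes.duplicate(); // leave the caller's position untouched
        switch (type) {
            case STRING:
                return StandardCharsets.UTF_8.decode(copy).toString();
            case LONG:
                return copy.getLong();
            case INT:
                return copy.getInt();
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema: column name -> declared type.
        Map<String, FieldType> schema = new HashMap<>();
        schema.put("url", FieldType.STRING);
        schema.put("fetchTime", FieldType.LONG);

        // Stand-in for a stored row: column name -> raw bytes.
        Map<String, ByteBuffer> row = new HashMap<>();
        row.put("url", toBytes("http://example.org/"));
        row.put("fetchTime", toBytes(1362700800000L));

        for (Map.Entry<String, ByteBuffer> column : row.entrySet()) {
            Object value = fromBytes(schema.get(column.getKey()), column.getValue());
            System.out.println(column.getKey() + " = " + value);
        }
    }
}
```

The sketch also shows why schema-less access from outside Gora is
awkward: without the schema map, a reader only sees opaque bytes and
has to guess how to decode each column.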

