Hi Dishara,

I've taken the liberty of creating a code review at [1]. It covers all the
commits so far, and I've emailed you the comments separately. I think it would
be good to get into the habit of looking at the code this way, as it often
removes the confusion introduced by the English language (which has many
compilers, and mine has been known to be buggy at times ;).


More comments inline below: (BTW, excellent progress!)

Best Regards
Ian


[1] https://codereview.appspot.com/10811044/



On 30 June 2013 22:52, Dishara Wijewardana <ddwijeward...@gmail.com> wrote:

> On Fri, Jun 28, 2013 at 4:37 AM, Ian Boston <i...@tfd.co.uk> wrote:
>
> > Hi,
> > Have you tried the TypeInferringSerializer for the value serializer ?
> > That claims to detect what the column value is based on the byte
> > array.
> >
> > Failing that, I would consider making everything byte[] and using your
> own
> > serializer that writes and read values to a byte[] using DataInputStream
> > DataOutputStream.
> >
> > [2] Is an example of a serializer written for that purpose that was used
> > with Cassandra over raw Thrift. Its not easy to read what it outputs to
> the
> > storage layer, but it is compact and efficient. I would not use it
> directly
> > as it does some very specific things like slicing large byte[]s into 1MB
> > chunks and bypassing the 64K limit on reading and writing UTF8 strings
> with
> > DataInputStream.
> >
> > Try the TypeInferringSerializer first. If it works great, no need to do
> > anything more complex.
> >
>
> Hi,
> In fact I was able to add as many params as I wanted with the same
> configurations. But TypeInferringSerializer is a useful one too, which we
> might need in future.
> Also I was thinking rather than storing resource meta data as String
> values, how about storing a serialized object as you mentioned ?


I suspect that TypeInferringSerializer will do a better job of serializing
than the approach I mentioned. Only consider writing your own, if there is
a real and demonstrated need for it.
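
For reference, if that need ever does arise, the DataOutputStream approach
could look roughly like the sketch below. The class name PropertyCodec and the
one-byte type tags are made up for illustration; they are not part of Hector
and not the format used by the serializer in [2]. Only String and Long are
handled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of Hector: writes a one-byte type tag
// followed by the value, so the reader knows how to decode the bytes.
final class PropertyCodec {
    private static final byte TYPE_STRING = 1;
    private static final byte TYPE_LONG = 2;

    static byte[] encode(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value instanceof String) {
                out.writeByte(TYPE_STRING);
                // writeUTF is limited to 65535 encoded bytes per string.
                out.writeUTF((String) value);
            } else if (value instanceof Long) {
                out.writeByte(TYPE_LONG);
                out.writeLong((Long) value);
            } else {
                throw new IllegalArgumentException("Unsupported type: " + value);
            }
        }
        return bytes.toByteArray();
    }

    static Object decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte type = in.readByte();
            switch (type) {
                case TYPE_STRING: return in.readUTF();
                case TYPE_LONG: return in.readLong();
                default: throw new IOException("Unknown type tag: " + type);
            }
        }
    }
}
```

The output is compact but opaque in the storage layer, which is the trade-off
to weigh against letting the client library pick the serializer.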


> It will be
> clear. But I am not sure about the performance. Because when we have multi
> valued columns like meta data we have to insert them in a single String as
> comma separated values. Would it be more scalable if we had a Bean for
> CassandraResource ? What do you think ?
>

Put one property per column in Cassandra if possible. IIRC it does a good
job of serializing data, and doesn't need a pre-defined schema as
traditional RDBMSs do. The serialisation I mentioned was mostly used to
get schemaless storage into an RDBMS.



>
> And I did a first cut of this  but with many TODOs ;-),  where getResource
> method is implemented and currently all the content is printed, but I have
> not implemented methods in CassandraResource yet. This is just a POC to
> test whether the proposed model works. Apparently it works [1].


Yes, this is a great start! I didn't find too many issues with the approach,
as you will see from the comments on the code review.




>  See
>  CassandraDataPopulator class which is a plain java test class added for
> the moment to test the POC.(I am moving this to a proper JUnit)
>

Good.


>
> TODOs
> - I am in the process of finishing the implementation of CassandraResource,
> CassandraResourceProvider etc. end to end.
> - Move to JUnit test framework and  write more tests for each scenario
> where I can extend this to Mockito (I am still not clear how Mockito comes
> in to the picture) in near future.
>

When you write the unit tests, if you find that you need to hand-write a
stand-in for anything (e.g. ResourceResolver) to make the tests work, don't.
Use mocks. You can even mock concrete classes, so you could mock the
behaviour of the Hector API to respond in a pre-defined way to certain CQL
queries. This will eliminate the need to have a real Cassandra server present
when running the basic unit tests.
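
To make the idea concrete without pulling in a dependency, here is a rough
sketch that uses a plain JDK dynamic proxy in place of Mockito; Mockito
generates this kind of stand-in for you with far less code. RowSource and
CannedRowSource are hypothetical names invented for this example, not Hector
or Sling types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.Map;

// Hypothetical seam between the provider and Cassandra; not a Hector type.
interface RowSource {
    Map<String, Object> fetchRow(String cql);
}

final class CannedRowSource {
    // Builds a RowSource that answers known CQL strings with canned rows
    // and returns an empty row for any other query.
    static RowSource answering(final Map<String, Map<String, Object>> canned) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) {
                if (method.getName().equals("fetchRow")) {
                    Map<String, Object> row = canned.get(args[0]);
                    return row != null ? row : Collections.<String, Object>emptyMap();
                }
                // Only fetchRow is stubbed in this sketch.
                throw new UnsupportedOperationException(method.getName());
            }
        };
        return (RowSource) Proxy.newProxyInstance(
                RowSource.class.getClassLoader(),
                new Class<?>[] { RowSource.class },
                handler);
    }
}
```

A test would hand the resulting RowSource to the code under test and assert on
the Resource it builds, with no Cassandra server running.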




> - Change the implementation based on the feedbacks from the community.
> - Parameterize the constants as much as possible to read from a property
> file.
>

These should come from OSGi properties. See the comments on
CassandraResourceProvider.






>
>
> [1] -
> https://cassandra-backend-for-sling.googlecode.com/svn/trunk/main/cassandra
>
> Thanks
>

Excellent progress, thank you!
Ian


>
> >
> >
> > Ian
> >
> > 1
> >
> >
> http://hector-client.github.io/hector/source/content/API/core/0.8.0-2/me/prettyprint/cassandra/serializers/TypeInferringSerializer.html
> >
> > 2
> >
> >
> https://github.com/ieb/sparsemapcontent/tree/master/core/src/main/java/org/sakaiproject/nakamura/lite/storage/spi/types
> >
> >
> > On 28 June 2013 05:14, Dishara Wijewardana <ddwijeward...@gmail.com>
> > wrote:
> >
> > > Hi Ian,
> > > I am having a problem with CQL..
> > >
> > > For example:
> > >         CqlQuery<String,String,Long> cqlQuery =
> > >             new CqlQuery<String,String,Long>(keyspace, new StringSerializer(),
> > >                 new StringSerializer(), new LongSerializer());
> > >         cqlQuery.setQuery("insert into mytable (KEY,password,gender,userid) "
> > >             + "values (3,'pass1','male',34);");
> > >         QueryResult<CqlRows<String,String,Long>> result =
> > >             cqlQuery.execute();
> > >
> > > This will successfully insert the row with pass1,male and 34 values
> under
> > > rowId=3.
> > >
> > > But in sling scenario, we need to have more serializers for a query as
> > > follows. Since we have more columns.
> > > i.e
> > >         CqlQuery<String,String,String,String> cqlQuery =
> > >             new CqlQuery<String,String,String,String>(keyspace,
> > >                 new StringSerializer(), new StringSerializer(),
> > >                 new StringSerializer(), new StringSerializer());
> > >         cqlQuery.setQuery("insert into mytable "
> > >             + "(KEY,path,resourceType,resourceSuperType,metadata) values "
> > >             + "(3,'/content/cassandra/foo/bar','nt:cassandra','nt:super','metadata');");
> > >         QueryResult<CqlRows<String,String,Long>> result =
> > >             cqlQuery.execute();
> > >
> > > Here I am using me.prettyprint.cassandra.model.CqlQuery class. Any idea
> > how
> > > to proceed with this.
> > >
> > > > Am I doing something wrong or is this a limitation of the API I am
> > > > using ?
> > >
> > >
> > > On Thu, Jun 27, 2013 at 7:41 AM, Dishara Wijewardana <
> > > ddwijeward...@gmail.com> wrote:
> > >
> > > >
> > > >
> > > > On Thu, Jun 27, 2013 at 4:26 AM, Ian Boston <i...@tfd.co.uk> wrote:
> > > >
> > > >> On 27 June 2013 02:34, Dishara Wijewardana <ddwijeward...@gmail.com
> >
> > > >> wrote:
> > > >>
> > > >> > On Tue, Jun 25, 2013 at 4:52 AM, Ian Boston <i...@tfd.co.uk>
> wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > (I might have errors in the CQL, Cassandra schema and the
> > functions
> > > >> need
> > > >> > > proper escaping)
> > > >> > >
> > > >> > >
> > > >> > > Example 1:
> > > >> > > Zero depth tree with UUID as the rowid or key.
> > > >> > >
> > > >> > > URL /content/cassandra/pictures/13f58d5c95c70b6f
> > > >> > >
> > > >> > > then the column family is pictures and the URL -> ROWID function
> > > just
> > > >> > > results in the ROWID being 13f58d5c95c70b6f and
> > > >> > >
> > > >> > > String cql =
> > > mapOfCassandraMappers.get("pictures").getCQL("pictures",
> > > >> "
> > > >> > > 13f58d5c95c70b6f")
> > > >> > > System.err.println(cql);
> > > >> > >
> > > >> > > where
> > > >> > > String getCQL(String cf, String path) {
> > > >> > >     return "select * from "+cf+" where rowid = '"+path+"'";
> > > >> > > }
> > > >> > >
> > > >> > > yields:
> > > >> > > select * from pictures where rowid = '13f58d5c95c70b6f'
> > > >> > >
> > > >> > >
> > > >> > > 13f58d5c95c70b6f would be generated by the application when the
> > user
> > > >> > > created a new picture (by upload).
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Example 2:
> > > >> > > User specified
> > > >> > >
> > > >> > > URL
> > > >> /content/cassandra/catalogue/capacitors/electrolytic/axial/16v/10uf
> > > >> > >
> > > >> > > String cql =
> > > >> mapOfCassandraMappers.get("catalogue").getCQL("catalogue", "
> > > >> > > capacitors/electrolytic/axial/16v/10uf")
> > > >> > > System.err.println(cql);
> > > >> > >
> > > >> > > where
> > > >> > > String getCQL(String cf, String path) {
> > > >> > >     MessageDigest md = MessageDigest.getInstance("SHA1");
> > > >> > >     String rowID =
> > > >> > >         Base64.encode(md.digest(path.getBytes("UTF-8")));
> > > >> > >     return "select * from "+cf+" where rowid = '"+rowID+"'";
> > > >> > > }
> > > >> > >
> > > >> > > yields
> > > >> > >
> > > >> > > select * from catalogue where rowid = 'NzdlZmU4OTZmNGM4MzMwYzZ'
> > > >> > >
> > > >> > > If you want to find the parent then
> > > >> > >
> > > >> > > mapOfCassandraMappers.get("catalogue").getCQL("catalogue", "
> > > >> > > capacitors/electrolytic/axial/16v")
> > > >> > >
> > > >> > > select * from catalogue where rowid = 'ZGFzZGZzZnNkYWZzYWRmc2R'
> > > >> > >
> > > >> > > And if the parent is stored in the property parent then
> > > >> > >
> > > >> > > select * from catalogue where parent = 'ZGFzZGZzZnNkYWZzYWRmc2R'
> > > >> > >
> > > >> > > will generate a list of children. (Not sure about performance)
> > > >> > >
> > > >> > >
> > > >> > > Example 3:
> > > >> > > User is allowed to enter the RowID directly (identical to Example 1)
> > > >> > > URL
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> /content/cassandra/cannesfilmfestival/TomCruiseCassino-20130402112345-ieb.jpg
> > > >> > >
> > > >> > > where
> > > >> > > String getCQL(String cf, String path) {
> > > >> > >     return "select * from "+cf+" where rowid = '"+path+"'";
> > > >> > > }
> > > >> > >
> > > >> > > yields:
> > > >> > > select * from pictures where rowid = '
> > > >> > > TomCruiseCassino-20130402112345-ieb.jpg'
> > > >> > >
> > > >> >
> > > >> > This should be corrected as
> > > >> > select * from cannesfilmfestival where rowid = '
> > > >> > TomCruiseCassino-20130402112345-ieb.jpg'
> > > >> >
> > > >> >
> > > >> > >
> > > >> > >
> > > >> > > Does that make sense ?
> > > >> > >
> > > >> >
> > > >>
> > > >> Hi
> > > >>
> > > >>
> > > >> > Hi Ian,
> > > >> > I was in fact practicing some cql stuff in related to this
> response
> > > >> (with
> > > >> > cassandra cql terminal). This is quite a wonderful explanation
> for a
> > > new
> > > >> > comer like me. Thank you very much for the explanation again. Now
> it
> > > >> really
> > > >> > makes sense.
> > > >> >
> > > >>
> > > >> excellent!
> > > >>
> > > >>
> > > >> >
> > > >> > Other than the zero depth approach, I believe users will be more
> > > >> > comfortable with Example 2 approach.
> > > >> > Shall we go ahead with it ?
> > > >> >
> > > >>
> > > >>
> > > >> Yes, go for it. It will be interesting to see how hard it is to implement
> > > >> and how well (or not) it works. Remember, keep it as simple as possible
> > > >> and don't try to cover every use case at the expense of getting a PoC
> > > >> working.
> > > >>
> > > > +1.
> > > >
> > > >>
> > > >> However, don't forget that unit tests mocked with Mockito are a quicker
> > > >> way of getting to working code than no unit test coverage.
> > > >>
> > > >> Best Regards
> > > >> Ian
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> >
> > > >> >
> > > >> > > Ian
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On 25 June 2013 05:29, Dishara Wijewardana <
> > ddwijeward...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > On Mon, Jun 24, 2013 at 4:02 AM, Ian Boston <i...@tfd.co.uk>
> > > wrote:
> > > >> > > >
> > > >> > > > > Hi Dishara,
> > > >> > > > > Yes. 1 resource == 1 row.
> > > >> > > > > The columns within that row represent the properties of the
> > > >> resource.
> > > >> > > > > I suggest that you use standard property names where
> > appropriate
> > > >> (eg
> > > >> > > > > sling:resourceType is the Resource.resourceType etc)
> > > >> > > > >
> > > >> > > > > The Resource itself should be adaptable to a generic
> > > >> > CassandraResource
> > > >> > > > > (which will probably implement Resource) which will have a
> map
> > > of
> > > >> > > > > properties containing all the columns of the cassandra row.
> > > >> (optimise
> > > >> > > > > later) A CassandraResource might look and feel like a
> > > Map<String,
> > > >> > > Object>
> > > >> > > > > or it might have a Map<String, Object> getProperties()
> method,
> > > or
> > > >> > > better
> > > >> > > > > still be adaptable to a Map. The essential thing is: don't hard code
> > > >> > > > > the property names in the interface of CassandraResource for the
> > > >> > > > > moment, i.e. no getContentType() and no getMimeType(), as we don't
> > > >> > > > > really know what a CassandraResource will store.
> > > >> > > > >
> > > >> > > > > ResourceMetadata should be built from a subset of the
> > > >> > CassandraResource
> > > >> > > > > properties.
> > > >> > > > >
> > > >> > > > > You won't need to implement a ResourceResolver, only a
> > > >> > ResourceProvider
> > > >> > > > > (and Factory). I would use CQL in preference to other API
> > > methods.
> > > >> > > > >
> > > >> > > > > There is one thing that hasn't been mentioned, and that's the
> > > >> > > > > URL -> Cassandra Row mapping.
> > > >> > > > > There are several ways of doing this.
> > > >> > > > >
> > > >> > > > > eg:
> > > >> > > > > URL = /content/cassandra/<columnFamily>/<rowID>
> > > >> > > > >  Cassandra Column Family = columnFamily
> > > >> > > > >  Cassandra RowID = rowID
> > > >> > > > > or
> > > >> > > > > URL =
> > > >> /content/cassandra/<columnFamilySelector>/remainder/of/the/path
> > > >> > > > >  Cassandra Column Family =
> > > >> > > > > mapOfColumnFamilies.get(columnFamilySelector)
> > > >> > > > >  Cassandra RowID = function(/remainder/of/the/path)
> > > >> > > > >
> > > >> > > > > or to take that one stage further
> > > >> > > > >
> > > >> > > > > public interface CassandraMapper {
> > > >> > > > >       String getCQL(String columnFamilySelector, String
> path);
> > > >> > > > > }
> > > >> > > > >
> > > >> > > > Hi Ian
> > > >> > > > Thank you for the detailed explanation.
> > > >> > > >
> > > >> > > > OK. +1 for this approach with the mentioned flexibility.But  I
> > > need
> > > >> a
> > > >> > > small
> > > >> > > > clarification. With this approach,
> > > >> > > >
> > > >> > > > URL = /content/cassandra/<columnFamilySelector>ROW-ID
> > > >> > > > ROW-ID - function(/remainder/of/the/path).
> > > >> > > > So you mean ROW-ID is something we have to programatically
> > > uniquely
> > > >> > > create
> > > >> > > >  right ? like a UUID.
> > > >> > > >
> > > >> > > > What is this "/remainder/of/the/path" means ? Can you give an
> > > >> example
> > > >> > > with
> > > >> > > > real values in the context of a user who want to obtain a
> > resource
> > > >> from
> > > >> > > > cassandra.
> > > >> > > > This is just for my understanding.
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > >
> > > >> > > > > URL =
> > > /content/cassandra/<columnFamilySelector>/<remainderOfPath>
> > > >> > > > >
> > > >> > > > >  String cqlQuery =
> > > >> > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> mapOfCassandraMappers.get(columnFamilySelector).getCQL(columnFamilySelector,
> > > >> > > > > remainderOfPath);
> > > >> > > > >
> > > >> > > > > Which would allow us to provide one or more implementations of
> > > >> > > > > CassandraMapper to map between URL and CQL.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > HTH
> > > >> > > > > Ian
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On 23 June 2013 19:29, Dishara Wijewardana <
> > > >> ddwijeward...@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi Ian,
> > > >> > > > > >
> > > >> > > > > > What is the data mapping should be between Cassandra and
> > Sling
> > > >> > > > resource.
> > > >> > > > > I
> > > >> > > > > > mean is a Sling Resource maps to a Cassandra Column ? Or
> > > Column
> > > >> > > Family
> > > >> > > > ?
> > > >> > > > > >
> > > >> > > > > > Because to get this Cassandra and Sling story correct we
> > need
> > > to
> > > >> > > > finalize
> > > >> > > > > > this.
> > > >> > > > > > For an example what we eventually returns is a Sling
> > resource.
> > > >> > > > Everything
> > > >> > > > > > that needs to fill in to create Sling resource should be
> > > stored
> > > >> in
> > > >> > > > > > Cassandra.
> > > >> > > > > > In a Sling resource,
> > > >> > > > > >
> > > >> > > > > >    - Path - direct sling resource path
> > > >> > > > > >    - ResourceType - nt:cassandra
> > > >> > > > > >    - ResourceSuperType - ?
> > > >> > > > > >    - ResourceMetadata - we can create this on the fly with
> > the
> > > >> data
> > > >> > > > from
> > > >> > > > > >    the corresponding column. At insertion, those need to
> be
> > > >> stored.
> > > >> > > > > > Following
> > > >> > > > > >    are the ones which I thought might be useful by default
> > to
> > > be
> > > >> > set
> > > >> > > > for
> > > >> > > > > > any
> > > >> > > > > >    node. Please add if we need anything more.
> > > >> > > > > >       - ContentType
> > > >> > > > > >       - ContentLength
> > > >> > > > > >       - CreationTime
> > > >> > > > > >       - ModificationTime
> > > >> > > > > >    - ResourceResolver -  Do we need a resolver in this
> case
> > ?
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >  So I believe in CQL context, one ROW should represent a
> > Sling
> > > >> > > > resource.
> > > >> > > > > If
> > > >> > > > > > that is the case for ResourceMetadata we might need a
> > separate
> > > >> > column
> > > >> > > > to
> > > >> > > > > > store it since it has multiple values. I am not sure
> whether
> > > we
> > > >> can
> > > >> > > do
> > > >> > > > it
> > > >> > > > > > with CQL, but it should be possible with hector APIs may
> be.
> > > >> > > > > >
> > > >> > > > > > Appreciate your thoughts ?
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Wed, Jun 19, 2013 at 1:19 AM, Dishara Wijewardana <
> > > >> > > > > > ddwijeward...@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Ian,
> > > >> > > > > > > I am starting this thread to keep track on things
> related
> > to
> > > >> the
> > > >> > > GSoC
> > > >> > > > > > > project related milestone status updates and related
> > > >> discussions.
> > > >> > > > > > > So the first task over view will be as follows as per
> GSoC
> > > >> > proposal
> > > >> > > > > > > provided.
> > > >> > > > > > >
> > > >> > > > > > > 1. Implementing a CassandraResourceProvider  to READ
> from
> > > >> > > Cassandra.
> > > >> > > > > > > Implementation Details [1]
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > [1] : Implementation Details:
> > > >> > > > > > >
> > > >> > > > > > >  1.A) Write a CassanrdaResourceProviderUtil  which is
> > > >> basically a
> > > >> > > > > > > cassendra client which will facilitate all cassandra
> > related
> > > >> > > > operations
> > > >> > > > > > > required by other modules (CassandraResourceProvider and
> > > >> > > > > > > CassandraResourceResolver).
> > > >> > > > > > >
> > > >> > > > > > > 1.B) Implementation of  CassandraResourceProvider
> > > >> > > > > > >
> > > >> > > > > > > 1.C)  Implementation of CassandraResourceResolver
> > > >> > > > > > >
> > > >> > > > > > > 1.D) Implementation of CassandraResource
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > And I will start writing the
> CassanrdaResourceProviderUtil
> > > >> class
> > > >> > > > which
> > > >> > > > > > > will do basic add and get using hector API. Please
> provide
> > > any
> > > >> > > > feedback
> > > >> > > > > > > that will be useful to accomplish this task.
> > > >> > > > > > > So for this how does path mapping should be done.
> Because
> > > for
> > > >> > > > example,
> > > >> > > > > > the
> > > >> > > > > > > path of the cassendra node will not be same as the jcr
> > node
> > > >> path.
> > > >> > > i.e
> > > >> > > > > > > provider will ask a node path /system/myapps/test/foo
> and
> > > >> where
> > > >> > > > should
> > > >> > > > > we
> > > >> > > > > > > return it from Cassandra. Aren't we have to first
> consider
> > > the
> > > >> > > WRITE
> > > >> > > > > > aspect
> > > >> > > > > > > to Cassandra ?
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > --
> > > >> > > > > > > Thanks
> > > >> > > > > > > /Dishara
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > Thanks
> > > >> > > > > > /Dishara
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Thanks
> > > >> > > > /Dishara
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Thanks
> > > >> > /Dishara
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks
> > > > /Dishara
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks
> > > /Dishara
> > >
> >
>
>
>
> --
> Thanks
> /Dishara
>
