On Fri, Jun 28, 2013 at 4:37 AM, Ian Boston <[email protected]> wrote:

> Hi,
> Have you tried the TypeInferringSerializer for the value serializer?
> It claims to detect what the column value is based on the byte array.
>
> Failing that, I would consider making everything byte[] and using your own
> serializer that writes and reads values to and from a byte[] using
> DataInputStream and DataOutputStream.
>
> [2] is an example of a serializer written for that purpose that was used
> with Cassandra over raw Thrift. It's not easy to read what it outputs to the
> storage layer, but it is compact and efficient. I would not use it directly,
> as it does some very specific things like slicing large byte[]s into 1MB
> chunks and bypassing the 64K limit on reading and writing UTF8 strings with
> DataInputStream.
>
> Try the TypeInferringSerializer first. If it works great, no need to do
> anything more complex.
>

Hi,
In fact, I was able to add as many params as I wanted with the same
configuration. But TypeInferringSerializer is useful too, and we might
need it in the future.
Also, rather than storing resource metadata as String values, how about
storing a serialized object, as you mentioned? It would be cleaner, but I
am not sure about the performance. With the current approach, multi-valued
columns such as metadata have to be inserted as a single String of
comma-separated values. Would it be more scalable to have a Bean for
CassandraResource? What do you think?
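
As a rough illustration of the serialized-object idea, the sketch below packs multi-valued metadata into a single byte[] with DataOutputStream, along the lines Ian suggested. MetadataCodec and its byte layout are hypothetical and not part of the POC. Note that writeUTF is limited to 64K of encoded bytes per string, so large values would need a different encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not in the POC): stores metadata as one byte[] column
// value instead of a comma-separated String.
final class MetadataCodec {

    private MetadataCodec() {
    }

    // Layout: entry count, then each key and value written with writeUTF.
    static byte[] encode(Map<String, String> metadata) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(metadata.size());
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException e) {
            // Only expected if a single string exceeds the writeUTF limit.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the entries back in the order they were written.
    static Map<String, String> decode(byte[] raw) {
        Map<String, String> metadata = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                metadata.put(key, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return metadata;
    }
}
```

Unlike a comma-separated String, values that themselves contain commas survive the round trip unchanged.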

I have also done a first cut of this, with many TODOs ;-). The getResource
method is implemented and currently prints all the content, but I have
not implemented the methods in CassandraResource yet. This is just a POC to
test whether the proposed model works, and apparently it does [1]. See the
CassandraDataPopulator class, a plain Java test class added for the moment
to exercise the POC. (I am moving this to a proper JUnit test.)
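
For reference, the proposed model from the quoted thread (Example 2: column family selector plus hashed path) can be sketched roughly as below. HashedPathMapper is a hypothetical name, java.util.Base64 assumes Java 8 or later, and the URL-safe alphabet without padding is my assumption so that the row ID never contains a quote character.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Mirrors the CassandraMapper interface proposed in the quoted thread.
interface CassandraMapper {
    String getCQL(String columnFamilySelector, String path);
}

// Hypothetical implementation: row ID = Base64(SHA-1(path)).
final class HashedPathMapper implements CassandraMapper {

    // Hashes the path remainder to a fixed-length, quote-free row ID.
    static String rowId(String path) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(path.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // The column family name is concatenated as-is, so it should come from
    // a trusted lookup (e.g. the mapper map), never from raw user input.
    @Override
    public String getCQL(String columnFamilySelector, String path) {
        return "select * from " + columnFamilySelector
                + " where rowid = '" + rowId(path) + "'";
    }
}
```

The parent row ID is obtained the same way, by hashing the path with its last segment removed.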

TODOs
- Finish the implementation of CassandraResource, CassandraResourceProvider,
etc. end to end.
- Move to the JUnit test framework and write more tests for each scenario,
which I can extend with Mockito in the near future (I am still not clear
how Mockito comes into the picture).
- Change the implementation based on feedback from the community.
- Parameterize the constants as much as possible so they are read from a
properties file.
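
For the last TODO, a minimal sketch of reading the constants from a properties file with java.util.Properties could look like this. The class name, the property keys and the default values are all assumptions for illustration; the POC does not define them yet.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical settings holder: keys and defaults are illustrative only.
final class CassandraSettings {
    final String clusterName;
    final String hostPort;
    final String keyspace;

    private CassandraSettings(Properties p) {
        // Each constant falls back to a default when the key is absent.
        clusterName = p.getProperty("cassandra.cluster", "Test Cluster");
        hostPort = p.getProperty("cassandra.host", "localhost:9160");
        keyspace = p.getProperty("cassandra.keyspace", "sling_cassandra");
    }

    // Parses properties-file text; a real provider would load it from a
    // file or classpath resource instead of a String.
    static CassandraSettings parse(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new CassandraSettings(p);
    }
}
```

Keeping the defaults in one place means the test class and the provider can share the same configuration without hard-coded strings.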


[1] -
https://cassandra-backend-for-sling.googlecode.com/svn/trunk/main/cassandra

Thanks

>
>
> Ian
>
> 1
>
> http://hector-client.github.io/hector/source/content/API/core/0.8.0-2/me/prettyprint/cassandra/serializers/TypeInferringSerializer.html
>
> 2
>
> https://github.com/ieb/sparsemapcontent/tree/master/core/src/main/java/org/sakaiproject/nakamura/lite/storage/spi/types
>
>
> On 28 June 2013 05:14, Dishara Wijewardana <[email protected]>
> wrote:
>
> > Hi Ian,
> > I am having a problem with CQL..
> >
> > For example:
> >         CqlQuery<String,String,Long> cqlQuery = new CqlQuery<String,String,Long>(
> >             keyspace, new StringSerializer(), new StringSerializer(), new LongSerializer());
> >         cqlQuery.setQuery("insert into mytable (KEY,password,gender,userid) "
> >             + "values (3,'pass1','male',34);");
> >         QueryResult<CqlRows<String,String,Long>> result = cqlQuery.execute();
> >
> > This will successfully insert the row with pass1,male and 34 values under
> > rowId=3.
> >
> > But in the Sling scenario, we need more serializers for a query, as
> > follows, since we have more columns, i.e.
> >         CqlQuery<String,String,String,String> cqlQuery = new CqlQuery<String,String,String,String>(
> >             keyspace, new StringSerializer(), new StringSerializer(),
> >             new StringSerializer(), new StringSerializer());
> >         cqlQuery.setQuery("insert into mytable "
> >             + "(KEY,path,resourceType,resourceSuperType,metadata) values "
> >             + "(3,'/content/cassandra/foo/bar','nt:cassandra','nt:super','metadata');");
> >         QueryResult<CqlRows<String,String,Long>> result = cqlQuery.execute();
> >
> > Here I am using the me.prettyprint.cassandra.model.CqlQuery class. Any idea
> > how to proceed with this?
> >
> > Am I doing something wrong, or is this a limitation of the API I am using?
> >
> >
> > On Thu, Jun 27, 2013 at 7:41 AM, Dishara Wijewardana <
> > [email protected]> wrote:
> >
> > >
> > >
> > > On Thu, Jun 27, 2013 at 4:26 AM, Ian Boston <[email protected]> wrote:
> > >
> > >> On 27 June 2013 02:34, Dishara Wijewardana <[email protected]>
> > >> wrote:
> > >>
> > >> > On Tue, Jun 25, 2013 at 4:52 AM, Ian Boston <[email protected]> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > (I might have errors in the CQL and Cassandra schema, and the
> > >> > > functions need proper escaping)
> > >> > >
> > >> > >
> > >> > > Example 1:
> > >> > > Zero depth tree with UUID as the rowid or key.
> > >> > >
> > >> > > URL /content/cassandra/pictures/13f58d5c95c70b6f
> > >> > >
> > >> > > then the column family is pictures and the URL -> ROWID function
> > >> > > just results in the ROWID being 13f58d5c95c70b6f and
> > >> > >
> > >> > > String cql = mapOfCassandraMappers.get("pictures")
> > >> > >     .getCQL("pictures", "13f58d5c95c70b6f");
> > >> > > System.err.println(cql);
> > >> > >
> > >> > > where
> > >> > > String getCQL(String cf, String path) {
> > >> > >     return "select * from "+cf+" where rowid = '"+path+"'";
> > >> > > }
> > >> > >
> > >> > > yields:
> > >> > > select * from pictures where rowid = '13f58d5c95c70b6f'
> > >> > >
> > >> > >
> > >> > > 13f58d5c95c70b6f would be generated by the application when the
> > >> > > user creates a new picture (by upload).
> > >> > >
> > >> > >
> > >> > >
> > >> > > Example 2:
> > >> > > User specified
> > >> > >
> > >> > > URL /content/cassandra/catalogue/capacitors/electrolytic/axial/16v/10uf
> > >> > >
> > >> > > String cql = mapOfCassandraMappers.get("catalogue")
> > >> > >     .getCQL("catalogue", "capacitors/electrolytic/axial/16v/10uf");
> > >> > > System.err.println(cql);
> > >> > >
> > >> > > where
> > >> > > String getCQL(String cf, String path) {
> > >> > >     MessageDigest md = MessageDigest.getInstance("SHA1");
> > >> > >     // digest(byte[]) hashes the path bytes in one call.
> > >> > >     String rowID = Base64.encode(md.digest(path.getBytes("UTF-8")));
> > >> > >     return "select * from "+cf+" where rowid = '"+rowID+"'";
> > >> > > }
> > >> > >
> > >> > > yields
> > >> > >
> > >> > > select * from catalogue where rowid = 'NzdlZmU4OTZmNGM4MzMwYzZ'
> > >> > >
> > >> > > If you want to find the parent then
> > >> > >
> > >> > > mapOfCassandraMappers.get("catalogue").getCQL("catalogue",
> > >> > >     "capacitors/electrolytic/axial/16v")
> > >> > >
> > >> > > select * from catalogue where rowid = 'ZGFzZGZzZnNkYWZzYWRmc2R'
> > >> > >
> > >> > > And if the parent is stored in the property parent then
> > >> > >
> > >> > > select * from catalogue where parent = 'ZGFzZGZzZnNkYWZzYWRmc2R'
> > >> > >
> > >> > > will generate a list of children. (Not sure about performance)
> > >> > >
> > >> > >
> > >> > > Example 3:
> > >> > > User is allowed to enter the RowID directly (identical to Example 1)
> > >> > >
> > >> > > URL /content/cassandra/cannesfilmfestival/TomCruiseCassino-20130402112345-ieb.jpg
> > >> > >
> > >> > > where
> > >> > > String getCQL(String cf, String path) {
> > >> > >     return "select * from "+cf+" where rowid = '"+path+"'";
> > >> > > }
> > >> > >
> > >> > > yields:
> > >> > > select * from pictures where rowid = '
> > >> > > TomCruiseCassino-20130402112345-ieb.jpg'
> > >> > >
> > >> >
> > >> > This should be corrected as
> > >> > select * from cannesfilmfestival where rowid = '
> > >> > TomCruiseCassino-20130402112345-ieb.jpg'
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > > Does that make sense ?
> > >> > >
> > >> >
> > >>
> > >> Hi
> > >>
> > >>
> > >> > Hi Ian,
> > >> > I was in fact practicing some CQL related to this response (with the
> > >> > Cassandra CQL terminal). This is quite a wonderful explanation for a
> > >> > newcomer like me. Thank you very much for the explanation again. Now it
> > >> > really makes sense.
> > >> >
> > >>
> > >> excellent!
> > >>
> > >>
> > >> >
> > >> > Other than the zero depth approach, I believe users will be more
> > >> > comfortable with Example 2 approach.
> > >> > Shall we go ahead with it ?
> > >> >
> > >>
> > >>
> > >> Yes, go for it. It will be interesting to see how hard it is to
> > >> implement and how well (or not) it works. Remember, keep it as simple as
> > >> possible and don't try to cover every use case at the expense of getting
> > >> a PoC working.
> > >>
> > > +1.
> > >
> > >>
> > >> However, don't forget: unit tests mocked with Mockito are a quicker way
> > >> of getting to working code than no unit test coverage.
> > >>
> > >> Best Regards
> > >> Ian
> > >>
> > >>
> > >>
> > >>
> > >> >
> > >> >
> > >> > > Ian
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On 25 June 2013 05:29, Dishara Wijewardana <
> [email protected]
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > On Mon, Jun 24, 2013 at 4:02 AM, Ian Boston <[email protected]>
> > wrote:
> > >> > > >
> > >> > > > > Hi Dishara,
> > >> > > > > Yes. 1 resource == 1 row.
> > >> > > > > The columns within that row represent the properties of the
> > >> > > > > resource. I suggest that you use standard property names where
> > >> > > > > appropriate (e.g. sling:resourceType is the
> > >> > > > > Resource.resourceType, etc.).
> > >> > > > >
> > >> > > > > The Resource itself should be adaptable to a generic
> > >> > > > > CassandraResource (which will probably implement Resource),
> > >> > > > > which will have a map of properties containing all the columns
> > >> > > > > of the Cassandra row (optimise later). A CassandraResource
> > >> > > > > might look and feel like a Map<String, Object>, or it might
> > >> > > > > have a Map<String, Object> getProperties() method, or better
> > >> > > > > still be adaptable to a Map. The essential thing is: don't
> > >> > > > > hard-code the property names in the interface of
> > >> > > > > CassandraResource for the moment, i.e. no getContentType() and
> > >> > > > > no getMimeType(), as we don't really know what a
> > >> > > > > CassandraResource will store.
> > >> > > > >
> > >> > > > > ResourceMetadata should be built from a subset of the
> > >> > > > > CassandraResource properties.
> > >> > > > >
> > >> > > > > You won't need to implement a ResourceResolver, only a
> > >> > > > > ResourceProvider (and Factory). I would use CQL in preference
> > >> > > > > to other API methods.
> > >> > > > >
> > >> > > > > There is one thing that hasn't been mentioned, and that's the
> > >> > > > > URL -> Cassandra Row mapping.
> > >> > > > > There are several ways of doing this.
> > >> > > > >
> > >> > > > > eg:
> > >> > > > > URL = /content/cassandra/<columnFamily>/<rowID>
> > >> > > > >  Cassandra Column Family = columnFamily
> > >> > > > >  Cassandra RowID = rowID
> > >> > > > > or
> > >> > > > > URL = /content/cassandra/<columnFamilySelector>/remainder/of/the/path
> > >> > > > >  Cassandra Column Family =
> > >> > > > > mapOfColumnFamilies.get(columnFamilySelector)
> > >> > > > >  Cassandra RowID = function(/remainder/of/the/path)
> > >> > > > >
> > >> > > > > or to take that one stage further
> > >> > > > >
> > >> > > > > public interface CassandraMapper {
> > >> > > > >       String getCQL(String columnFamilySelector, String path);
> > >> > > > > }
> > >> > > > >
> > >> > > > Hi Ian
> > >> > > > Thank you for the detailed explanation.
> > >> > > >
> > >> > > > OK. +1 for this approach with the mentioned flexibility. But I
> > >> > > > need a small clarification. With this approach,
> > >> > > >
> > >> > > > URL = /content/cassandra/<columnFamilySelector>ROW-ID
> > >> > > > ROW-ID - function(/remainder/of/the/path).
> > >> > > > So you mean ROW-ID is something we have to create programmatically
> > >> > > > and uniquely, right? Like a UUID.
> > >> > > >
> > >> > > > What does "/remainder/of/the/path" mean? Can you give an example
> > >> > > > with real values in the context of a user who wants to obtain a
> > >> > > > resource from Cassandra?
> > >> > > > This is just for my understanding.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > >
> > >> > > > > URL = /content/cassandra/<columnFamilySelector>/<remainderOfPath>
> > >> > > > >
> > >> > > > >  String cqlQuery = mapOfCassandraMappers.get(columnFamilySelector)
> > >> > > > >      .getCQL(columnFamilySelector, remainderOfPath);
> > >> > > > >
> > >> > > > > Which would allow us, provided one or more implementations of
> > >> > > > > CassandraMapper, to map between URL and CQL.
> > >> > > > >
> > >> > > > >
> > >> > > > > HTH
> > >> > > > > Ian
> > >> > > > >
> > >> > > > > On 23 June 2013 19:29, Dishara Wijewardana <
> > >> [email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Ian,
> > >> > > > > >
> > >> > > > > > What should the data mapping be between Cassandra and a Sling
> > >> > > > > > resource? I mean, does a Sling Resource map to a Cassandra
> > >> > > > > > Column? Or a Column Family?
> > >> > > > > >
> > >> > > > > > To get this Cassandra and Sling story correct, we need to
> > >> > > > > > finalize this.
> > >> > > > > > For example, what we eventually return is a Sling resource.
> > >> > > > > > Everything needed to create a Sling resource should be stored
> > >> > > > > > in Cassandra.
> > >> > > > > > In a Sling resource,
> > >> > > > > >
> > >> > > > > >    - Path - direct sling resource path
> > >> > > > > >    - ResourceType - nt:cassandra
> > >> > > > > >    - ResourceSuperType - ?
> > >> > > > > >    - ResourceMetadata - we can create this on the fly with the
> > >> > > > > >    data from the corresponding column. At insertion, those need
> > >> > > > > >    to be stored. Following are the ones which I thought might be
> > >> > > > > >    useful to set by default for any node. Please add if we need
> > >> > > > > >    anything more.
> > >> > > > > >       - ContentType
> > >> > > > > >       - ContentLength
> > >> > > > > >       - CreationTime
> > >> > > > > >       - ModificationTime
> > >> > > > > >    - ResourceResolver - Do we need a resolver in this case?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >  So I believe that in the CQL context, one ROW should represent
> > >> > > > > > a Sling resource. If that is the case, for ResourceMetadata we
> > >> > > > > > might need a separate column to store it, since it has multiple
> > >> > > > > > values. I am not sure whether we can do it with CQL, but it
> > >> > > > > > should be possible with the Hector APIs.
> > >> > > > > >
> > >> > > > > > I would appreciate your thoughts.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Wed, Jun 19, 2013 at 1:19 AM, Dishara Wijewardana <
> > >> > > > > > [email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Ian,
> > >> > > > > > > I am starting this thread to keep track of GSoC project
> > >> > > > > > > milestone status updates and related discussions.
> > >> > > > > > > The first task overview, as per the GSoC proposal, is as
> > >> > > > > > > follows.
> > >> > > > > > >
> > >> > > > > > > 1. Implementing a CassandraResourceProvider to READ from
> > >> > > > > > > Cassandra.
> > >> > > > > > > Implementation Details [1]
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > [1] : Implementation Details:
> > >> > > > > > >
> > >> > > > > > >  1.A) Write a CassandraResourceProviderUtil, which is
> > >> > > > > > > basically a Cassandra client that will facilitate all
> > >> > > > > > > Cassandra-related operations required by other modules
> > >> > > > > > > (CassandraResourceProvider and CassandraResourceResolver).
> > >> > > > > > >
> > >> > > > > > > 1.B) Implementation of  CassandraResourceProvider
> > >> > > > > > >
> > >> > > > > > > 1.C)  Implementation of CassandraResourceResolver
> > >> > > > > > >
> > >> > > > > > > 1.D) Implementation of CassandraResource
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > I will start by writing the CassandraResourceProviderUtil
> > >> > > > > > > class, which will do basic add and get using the Hector API.
> > >> > > > > > > Please provide any feedback that will be useful to accomplish
> > >> > > > > > > this task.
> > >> > > > > > > How should path mapping be done for this? For example, the
> > >> > > > > > > path of the Cassandra node will not be the same as the JCR
> > >> > > > > > > node path, i.e. the provider will ask for a node path
> > >> > > > > > > /system/myapps/test/foo, and where should we return it from
> > >> > > > > > > in Cassandra? Don't we have to consider the WRITE aspect to
> > >> > > > > > > Cassandra first?
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > > Thanks
> > >> > > > > > > /Dishara
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Thanks
> > >> > > > > > /Dishara
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Thanks
> > >> > > > /Dishara
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Thanks
> > >> > /Dishara
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Thanks
> > > /Dishara
> > >
> >
> >
> >
> > --
> > Thanks
> > /Dishara
> >
>



-- 
Thanks
/Dishara
