On Fri, Jun 28, 2013 at 4:37 AM, Ian Boston <[email protected]> wrote:

> Hi,
> Have you tried the TypeInferringSerializer for the value serializer?
> That claims to detect what the column value is based on the byte array.
>
> Failing that, I would consider making everything byte[] and using your own
> serializer that writes and reads values to a byte[] using DataInputStream
> and DataOutputStream.
>
> [2] is an example of a serializer written for that purpose that was used
> with Cassandra over raw Thrift. It's not easy to read what it outputs to the
> storage layer, but it is compact and efficient. I would not use it directly
> as it does some very specific things like slicing large byte[]s into 1MB
> chunks and bypassing the 64K limit on reading and writing UTF8 strings with
> DataInputStream.
>
> Try the TypeInferringSerializer first. If it works, great, no need to do
> anything more complex.
>
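A minimal sketch of the "everything is byte[]" approach suggested above, assuming a resource's properties are held in a Map<String, Object> whose values are String, Long or String[]. The class name PropertyCodec and the type tags are made up for illustration; this is not the serializer at [2], and it keeps the 64K limit of writeUTF/readUTF that [2] works around.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: packs a property map into one byte[] column value.
public class PropertyCodec {
    // Type tags written before each value (illustrative, not a standard).
    private static final byte T_STRING = 1;
    private static final byte T_LONG = 2;
    private static final byte T_STRINGS = 3;

    public static byte[] encode(Map<String, Object> props) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(props.size());
            for (Map.Entry<String, Object> e : props.entrySet()) {
                out.writeUTF(e.getKey()); // writeUTF is limited to 64K encoded bytes
                Object v = e.getValue();
                if (v instanceof String) {
                    out.writeByte(T_STRING);
                    out.writeUTF((String) v);
                } else if (v instanceof Long) {
                    out.writeByte(T_LONG);
                    out.writeLong((Long) v);
                } else if (v instanceof String[]) {
                    // Multi-valued property: length prefix, then each value.
                    String[] values = (String[]) v;
                    out.writeByte(T_STRINGS);
                    out.writeInt(values.length);
                    for (String s : values) {
                        out.writeUTF(s);
                    }
                } else {
                    throw new IOException("Unsupported value type: "
                            + (v == null ? "null" : v.getClass().getName()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Map<String, Object> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            Map<String, Object> props = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte tag = in.readByte();
                switch (tag) {
                    case T_STRING:
                        props.put(name, in.readUTF());
                        break;
                    case T_LONG:
                        props.put(name, in.readLong());
                        break;
                    case T_STRINGS:
                        String[] values = new String[in.readInt()];
                        for (int j = 0; j < values.length; j++) {
                            values[j] = in.readUTF();
                        }
                        props.put(name, values);
                        break;
                    default:
                        throw new IOException("Unknown type tag: " + tag);
                }
            }
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each value of a multi-valued property is written separately with a length prefix, values that themselves contain commas survive a round trip, which a comma-separated String would not guarantee.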
Hi,
In fact I was able to add as many params as I wanted with the same
configuration. But TypeInferringSerializer is a useful one too, which I might
need in future.

Also, rather than storing resource metadata as String values, how about
storing a serialized object as you mentioned? It will be clearer, but I am not
sure about the performance. When we have multi-valued columns like metadata,
we currently have to insert them as a single String of comma-separated values.
Is it scalable if we have a bean for the Cassandra resource? What do you
think?

I did a first cut of this, with many TODOs ;-). The getResource method is
implemented and currently all the content is printed, but I have not
implemented the methods in CassandraResource yet. This is just a POC to test
whether the proposed model works. Apparently it works [1]. See the
CassandraDataPopulator class, a plain Java test class added for the moment to
test the POC. (I am moving this to a proper JUnit test.)

TODOs
- I am in the process of finishing the implementation of CassandraResource,
  CassandraResourceProvider, etc. end to end.
- Move to the JUnit test framework and write more tests for each scenario,
  which I can extend with Mockito in the near future (I am still not clear how
  Mockito comes into the picture).
- Change the implementation based on feedback from the community.
- Parameterize the constants as much as possible so they are read from a
  property file.

[1] - https://cassandra-backend-for-sling.googlecode.com/svn/trunk/main/cassandra

Thanks

> Ian
>
> 1
> http://hector-client.github.io/hector/source/content/API/core/0.8.0-2/me/prettyprint/cassandra/serializers/TypeInferringSerializer.html
> 2
> https://github.com/ieb/sparsemapcontent/tree/master/core/src/main/java/org/sakaiproject/nakamura/lite/storage/spi/types
>
> On 28 June 2013 05:14, Dishara Wijewardana <[email protected]> wrote:
>
> > Hi Ian,
> > I am having a problem with CQL.
> >
> > For example:
> > CqlQuery<String,String,Long> cqlQuery = new CqlQuery<String,String,Long>(
> >     keyspace, new StringSerializer(), new StringSerializer(), new LongSerializer());
> > cqlQuery.setQuery("insert into mytable (KEY,password,gender,userid) values (3,'pass1','male',34);");
> > QueryResult<CqlRows<String,String,Long>> result = cqlQuery.execute();
> >
> > This will successfully insert the row with the values pass1, male and 34
> > under rowId=3.
> >
> > But in the Sling scenario we need more serializers for a query, as
> > follows, since we have more columns, i.e.
> > CqlQuery<String,String,String,String> cqlQuery = new CqlQuery<String,String,String,String>(
> >     keyspace, new StringSerializer(), new StringSerializer(),
> >     new StringSerializer(), new StringSerializer());
> > cqlQuery.setQuery("insert into mytable (KEY,path,resourceType,resourceSuperType,metadata) values (3,'/content/cassandra/foo/bar','nt:cassandra','nt:super','metadata');");
> > QueryResult<CqlRows<String,String,Long>> result = cqlQuery.execute();
> >
> > Here I am using the me.prettyprint.cassandra.model.CqlQuery class. Any idea
> > how to proceed with this?
> >
> > Am I doing something wrong, or is this a limitation of the API I am using?
> >
> >
> > On Thu, Jun 27, 2013 at 7:41 AM, Dishara Wijewardana <[email protected]> wrote:
> >
> > > On Thu, Jun 27, 2013 at 4:26 AM, Ian Boston <[email protected]> wrote:
> > >
> > > > On 27 June 2013 02:34, Dishara Wijewardana <[email protected]> wrote:
> > > >
> > > > > On Tue, Jun 25, 2013 at 4:52 AM, Ian Boston <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > (I might have errors in the CQL, Cassandra schema and the functions
> > > > > > need proper escaping)
> > > > > >
> > > > > >
> > > > > > Example 1:
> > > > > > Zero depth tree with UUID as the rowid or key.
> > > > > >
> > > > > > URL /content/cassandra/pictures/13f58d5c95c70b6f
> > > > > >
> > > > > > then the column family is pictures and the URL -> ROWID function just
> > > > > > results in the ROWID being 13f58d5c95c70b6f and
> > > > > >
> > > > > > String cql = mapOfCassandraMappers.get("pictures").getCQL("pictures", "13f58d5c95c70b6f");
> > > > > > System.err.println(cql);
> > > > > >
> > > > > > where
> > > > > > String getCQL(String cf, String path) {
> > > > > >     return "select * from "+cf+" where rowid = '"+path+"'";
> > > > > > }
> > > > > >
> > > > > > yields:
> > > > > > select * from pictures where rowid = '13f58d5c95c70b6f'
> > > > > >
> > > > > >
> > > > > > 13f58d5c95c70b6f would be generated by the application when the user
> > > > > > created a new picture (by upload).
> > > > > >
> > > > > >
> > > > > > Example 2:
> > > > > > User specified
> > > > > >
> > > > > > URL /content/cassandra/catalogue/capacitors/electrolytic/axial/16v/10uf
> > > > > >
> > > > > > String cql = mapOfCassandraMappers.get("catalogue").getCQL("catalogue", "capacitors/electrolytic/axial/16v/10uf");
> > > > > > System.err.println(cql);
> > > > > >
> > > > > > where
> > > > > > String getCQL(String cf, String path) {
> > > > > >     MessageDigest md = MessageDigest.getInstance("SHA1");
> > > > > >     String rowID = Base64.encode(md.digest(path.getBytes("UTF-8")));
> > > > > >     return "select * from "+cf+" where rowid = '"+rowID+"'";
> > > > > > }
> > > > > >
> > > > > > yields
> > > > > >
> > > > > > select * from catalogue where rowid = 'NzdlZmU4OTZmNGM4MzMwYzZ'
> > > > > >
> > > > > > If you want to find the parent then
> > > > > >
> > > > > > mapOfCassandraMappers.get("catalogue").getCQL("catalogue", "capacitors/electrolytic/axial/16v")
> > > > > >
> > > > > > select * from catalogue where rowid = 'ZGFzZGZzZnNkYWZzYWRmc2R'
> > > > > >
> > > > > > And if the parent is stored in the property parent then
> > > > > >
> > > > > > select * from catalogue where parent = 'ZGFzZGZzZnNkYWZzYWRmc2R'
> > > > > >
> > > > > > will generate a list of children. (Not sure about performance)
> > > > > >
> > > > > >
> > > > > > Example 3:
> > > > > > User is allowed to enter the RowID directly (identical to Example 1)
> > > > > > URL /content/cassandra/cannesfilmfestival/TomCruiseCassino-20130402112345-ieb.jpg
> > > > > >
> > > > > > where
> > > > > > String getCQL(String cf, String path) {
> > > > > >     return "select * from "+cf+" where rowid = '"+path+"'";
> > > > > > }
> > > > > >
> > > > > > yields:
> > > > > > select * from pictures where rowid = 'TomCruiseCassino-20130402112345-ieb.jpg'
> > > > > >
> > > > >
> > > > > This should be corrected as
> > > > > select * from cannesfilmfestival where rowid = 'TomCruiseCassino-20130402112345-ieb.jpg'
> > > > >
> > > > > >
> > > > > > Does that make sense ?
> > > > > >
> > > >
> > > > Hi
> > > >
> > > > > Hi Ian,
> > > > > I was in fact practicing some CQL related to this response (with the
> > > > > Cassandra CQL terminal). This is quite a wonderful explanation for a
> > > > > newcomer like me. Thank you very much for the explanation again. Now it
> > > > > really makes sense.
> > > > >
> > > >
> > > > excellent!
> > > >
> > > > >
> > > > > Other than the zero depth approach, I believe users will be more
> > > > > comfortable with the Example 2 approach.
> > > > > Shall we go ahead with it ?
> > > > >
> > > >
> > > > Yes, go for it. It will be interesting to see how hard it is to implement
> > > > and how well (or not) it works. Remember, keep it as simple as possible
> > > > and don't try to cover every use case at the expense of getting a PoC
> > > > working.
> > > >
> > > +1.
> > >
> > > >
> > > > However, don't forget, unit tests mocked with Mockito are a quicker way of
> > > > getting to working code than no unit test coverage.
> > >> > > >> Best Regards > > >> Ian > > >> > > >> > > >> > > >> > > >> > > > >> > > > >> > > Ian > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > On 25 June 2013 05:29, Dishara Wijewardana < > [email protected] > > > > > >> > > wrote: > > >> > > > > >> > > > On Mon, Jun 24, 2013 at 4:02 AM, Ian Boston <[email protected]> > > wrote: > > >> > > > > > >> > > > > Hi Dishara, > > >> > > > > Yes. 1 resource == 1 row. > > >> > > > > The columns within that row represent the properties of the > > >> resource. > > >> > > > > I suggest that you use standard property names where > appropriate > > >> (eg > > >> > > > > sling:resourceType is the Resource.resourceType etc) > > >> > > > > > > >> > > > > The Resource itself should be adaptable to a generic > > >> > CassandraResource > > >> > > > > (which will probably implement Resource) which will have a map > > of > > >> > > > > properties containing all the columns of the cassandra row. > > >> (optimise > > >> > > > > later) A CassandraResource might look and feel like a > > Map<String, > > >> > > Object> > > >> > > > > or it might have a Map<String, Object> getProperties() method, > > or > > >> > > better > > >> > > > > still be adaptable to a Map. The essential think is dont hard > > code > > >> > the > > >> > > > > property names in the interface of CassandraResource for the > > >> moment. > > >> > ie > > >> > > > no > > >> > > > > getContentType() and no getMimeType(), as we dont really know > > >> what a > > >> > > > > CassandraResource will store. > > >> > > > > > > >> > > > > ResourceMetadata should be built from a subset of the > > >> > CassandraResource > > >> > > > > properties. > > >> > > > > > > >> > > > > You won't need to implement a ResourceResolver, only a > > >> > ResourceProvider > > >> > > > > (and Factory). I would use CQL in preference to other API > > methods. > > >> > > > > > > >> > > > > There is one thing that hasnt been mentioned, and thats the > URL > > -> > > >> > > > > Cassandra Row mapping. 
> > >> > > > > There are several ways of doing this. > > >> > > > > > > >> > > > > eg: > > >> > > > > URL = /content/cassandra/<columnFamily>/<rowID> > > >> > > > > Cassandra Column Family = columnFamily > > >> > > > > Cassandra RowID = rowID > > >> > > > > or > > >> > > > > URL = > > >> /content/cassandra/<columnFamilySelector>/remainder/of/the/path > > >> > > > > Cassandra Cassandra Column Family = > > >> > > > > mapOfColumnFamilies.get(columnFamilySelector) > > >> > > > > Cassandra RowID = function(/remainder/of/the/path) > > >> > > > > > > >> > > > > or to take that one stage further > > >> > > > > > > >> > > > > public interface CassandraMapper { > > >> > > > > String getCQL(String columnFamilySelector, String path); > > >> > > > > } > > >> > > > > > > >> > > > Hi Ian > > >> > > > Thank you for the detailed explanation. > > >> > > > > > >> > > > OK. +1 for this approach with the mentioned flexibility.But I > > need > > >> a > > >> > > small > > >> > > > clarification. With this approach, > > >> > > > > > >> > > > URL = /content/cassandra/<columnFamilySelector>ROW-ID > > >> > > > ROW-ID - function(/remainder/of/the/path). > > >> > > > So you mean ROW-ID is something we have to programatically > > uniquely > > >> > > create > > >> > > > right ? like a UUID. > > >> > > > > > >> > > > What is this "/remainder/of/the/path" means ? Can you give an > > >> example > > >> > > with > > >> > > > real values in the context of a user who want to obtain a > resource > > >> from > > >> > > > cassandra. > > >> > > > This is just for my understanding. 
> > >> > > > > > >> > > > > > >> > > > > > >> > > > > > > >> > > > > URL = > > /content/cassandra/<columnFamilySelector>/<remainderOfPath> > > >> > > > > > > >> > > > > String cqlQuery = > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > mapOfCassandraMappers.get(columnFamilySelector).getCQL(columnFamilySelector, > > >> > > > > remainderOfPath); > > >> > > > > > > >> > > > > Which would allow us provided one or more implementations of > > >> > > > > CassandraMapper to map between URL and CQL. > > >> > > > > > > >> > > > > > > >> > > > > HTH > > >> > > > > Ian > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > On 23 June 2013 19:29, Dishara Wijewardana < > > >> [email protected]> > > >> > > > > wrote: > > >> > > > > > > >> > > > > > Hi Ian, > > >> > > > > > > > >> > > > > > What is the data mapping should be between Cassandra and > Sling > > >> > > > resource. > > >> > > > > I > > >> > > > > > mean is a Sling Resource maps to a Cassandra Column ? Or > > Column > > >> > > Family > > >> > > > ? > > >> > > > > > > > >> > > > > > Because to get this Cassandra and Sling story correct we > need > > to > > >> > > > finalize > > >> > > > > > this. > > >> > > > > > For an example what we eventually returns is a Sling > resource. > > >> > > > Everything > > >> > > > > > that needs to fill in to create Sling resource should be > > stored > > >> in > > >> > > > > > Cassandra. > > >> > > > > > In a Sling resource, > > >> > > > > > > > >> > > > > > - Path - direct sling resource path > > >> > > > > > - ResourceType - nt:cassandra > > >> > > > > > - ResourceSuperType - ? > > >> > > > > > - ResourceMetadata - we can create this on the fly with > the > > >> data > > >> > > > from > > >> > > > > > the corresponding column. 
At insertion, those need to be > > >> stored. > > >> > > > > > Following > > >> > > > > > are the ones which I thought might be useful by default > to > > be > > >> > set > > >> > > > for > > >> > > > > > any > > >> > > > > > node. Please add if we need anything more. > > >> > > > > > - ContentType > > >> > > > > > - ContentLength > > >> > > > > > - CreationTime > > >> > > > > > - ModificationTime > > >> > > > > > - ResourceResolver - Do we need a resolver in this case > ? > > >> > > > > > > > >> > > > > > > > >> > > > > > So I believe in CQL context, one ROW should represent a > Sling > > >> > > > resource. > > >> > > > > If > > >> > > > > > that is the case for ResourceMetadata we might need a > separate > > >> > column > > >> > > > to > > >> > > > > > store it since it has multiple values. I am not sure whether > > we > > >> can > > >> > > do > > >> > > > it > > >> > > > > > with CQL, but it should be possible with hector APIs may be. > > >> > > > > > > > >> > > > > > Appreciate your thoughts ? > > >> > > > > > > > >> > > > > > > > >> > > > > > On Wed, Jun 19, 2013 at 1:19 AM, Dishara Wijewardana < > > >> > > > > > [email protected]> wrote: > > >> > > > > > > > >> > > > > > > Hi Ian, > > >> > > > > > > I am starting this thread to keep track on things related > to > > >> the > > >> > > GSoC > > >> > > > > > > project related milestone status updates and related > > >> discussions. > > >> > > > > > > So the first task over view will be as follows as per GSoC > > >> > proposal > > >> > > > > > > provided. > > >> > > > > > > > > >> > > > > > > 1. Implementing a CassandraResourceProvider to READ from > > >> > > Cassandra. 
> > >> > > > > > > Implementation Details [1] > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > [1] : Implementation Details: > > >> > > > > > > > > >> > > > > > > 1.A) Write a CassanrdaResourceProviderUtil which is > > >> basically a > > >> > > > > > > cassendra client which will facilitate all cassandra > related > > >> > > > operations > > >> > > > > > > required by other modules (CassandraResourceProvider and > > >> > > > > > > CassandraResourceResolver). > > >> > > > > > > > > >> > > > > > > 1.B) Implementation of CassandraResourceProvider > > >> > > > > > > > > >> > > > > > > 1.C) Implementation of CassandraResourceResolver > > >> > > > > > > > > >> > > > > > > 1.D) Implementation of CassandraResource > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > And I will start writing the CassanrdaResourceProviderUtil > > >> class > > >> > > > which > > >> > > > > > > will do basic add and get using hector API. Please provide > > any > > >> > > > feedback > > >> > > > > > > that will be useful to accomplish this task. > > >> > > > > > > So for this how does path mapping should be done. Because > > for > > >> > > > example, > > >> > > > > > the > > >> > > > > > > path of the cassendra node will not be same as the jcr > node > > >> path. > > >> > > i.e > > >> > > > > > > provider will ask a node path /system/myapps/test/foo and > > >> where > > >> > > > should > > >> > > > > we > > >> > > > > > > return it from Cassandra. Aren't we have to first consider > > the > > >> > > WRITE > > >> > > > > > aspect > > >> > > > > > > to Cassandra ? 
> > >> > > > > > > > > >> > > > > > > > > >> > > > > > > -- > > >> > > > > > > Thanks > > >> > > > > > > /Dishara > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > -- > > >> > > > > > Thanks > > >> > > > > > /Dishara > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > -- > > >> > > > Thanks > > >> > > > /Dishara > > >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > -- > > >> > Thanks > > >> > /Dishara > > >> > > > >> > > > > > > > > > > > > -- > > > Thanks > > > /Dishara > > > > > > > > > > > -- > > Thanks > > /Dishara > > > -- Thanks /Dishara
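
For reference, the URL -> CQL mapping discussed in the quoted thread (the CassandraMapper interface and Examples 1 and 2) can be sketched roughly as below. This is only an illustration under assumptions: the class names CassandraMappers, DirectMapper and HashedPathMapper, the cqlFor helper and the quote helper are made up here, java.util.Base64 (Java 8) stands in for the unspecified Base64 class in the thread, and the column family name is assumed to come from trusted configuration, so only the row id is escaped.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;

public class CassandraMappers {

    // Interface as proposed in the thread.
    public interface CassandraMapper {
        String getCQL(String columnFamilySelector, String path);
    }

    // Minimal CQL string literal quoting: double any embedded single quote.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Examples 1 and 3: the remainder of the path is used as the row id directly.
    public static class DirectMapper implements CassandraMapper {
        @Override
        public String getCQL(String cf, String path) {
            return "select * from " + cf + " where rowid = " + quote(path);
        }
    }

    // Example 2: the row id is a Base64-encoded SHA-1 hash of the path.
    public static class HashedPathMapper implements CassandraMapper {
        public static String rowId(String path) {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                byte[] digest = md.digest(path.getBytes(StandardCharsets.UTF_8));
                return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("SHA-1 not available", e);
            }
        }

        @Override
        public String getCQL(String cf, String path) {
            return "select * from " + cf + " where rowid = " + quote(rowId(path));
        }
    }

    // Splits /content/cassandra/<columnFamilySelector>/<remainderOfPath>
    // and delegates to the mapper registered for the selector.
    public static String cqlFor(Map<String, CassandraMapper> mappers, String url) {
        String prefix = "/content/cassandra/";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a Cassandra URL: " + url);
        }
        String rest = url.substring(prefix.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Missing path after column family: " + url);
        }
        String selector = rest.substring(0, slash);
        String remainder = rest.substring(slash + 1);
        CassandraMapper mapper = mappers.get(selector);
        if (mapper == null) {
            throw new IllegalArgumentException("No mapper for column family: " + selector);
        }
        return mapper.getCQL(selector, remainder);
    }
}
```

With "pictures" registered to a DirectMapper, cqlFor(mappers, "/content/cassandra/pictures/13f58d5c95c70b6f") gives the query from Example 1. With "catalogue" registered to a HashedPathMapper, the parent's row id can be computed with HashedPathMapper.rowId on the path minus its last segment.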
