Hello Ajay, Nice to see your interest in exploring Lens.
Looking at the requirement you posted, it seems you are only interested in a REST API on top of Cassandra. See the Lens REST API for the query service here - http://lens.incubator.apache.org/resource_QueryServiceResource.html

Lens provides REST APIs for query actions (submit, fetch results, cancel, get status), query history and the OLAP data model. If you are not interested in the OLAP data model, you can simply run native queries on Lens, which would give you the REST API and history as well. For more details on the OLAP data model, you can read the doc on the website.

A new driver need not know anything about the OLAP data model. But all drivers are given a HiveQL query to execute. If the driver accepts HiveQL directly, you don't have to do much on the query itself. If not, the driver has to translate the query into something the driver can understand. For example, JDBCDriver converts HiveQL to SQL.
- You can have a look at JDBCDriver for converting into SQL. HiveQL is almost SQL, except that some Hive UDFs might have to be mapped to SQL UDFs.

OK. Coming back to your questions, answers are inline.

On Fri, Jul 24, 2015 at 9:08 AM, Ajay <[email protected]> wrote:

> Thanks.
>
> Couple of questions (that comes to my mind right now)
>
> 1) To create a Cluster instance to connect to a Cassandra cluster
> (Cassandra Java driver), we need to know the following:
> a) ip address and port no of few instances of the cluster
> b) authentication (if enabled)
> c) few more configuration (loading balancing policy, ret policy and
> compression etc..)
>

Lens drivers follow XML-based configuration for passing information. See https://github.com/apache/incubator-lens/blob/master/lens-driver-hive/src/main/resources/hivedriver-default.xml and https://github.com/apache/incubator-lens/blob/master/lens-driver-jdbc/src/main/resources/jdbcdriver-default.xml for how driver-specific configuration is defined. Also read http://lens.incubator.apache.org/admin/config-server.html to understand configuration in Lens.
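To make the configuration style concrete, here is a rough sketch of how the connection settings from your list (contact points, port, compression) could be expressed as name/value properties and read back. It assumes the Hadoop-style <configuration>/<property> layout; the lens.driver.cassandra.* property names and the load_properties helper are made up for illustration and do not exist in Lens. It is written in Python only to keep it short, a real driver would read its configuration in Java.

```python
import xml.etree.ElementTree as ET

# Hypothetical driver configuration. The property names below are
# illustrative assumptions only; Lens does not define them.
SAMPLE_XML = """<configuration>
  <property>
    <name>lens.driver.cassandra.contact.points</name>
    <value>10.0.0.1,10.0.0.2</value>
    <description>Comma-separated addresses of a few cluster nodes.</description>
  </property>
  <property>
    <name>lens.driver.cassandra.port</name>
    <value>9042</value>
  </property>
  <property>
    <name>lens.driver.cassandra.compression</name>
    <value>LZ4</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a dict of name -> value for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


conf = load_properties(SAMPLE_XML)

# Turn the raw strings into the values a Cluster builder would need.
contact_points = conf["lens.driver.cassandra.contact.points"].split(",")
port = int(conf.get("lens.driver.cassandra.port", "9042"))
compression = conf.get("lens.driver.cassandra.compression", "NONE")
```

Authentication and the policy settings could be added as further properties in the same way, with defaults kept in a cassandradriver-default.xml (again a hypothetical file name) next to the driver.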
> Do we have any configuration support(yaml file or any other) in Lens. If
> yes, any pointers (java file names or WIKI)
>
> 2) Cassandra cluser/session and prepared statements are multi thread safe
> and recommended to be cached and reused.
>
> How does Lens support Caching (in memory or distributed)?. Any pointers
> (java file names or WIKI)
>

I did not understand the caching part here. But if you are talking about Cassandra sessions to be reused across user queries, you can look at how Hive sessions and connections are used in HiveDriver - https://github.com/apache/incubator-lens/blob/master/lens-driver-hive/src/main/java/org/apache/lens/driver/hive/HiveDriver.java

>
> On Thu, Jul 23, 2015 at 11:16 PM, Yash Sharma <[email protected]> wrote:
>
> > That is great. We can probably pick the most recent Java driver and
> > Cassandra version then. Since it addresses couple of old issues.
> >
> > Regarding Spark SQL for querying Cassandra I would let other contributors
> > suggest.
> >
> > On Thu, Jul 23, 2015 at 11:04 PM, Ajay <[email protected]> wrote:
> >
> > > Thanks Yash.
> > >
> > > In Between, ALLOW FILTERING is supported in Cassandra Java driver [1].
> > > What is the Cassandra and Java driver version we plan to support?. I have
> > > worked on Cassandra 2.0.x and 2.1.x and Java driver 2.1.x and ALLOW FILTERING
> > > worked.
> > >
> > > Secondly, as you mentioned I am aware of these limitation in CQL. But
> > > more features recently added in Cassandra 2.2.x. Also, other option to work
> > > around this to use Cassandra Spark connector and use Spark SQL.
> > >
> > > 1)
> > > http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/querybuilder/Select.html
> > >
> > >
> > > On Thu, Jul 23, 2015 at 10:07 PM, Yash Sharma <[email protected]> wrote:
> > >
> > > > Just to add on...
> > > >
> > > > There will be few challenges to support entire SQL on Cassandra however.
> > > > Last time I tried translating SQL to CQL I faced couple of scenarios -
> > > >
> > > > 1. Filtering non indexed column in Cassandra
> > > > 2. Filtering by subset of primary key
> > > > 3. OR condition in where clause
> > > >
> > > > You can probably start without these being a blocker - In parallel we
> > > > can discuss how this can be implemented on our stack. Others can provide
> > > > their opinions here.
> > > >
> > > > Hope its helpful.
> > > >
> > > > -----
> > > > Examples:
> > > > Here 'trending_now' is a dummy table with (id, rank, pog_id) where
> > > > (id,rank) is primary key pair.
> > > >
> > > > 1.
> > > > cqlsh:recsys> select * from trending_now where pog_id=10004 ;
> > > > Bad Request: No indexed columns present in by-columns clause with Equal
> > > > operator
> > > >
> > > > 2.
> > > > cqlsh:recsys> select * from trending_now where rank=4;
> > > > Bad Request: Cannot execute this query as it might involve data filtering
> > > > and thus may have unpredictable performance. If you want to execute this
> > > > query despite the performance unpredictability, use ALLOW FILTERING
> > > > P.S. ALLOW FILTERING is not permitted in Cassandra java driver as of now.
> > > >
> > > > 3.
> > > > cqlsh:recsys> select * from trending_now where rank=4 or id='id0004';
> > > > Bad Request: line 1:40 missing EOF at 'or'
> > > >
> > > > 4. Valid Query:
> > > > cqlsh:recsys> select * from trending_now where id='id0004' and rank=4;
> > > >
> > > >  id     | rank | pog_id
> > > > --------+------+--------
> > > >  id0004 |    4 |  10002
> > > >
> > > >
> > > > On Thu, Jul 23, 2015 at 9:45 PM, Yash Sharma <[email protected]> wrote:
> > > >
> > > > > The basic idea is to translate the Lens Query plan (Which is a Hive AST)
> > > > > to the DataStore specific Plan/Query.
> > > > >
> > > > > The best example I can think of right now is the Elastic Search patch by
> > > > > Amruth.
> > > > > You can go through the JIRA [1] for the detailed discussion and the
> > > > > Review Board [2] for code reference.
> > > > >
> > > > > Best regards
> > > > >
> > > > > 1. https://issues.apache.org/jira/browse/LENS-252
> > > > > 2. https://reviews.apache.org/r/36434
> > > > >
> > > > > On Thu, Jul 23, 2015 at 9:30 PM, Ajay <[email protected]> wrote:
> > > > >
> > > > >> Thanks Yash.
> > > > >>
> > > > >> Is there any documentation or WIKI on the Lens Driver. I am going through
> > > > >> the code as well.
> > > > >>
> > > > >> Thanks
> > > > >> Ajay
> > > > >>
> > > > >>
> > > > >> On Thu, Jul 23, 2015 at 12:01 PM, Yash Sharma <[email protected]> wrote:
> > > > >>
> > > > >> > Hi Ajay,
> > > > >> > Welcome to the Lens Dev.
> > > > >> > We do have plans for a Cassandra driver [1] for Lens but no one has
> > > > >> > picked it up yet. It would be great if you can pick up the task and
> > > > >> > submit a patch for review.
> > > > >> >
> > > > >> > Also drop a note on the list in case you stumble upon any issue. Someone
> > > > >> > will always be around to help you out.
> > > > >> >
> > > > >> >
> > > > >> > 1. https://issues.apache.org/jira/browse/LENS-654
> > > > >> >
> > > > >> > On Thu, Jul 23, 2015 at 11:41 AM, Ajay <[email protected]> wrote:
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > I recently noticed about Apache Lens project. Currently we are
> > > > >> > > building REST APIs for Apache Cassandra (proprietary) as there no such
> > > > >> > > Apache open source project exists. Now as Lens supports REST APIs for
> > > > >> > > Hadoop and JDBC, I want to know is there any plan to support for Apache
> > > > >> > > Cassandra as its support CQL and Spark SQL (thru Spark connector) which
> > > > >> > > are more SQL like. If yes, I wish to know the details and contribute as well.
> > > > >> > >
> > > > >> > > Thanks
> > > > >> > > Ajay
