Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
We've used 'em all and… (IMHO) 1) I would avoid Thrift directly. 2) Hector is a sure bet. 3) Astyanax is the up and comer. 4) Kundera is good, but works like an ORM -- so not so good if your columns aren't defined ahead of time. -brian --- Brian O'Neill Lead Architect, Software Development

Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
Thanks Dean… I hadn't played with that one. I wonder if that would better fit the bill for the Spring Data Cassandra module I'm hacking on. https://github.com/boneill42/spring-data-cassandra I'll poke around. -brian --- Brian O'Neill Lead Architect, Software Development Health Market

Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
FWIW.. I just threw this together... http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html Let me know if I missed any others. (I didn't have playorm on there) -brian On Thu, Aug 23, 2012 at 9:51 AM, Brian O'Neill boneil...@gmail.com wrote: Thanks Dean… I hadn't played

Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
Ha… how could I forget? =) Adding it now. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com

Re: Spring - cassandra

2012-08-29 Thread Brian O'Neill
You looking for the author of Spring Data Cassandra? https://github.com/boneill42/spring-data-cassandra If so, I guess that is me. =) -brian --- Brian O'Neill Lead Architect, Software Development Apache Cassandra MVP Health Market Science The Science of Better Results 2700 Horizon Drive

Re: Spring - cassandra

2012-08-30 Thread Brian O'Neill
to email me directly so we don't spam this list. (or setup a googlegroup just in case others want to contribute) -brian --- Brian O'Neill Lead Architect, Software Development Apache Cassandra MVP Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M

Re: Cassandra API Library.

2012-09-04 Thread Brian O'Neill
You got it. (done) -brian On Tue, Sep 4, 2012 at 7:08 AM, Filipe Gonçalves the.wa.syndr...@gmail.com wrote: @Brian: you can add the Cassandra::Simple Perl client http://fmgoncalves.github.com/p5-cassandra-simple/ 2012/8/27 Paolo Bernardi berna...@gmail.com On 08/23/2012 01:40 PM, Thomas

Compound Keys: Connecting the dots between CQL3 and Java APIs

2012-09-11 Thread Brian O'Neill
Our data architects (ex-Oracle DBA types) are jumping on the CQL3 bandwagon and creating schemas for us. That triggered me to write a quick article mapping the CQL3 schemas to how they are accessed via Java APIs (for our dev team). I hope others find this useful as well:

Re: Data Modeling - JSON vs Composite columns

2012-09-19 Thread Brian O'Neill
Roshni, We're going through the same debate right now. I believe native support for JSON (or collections) is on the docket for Cassandra. Here is a discussion we had a few months ago on the topic: http://comments.gmane.org/gmane.comp.db.cassandra.devel/5233 We presently store JSON, but we're

Re: Solr Use Cases

2012-09-19 Thread Brian O'Neill
Roshni, We're using SOLR to support ad hoc queries and fuzzy searches against unstructured data stored in Cassandra. Cassandra is great for storage and you can create data models and indexes that support your queries, provided you can anticipate those queries. When you can't anticipate the

Re: Using the commit log for external synchronization

2012-09-20 Thread Brian O'Neill
Along those lines... We sought to use triggers for external synchronization. If you read through this issue: https://issues.apache.org/jira/browse/CASSANDRA-1311 You'll see the idea of leveraging a commit log for synchronization, via triggers. We went ahead and implemented this concept in:

Re: Using the commit log for external synchronization

2012-09-21 Thread Brian O'Neill
@aaronmorton http://www.thelastpickle.com On 21/09/2012, at 11:51 AM, Brian O'Neill b...@alumni.brown.edu wrote: Along those lines... We sought to use triggers for external synchronization. If you read through this issue: https://issues.apache.org/jira/browse/CASSANDRA-1311 You'll see

Re: Kundera 2.1 released

2012-09-21 Thread Brian O'Neill
Well done, Vivek and team!! This release was much anticipated. I'll give this a test with Spring Data JPA when I return from vacation. thanks, -brian On Sep 21, 2012, at 9:15 PM, Vivek Mishra wrote: Hi All, We are happy to announce release of Kundera 2.0.7. Kundera is a JPA 2.0

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
Dean, We have the same question... We have thousands of separate feeds of data as well (20,000+). To date, we've been using a CF per feed strategy, but as we scale this thing out to accommodate all of those feeds, we're trying to figure out if we're going to blow out the memory. The initial

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
It's just a convenient way of prefixing: http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html -brian On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood 0x6e6...@gmail.com wrote: Brian, On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill b...@alumni.brown.edu wrote: We haven't
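To make the "prefixing" idea concrete, here is a minimal sketch of how a per-tenant (or per-feed) prefix can be folded into row keys so that many logical keyspaces share one physical column family. This is not Hector's API: the class name, the `:` separator, and both method names are hypothetical and chosen only for illustration.

```java
public class PrefixedKeys {
    // Hypothetical separator; a real deployment must pick one that cannot occur in tenant names.
    static final char SEP = ':';

    /** Builds the key actually written to the shared column family. */
    static String toPhysical(String tenant, String key) {
        return tenant + SEP + key;
    }

    /** Strips the tenant prefix again when a row is read back. */
    static String toLogical(String tenant, String physicalKey) {
        String prefix = tenant + SEP;
        if (!physicalKey.startsWith(prefix)) {
            throw new IllegalArgumentException("key does not belong to tenant " + tenant);
        }
        return physicalKey.substring(prefix.length());
    }

    public static void main(String[] args) {
        String physical = toPhysical("feed42", "row1");
        System.out.println(physical);                      // feed42:row1
        System.out.println(toLogical("feed42", physical)); // row1
    }
}
```

With a hash-based partitioner the prefixed keys are still scattered across the ring, which is presumably why the later messages in this thread turn to a custom partitioner for prefix-aware scans.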

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Without putting too much thought into it... Given the underlying architecture, I think you could/would have to write your own partitioner, which would partition based on the prefix/virtual keyspace. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science

Re: 1000's of CF's. virtual CFs do NOT work... map/reduce

2012-10-02 Thread Brian O'Neill
Dean, Great point. I hadn't considered that either. Per my other email, think we would need a custom partitioner for this? (a mix of OrderPreservingPartitioner and RandomPartitioner, OPP for the prefix) -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Agreed. Do we know yet what the overhead is for each column family? What is the limit? If you have a SINGLE keyspace w/ 2+ CF's, what happens? Anyone know? -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon

Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Brian O'Neill
using Storm, let me know. We have an unreleased version of the bolt that you probably want to use. (we're waiting on Nathan/Storm to fix some classpath loading issues) RE: a custom virtual keyspace Partitioner, point well taken -brian --- Brian O'Neill Lead Architect, Software Development

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Exactly. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com This information transmitted

Re: Using compound primary key

2012-10-08 Thread Brian O'Neill
Hey Vivek, The same thing happened to me the other day. You may be missing a component in your compound key. See this thread: http://mail-archives.apache.org/mod_mbox/cassandra-dev/201210.mbox/%3ccajhhpg20rrcajqjdnf8sf7wnhblo6j+aofksgbxyxwcoocg...@mail.gmail.com%3E I also wrote a couple blogs

Keeping the record straight for Cassandra Benchmarks...

2012-10-25 Thread Brian O'Neill
People probably saw... http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html To clarify things take a look at... http://brianoneill.blogspot.com/2012/10/solid-nosql-benchmarks-from-ycsb-w-side.html -brian -- Brian ONeill Lead Architect, Health

Re: logging servers? any interesting in one for cassandra?

2012-11-06 Thread Brian O'Neill
the next few months. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com This information transmitted

Re: logging servers? any interesting in one for cassandra?

2012-11-07 Thread Brian O'Neill
Thanks Dean. We'll definitely take a look. (probably in January) -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42

Indexing Data in Cassandra with Elastic Search

2012-11-08 Thread Brian O'Neill
For those looking to index data in Cassandra with Elastic Search, here is what we decided to do: http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024

Re: [BETA RELEASE] Apache Cassandra 1.2.0-beta2 released

2012-11-10 Thread Brian O'Neill
Wow...good catch. We had puppet scripts which automatically assigned the proper tokens given the cluster size. What is the range now? Got a link? -brian On Nov 10, 2012, at 9:27 PM, Edward Capriolo wrote: just a note for all. The default partitioner is no longer randompartitioner. It is
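For anyone whose provisioning scripts compute initial tokens from the cluster size, the sketch below shows the usual evenly-spaced calculation for both partitioners. It assumes the new default referred to here is Murmur3Partitioner with a token range of -2^63 to 2^63-1, and that RandomPartitioner tokens span 0 to 2^127; the class and method names are made up for this example.

```java
import java.math.BigInteger;

public class InitialTokens {
    /** Evenly spaced token for node i of n under RandomPartitioner (range 0 .. 2^127). */
    static BigInteger randomToken(int i, int n) {
        return BigInteger.ONE.shiftLeft(127)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(n));
    }

    /** Evenly spaced token for node i of n under Murmur3Partitioner (range -2^63 .. 2^63-1). */
    static BigInteger murmur3Token(int i, int n) {
        BigInteger span = BigInteger.ONE.shiftLeft(64);   // total width of the signed 64-bit range
        return span.multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(n))
                .subtract(BigInteger.ONE.shiftLeft(63));  // shift so node 0 starts at -2^63
    }

    public static void main(String[] args) {
        int nodes = 4;
        for (int i = 0; i < nodes; i++) {
            System.out.println("node " + i
                    + " random=" + randomToken(i, nodes)
                    + " murmur3=" + murmur3Token(i, nodes));
        }
    }
}
```

The practical consequence is that tokens generated for one partitioner are not valid for the other, so a script written for RandomPartitioner needs the second formula before it is pointed at a cluster using the new default.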

Re: Datatype Conversion in CQL-Client?

2012-11-18 Thread Brian O'Neill
If you are talking about the CQL-client that comes with Cassandra (cqlsh), it is actually written in Python: https://github.com/apache/cassandra/blob/trunk/bin/cqlsh For information on datatypes (and conversion) take a look at the CQL definition:

Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread Brian O'Neill
I don't think Michael and/or Jonathan have published the CQL java driver yet. (CCing them) Hopefully they'll find a public home for it soon, I hope to include it in the Webinar in December. (http://www.datastax.com/resources/webinars/collegecredit) -brian --- Brian O'Neill Lead Architect

Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread Brian O'Neill
that metadata to the result set in: https://github.com/apache/cassandra/blob/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlResult.java -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King

Re: Datastax Java Driver

2012-11-19 Thread Brian O'Neill
Woohoo! Thanks for making this available. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com

Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread Brian O'Neill
-brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com This information transmitted in this email

Datastax C*ollege Credit Webinar Series : Create your first Java App w/ Cassandra

2012-12-12 Thread Brian O'Neill
FWIW -- I'm presenting tomorrow for the Datastax C*ollege Credit Webinar Series: http://brianoneill.blogspot.com/2012/12/presenting-for-datastax-college-credit.html I hope to make CQL part of the presentation and show how it integrates with the Java APIs. If you are interested, drop in. -brian

Re: Best Java Driver for Cassandra?

2012-12-13 Thread Brian O'Neill
available afterwards. I also have a laundry list here: (written before I knew about Firebrand) http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon

Re: Astyanax

2013-01-08 Thread Brian O'Neill
-frist-java-application -w.html -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com

Re: Cassandra 1.2 Thrift and CQL 3 issue

2013-01-12 Thread Brian O'Neill
I reported the issue here. You may be missing a component in your column name. https://issues.apache.org/jira/browse/CASSANDRA-5138 -brian On Jan 12, 2013, at 12:48 PM, Shahryar Sedghi wrote: Hi I am trying to test my application that runs with JDBC, CQL 3 with Cassandra 1.2. After

Webinar: Using Storm for Distributed Processing on Cassandra

2013-01-16 Thread Brian O'Neill
Just an FYI -- We will be hosting a webinar tomorrow demonstrating the use of Storm as a distributed processing layer on top of Cassandra. I'll be tag teaming with Taylor Goetz, the original author of storm-cassandra. http://www.datastax.com/resources/webinars/collegecredit It is part of the

Re: Accessing Metadata of Column Familes

2013-01-28 Thread Brian O'Neill
Through CQL, you see the logical schema. Through CLI, you see the physical schema. This may help: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts -brian On Mon, Jan 28, 2013 at 7:26 AM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: I found following issues while working on

Re: cql: show tables in a keystone

2013-01-28 Thread Brian O'Neill
cqlsh> use keyspace; cqlsh:cirrus> describe tables; For more info: cqlsh> help describe -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http

Re: Netflix/Astyanax Client for Cassandra

2013-02-07 Thread Brian O'Neill
Incidentally, we run Astyanax against 1.2.1. We haven't had any issues. When running against 1.2.0, we ran into this: https://github.com/Netflix/astyanax/issues/191 -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon

Re: any other NYC* attendees find your usb stick of the proceedings empty?

2013-03-25 Thread Brian O'Neill
/edwardcapriolo/intravert -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com This information transmitted

BI/Analytics/Warehousing for data in C*

2013-04-01 Thread Brian O'Neill
We are trudging through an options analysis for BI/DW solutions for data stored in C*. I'd love to hear people's experiences. Here is what we've found so far: http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html Maybe we just use Intravert with a custom handler to

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Brian O'Neill
changing to user@ (at least until we can determine if this can/should be proposed under 1472) For those interested in analytics and set-based queries, see below... -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Great! Thanks Gabriel. Do you have an example? (are you using QueryBuilder?) I couldn't find the part of the API that allowed you to pass in the byte array. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
); sb.append(ByteBufferUtil.bytesToHex((ByteBuffer)value)); } Hopefully, the prepared statement doesn't do the conversion. (I'm not sure if it is a limitation of the CQL protocol itself) thanks again, -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
(); LOG.error("repository.get() [" + key + "] byte.length()=[" + data.length + "]"); return data; } --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Sylvain, Interesting, when I look at the actual bytes returned, I see the byte array is prefixed with the keyspace and table name. I assume I'm doing something wrong in the select. Am I incorrectly using the ResultSet? -brian On Thu, Apr 11, 2013 at 9:09 AM, Brian O'Neill b

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
the bb.remaining() bytes starting at bb.arrayOffset() + bb.position() (where bb is the returned ByteBuffer). -- Sylvain -brian On Thu, Apr 11, 2013 at 9:09 AM, Brian O'Neill b...@alumni.brown.edu wrote: Yep, it worked like a charm. (PreparedStatement avoided the hex conversion
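Sylvain's point about `bb.remaining()`, `bb.arrayOffset()` and `bb.position()` explains the stray keyspace/table prefix reported earlier in the thread: calling `array()` on the returned buffer exposes the whole backing array, not just the value. Below is a small standalone sketch using only `java.nio`; the fake frame contents and the helper name `toBytes` are illustrative assumptions, not driver code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BlobBytes {
    /** Copies only the readable bytes of bb, leaving bb's own position untouched. */
    static byte[] toBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.duplicate().get(out);  // duplicate() shares the content but has an independent position
        return out;
    }

    public static void main(String[] args) {
        // Simulate a buffer that is a window onto a larger frame: header bytes, then the value.
        byte[] frame = "ks.table:payload".getBytes(StandardCharsets.UTF_8);
        int headerLen = "ks.table:".length();
        ByteBuffer bb = ByteBuffer.wrap(frame, headerLen, frame.length - headerLen);

        // array() returns the entire backing array, header included.
        System.out.println(new String(bb.array(), StandardCharsets.UTF_8));   // ks.table:payload
        // toBytes() returns only the bytes between position and limit.
        System.out.println(new String(toBytes(bb), StandardCharsets.UTF_8));  // payload
    }
}
```

Copying through `duplicate()` also works for direct or read-only buffers, where `array()` is not available at all.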

Re: Using elasticsearch on cassandra nodes

2011-10-18 Thread Brian O'Neill
Anthony, We've been looking at elastic search as well. Presently we have SOLR in place, but it is cumbersome dealing with SOLR schemas when indexing information out of Cassandra (since you can't anticipate all the columns ahead of time). What are you using as your bridge between Cassandra and

Re: Using elasticsearch on cassandra nodes

2011-10-19 Thread Brian O'Neill
, -brian Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/ From: Anthony Ikeda anthony.ikeda@gmail.com Reply

Re: Using elasticsearch on cassandra nodes

2011-10-21 Thread Brian O'Neill
/conf/schema.xml#L538 But you may still want to define a schema so you can adjust the index and query time processing/typing of the field values. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/10/2011, at 2:20 AM, Brian O'Neill

REST API on the Server Side

2011-10-25 Thread Brian O'Neill
Sasha, Thanks for the feedback. I can appreciate your comment on connection pooling, and it is certainly a matter of taste/purpose/perspective. In our case, it helps to have the REST layer because it's a more natural fit into our platform/ecosystem (considering we use COTS ETL tools, workflows,

Value-Added Services Layer

2011-10-25 Thread Brian O'Neill
Sasha, Thinking a little more about what problem the REST API solves... To be honest, I agree completely. I don't think a REST layer that provides the same feature/function as CQL is all that valuable except in cases like I described (which may not be all that common). Also, to be honest, I

R on Cassandra

2011-11-01 Thread Brian O'Neill
I saw a mention of R on Cassandra: http://comments.gmane.org/gmane.comp.db.cassandra.user/5681 Does anyone know if this has traction somewhere? -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog:

Re: Tool for SQL - Cassandra data movement

2011-11-02 Thread Brian O'Neill
the data directly into Cassandra. (using a ColumnFamilyOutput format) We are solving this problem right now, so I'll report back. -brian Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024 blog: http

Cassandra Integration w/ SOLR using Virgil

2011-11-04 Thread Brian O'Neill
Up front, I'd like to say this is still pretty raw. We'd love to get feedback (and better yet contributions ;). With that as a disclaimer, I added SOLR integration to Virgil. When you add and delete rows and columns via the REST interface, an index is updated in SOLR. For more information check

Re: Second Cassandra users survey

2011-11-07 Thread Brian O'Neill
It should be dead-simple to build a slick GUI on the REST layer. (@Virgil http://code.google.com/a/apache-extras.org/p/virgil/ ) I had planned to crank one out this week (using ExtJS) that mimicked the Squirrel/Toad look and feel. The UI would have a tree-panel of keyspaces and column families on

Re: security

2011-11-09 Thread Brian O'Neill
Not sure this is the standard approach, probably more what we came up with. ;) We plan to deploy Cassandra behind a firewall denying all traffic on all ports other than 8080. Access from applications will be limited to the REST/HTTP layer, which we'll lock down with standard HTTP authentication

GUI for Cassandra now included in Virgil

2011-11-21 Thread Brian O'Neill
I got around to implementing a GUI for Cassandra in Virgil. It was really simple. (100 lines of javascript) You can see a screenshot here: http://code.google.com/a/apache-extras.org/p/virgil/wiki/gui For those that were looking for a way to embed data visualization into their applications, you

Added run-modes to Virgil: Run embedded or against a remote Cassandra.

2011-11-28 Thread Brian O'Neill
I'm not sure if this was preventing anyone from using Virgil, but we added run-modes to Virgil to accommodate users that have an existing cluster. Now, you can just point Virgil at the remote instance and use it only for the GUI/REST layer and SOLR integration.

MapReduce on Cassandra using Ruby and REST!

2011-12-01 Thread Brian O'Neill
I know I've been spamming the list a bit with new features for Virgil, but this one is actually really cool... Enamored with what Riak provides as far as map/reduce via HTTP, http://wiki.basho.com/MapReduce.html#MapReduce-via-the-HTTP-API We implemented the same thing for Virgil/Cassandra.

Presentations from NYC?

2011-12-09 Thread Brian O'Neill
I may have missed it... Were the presentations posted from NYC? (Specifically, I'm looking for Nate McCall's presentation) -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog:

Re: Using Cassandra in Rails App

2011-12-15 Thread Brian O'Neill
I'm not sure this is the best answer, but all of our webapps (RoR included) access Cassandra via REST. That is one of the major reasons we built Virgil. http://code.google.com/a/apache-extras.org/p/virgil/ It allows us to build the webapps, for the most part, independent of the actual storage

Re: cassandra data to hadoop.

2011-12-23 Thread Brian O'Neill
I'm not sure this is much help, but we actually run Hadoop jobs to load and extract data to and from HDFS. You can use ColumnFamilyInputFormat to race over the data in Cassandra and output it to a file. That doesn't solve the continuous problem, but should give you a batch mechanism to refresh

Re: Presentations from NYC?

2011-12-27 Thread Brian O'Neill
metrics. 2011/12/10 Jonathan Ellis jbel...@gmail.com Not yet -- we're working on it. On Fri, Dec 9, 2011 at 1:48 PM, Brian O'Neill b...@alumni.brown.edu wrote: I may have missed it... Were the presentations posted from NYC? (Specifically, I'm looking for Nate McCall's presentation

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Brian O'Neill
Kevin, I just pulled the code and read through the design. Great stuff. Any thought to potentially using this for real-time processing as well? Right now, we have a set of Hadoop M/R jobs that operate against Cassandra for ETL. We were looking at using Storm for the real-time processing

Re: Hector and CQL

2012-01-05 Thread Brian O'Neill
If you are looking to add hector, you'll need: <dependency> <groupId>me.prettyprint</groupId> <artifactId>hector</artifactId> <version>1.0-2</version> </dependency> -brian Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p

Copy a column family?

2012-01-09 Thread Brian O'Neill
What is the fastest way to copy a column family? We were headed down the map/reduce path, but that seems silly. Any file level mechanisms for this? -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog:

Re: Copy a column family?

2012-01-09 Thread Brian O'Neill
Excellent. We'll give it a try. Thanks Brandon. -brian Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/ On 1

Cassandra to Oracle?

2012-01-20 Thread Brian O'Neill
I can't remember if I asked this question before, but We're using Cassandra as our transactional system, and building up quite a library of map/reduce jobs that perform data quality analysis, statistics, etc. ( 100 jobs now) But... we are still struggling to provide an ad-hoc query mechanism

Re: Cassandra to Oracle?

2012-01-20 Thread Brian O'Neill
benchmark would be helpful) -brian On Fri, Jan 20, 2012 at 12:41 PM, Zach Richardson j.zach.richard...@gmail.com wrote: How much data do you think you will need ad hoc query ability for? On Fri, Jan 20, 2012 at 11:28 AM, Brian O'Neill b...@alumni.brown.edu wrote: I can't remember if I asked

Ad Hoc Queries

2012-01-20 Thread Brian O'Neill
? (even a simple count / or copy CF benchmark would be helpful) -brian On Fri, Jan 20, 2012 at 12:41 PM, Zach Richardson j.zach.richard...@gmail.com wrote: How much data do you think you will need ad hoc query ability for? On Fri, Jan 20, 2012 at 11:28 AM, Brian O'Neill b

Triggers?

2012-01-20 Thread Brian O'Neill
Anyone know if there is any activity to deliver triggers? I saw this quote: http://www.readwriteweb.com/cloud/2011/10/cassandra-reaches-10-whats-nex.php Ellis says that he's just starting to think about the post-1.0 world for Cassandra. Two features do come to mind, though, that missed the boat

Re: Cassandra to Oracle?

2012-01-22 Thread Brian O'Neill
call ad-hoc queries. Regards, Maxim On 1/20/2012 9:28 AM, Brian O'Neill wrote: I can't remember if I asked this question before, but We're using Cassandra as our transactional system, and building up quite a library of map/reduce jobs that perform data quality analysis

Re: Cassandra to Oracle?

2012-01-22 Thread Brian O'Neill
RDBMS which doesn't scale very well for what you call ad-hoc queries. Regards, Maxim On 1/20/2012 9:28 AM, Brian O'Neill wrote: I can't remember if I asked this question before, but We're using Cassandra as our transactional system, and building up quite a library of map

Re: Cassandra to Oracle?

2012-01-22 Thread Brian O'Neill
Good point Milind. (RE: Client-side AOP) I was thinking server-side to stay with the trigger concept, but we could just as easily intercept on the client-side. We'd just need to make sure that all clients got the AOP code injected. (including all of our map/reduce jobs) If we get the

Remote Hadoop Job Deployment

2012-01-24 Thread Brian O'Neill
FYI... we finally got around to releasing a version of Virgil that includes the ability to deploy jobs to remote Hadoop clusters running against Cassandra Column Families. http://brianoneill.blogspot.com/2012/01/virgil-remote-hadoop-job-deployment-via.html This has enabled an army of people to

Virgil Moved (and Cassandra-Triggers coming soon)

2012-02-07 Thread Brian O'Neill
FYI -- we moved Virgil to Github to make it easier for people to contribute. https://github.com/hmsonline/virgil Also, we created an organization profile (hmsonline) to house all of our storm/cassandra related work. https://github.com/hmsonline Under that profile, we'll be releasing

Cassandra Triggers Capability published out to GitHub

2012-03-02 Thread Brian O'Neill
FYI -- http://brianoneill.blogspot.com/2012/03/cassandra-triggers-for-indexing-and.html https://github.com/hmsonline/cassandra-triggers Feedback welcome. Contribution and involvement is even better. ;) -brian -- Brian ONeill Lead Architect, Health Market Science

Re: cassandra gui

2012-04-01 Thread Brian O'Neill
If you give Virgil a try, let me know how it goes. The REST layer is pretty solid, but the gui is just a PoC which makes it easy to see what's in the CFs during development/testing. (It's only a couple hundred lines of ExtJS code built on the REST layer) We had plans to add CQL to the gui for

Re: Server Side Logic/Script - Triggers / StoreProc

2012-04-22 Thread Brian O'Neill
Praveen, We are certainly interested. To get things moving we implemented an add-on for Cassandra to demonstrate the viability (using AOP): https://github.com/hmsonline/cassandra-triggers Right now the implementation executes triggers asynchronously, allowing you to implement a java interface

Indexing JSON in Cassandra

2012-06-21 Thread Brian O'Neill
I know we had this conversation over on the dev list a while back: http://www.mail-archive.com/dev@cassandra.apache.org/msg03914.html I just wanted to let people know that we added the capability to our cassandra-indexing extension.

Re: Ball is rolling on High Performance Cassandra Cookbook second edition

2012-06-27 Thread Brian O'Neill
RE: API method signatures changing That triggers another thought... What terminology will you use in the book to describe the data model? CQL? When we wrote the RefCard on DZone http://refcardz.dzone.com/refcardz/apache-cassandra, we intentionally favored/used CQL terminology. On advisement

Re: which high level Java client

2012-06-28 Thread Brian O'Neill
FWIW, We keep most of our system level integrations behind REST using Virgil: https://github.com/hmsonline/virgil When a lower-level integration is necessary we use Hector, but recently we've started using Astyanax and plan to port our Hector dependencies over to Astyanax when given a chance.

Re: Cassandra and Tableau

2012-07-06 Thread Brian O'Neill
Robin, We have the same issue right now. We use Tableau for all of our reporting needs, but we couldn't find any acceptable bridge between it and Cassandra. We ended up using cassandra-triggers to replicate the data to Oracle. https://github.com/hmsonline/cassandra-triggers/ Let us know if you

Re: Trigger and customized filter

2012-07-10 Thread Brian O'Neill
While Jonathan and crew work on the infrastructure to support triggers: https://issues.apache.org/jira/browse/CASSANDRA-4285 We have a project going over here that provides a trigger-like capability: https://github.com/hmsonline/cassandra-triggers/

An experiment using Spring Data w/ Cassandra (initially via JPA/Kundera)

2012-07-18 Thread Brian O'Neill
This is just an FYI. I experimented w/ Spring Data JPA w/ Cassandra leveraging Kundera. It sort of worked: https://github.com/boneill42/spring-data-jpa-cassandra http://brianoneill.blogspot.com/2012/07/spring-data-w-cassandra-using-jpa.html I'm now working on a pure Spring Data adapter using

Re: How to manually build and maintain secondary indexes

2012-07-26 Thread Brian O'Neill
. It doesn't address all of your concerns, but I tried to capture the motivation behind our implementation here: http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html -brian -- Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King

Re: How to process new rows in parallel?

2012-08-03 Thread Brian O'Neill
If you are deleting the messages after processing, it sounds like you are using Cassandra as a work queue. Here are some links for implementing a distributed queue in Cassandra: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html

A Big Data Trifecta: Storm, Kafka and Cassandra

2012-08-04 Thread Brian O'Neill
Philip, I figured I would reply via blog post. =) http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html That blog post shows how we pieced together Kafka and Cassandra (via Storm). With LinkedIn behind Kafka, it is well supported. They use it in production. (and most

Re: Exporting all data within a keyspace

2013-04-30 Thread Brian O'Neill
You could always do something like this as well: http://brianoneill.blogspot.com/2012/05/dumping-data-from-cassandra-like.html -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M

Re: multitenant support with key spaces

2013-05-06 Thread Brian O'Neill
You may want to look at using virtual keyspaces: http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html And follow these tickets: http://wiki.apache.org/cassandra/MultiTenant -brian On May 6, 2013, at 2:37 AM, Darren Smythe wrote: How many keyspaces can you

[BLOG] : Cassandra as a Deep Storage Mechanism for Druid Real-Time Analytics Engine

2013-05-17 Thread Brian O'Neill
FWIW, we were able to integrate Druid and Cassandra. It's only in PoC right now, but it seems like a powerful combination: http://brianoneill.blogspot.com/2013/05/cassandra-as-deep-storage-mechanism-for.html -brian -- Brian ONeill Lead Architect, Health Market Science

SQL Injection C* (via CQL Thrift)

2013-06-18 Thread Brian O'Neill
Mostly for fun, I wanted to throw this out there... We are undergoing a security audit for our platform (C* + Elastic Search + Storm). One component of that audit is susceptibility to SQL injection. I was wondering if anyone has attempted to construct a SQL injection attack against Cassandra?

Re: SQL Injection C* (via CQL Thrift)

2013-06-18 Thread Brian O'Neill
% confident making that assertion. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com

Re: Main method not found in class org.apache.cassandra.service.CassandraDaemon

2013-07-17 Thread Brian O'Neill
Vivek, The location of CassandraDaemon changed between versions. (from org.apache.cassandra.thrift to org.apache.cassandra.service) It is likely that the start scripts are picking up the old version on the classpath, which results in the main method not being found. Do you have CASSANDRA_HOME

Re: Main method not found in class org.apache.cassandra.service.CassandraDaemon

2013-07-17 Thread Brian O'Neill
Vivek, You could try echoing the CLASSPATH to double check. Drop an echo into the launch_service function in the cassandra shell script. (~line 121) Let us know the output. -brian --- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive • King

Drop keyspace via CQL hanging on master/trunk.

2013-12-05 Thread Brian O'Neill
describe keyspaces; system test_keyspace system_traces cqlsh> drop keyspace test_keyspace; THIS HANGS INDEFINITELY thoughts? user error? worth filing an issue? One other note -- this happens using the CQL java driver as well. -brian --- Brian O'Neill Chief Architect Health Market Science

Re: Drop keyspace via CQL hanging on master/trunk.

2013-12-05 Thread Brian O'Neill
I removed the data directory just to make sure I had a clean environment. (eliminating the possibility of corrupt keyspaces/files causing problems) -brian --- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M

Re: Drop keyspace via CQL hanging on master/trunk.

2013-12-10 Thread Brian O'Neill
Great. Thanks Aaron. FWIW, I am/was porting Virgil over CQL. I should be able to release a new REST API for C* (using CQL) shortly. -brian --- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024

Dimensional SUM, COUNT, DISTINCT in C* (replacing Acunu)

2013-12-17 Thread Brian O'Neill
We are seeking to replace Acunu in our technology stack / platform. It is the only component in our stack that is not open source. In preparation, over the last few weeks I’ve migrated Virgil to CQL. The vision is that Virgil could receive a REST request to upsert/delete data (hierarchical
