By "very selective" I meant selective in columns, not rows. That is, if your query only cares about 3 columns out of 100, then a true columnar layout works great.
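To make that concrete, here is a rough toy sketch in Python. It is not Drill or Cassandra code; the table size, the chosen column indexes, and the function names are all made up for illustration. It only counts how many values each layout has to read to answer a query that needs 3 columns out of 100, assuming a row layout must read each record in full.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Everything here (sizes, column indexes, names) is a hypothetical illustration.
NUM_ROWS = 1000
NUM_COLS = 100
WANTED = [3, 17, 42]  # the 3 columns the query cares about

# Row layout: one list per record, holding all 100 fields together.
rows = [[r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)]

# Columnar layout: one list per column, holding that field for every record.
columns = [[rows[r][c] for r in range(NUM_ROWS)] for c in range(NUM_COLS)]


def scan_row_layout(rows, wanted):
    """Sum the wanted columns, reading every record in full (assumed cost model)."""
    total = 0
    values_read = 0
    for record in rows:
        values_read += len(record)  # the whole record is read to reach 3 fields
        total += sum(record[c] for c in wanted)
    return total, values_read


def scan_columnar_layout(columns, wanted):
    """Sum the wanted columns, reading only those columns."""
    total = 0
    values_read = 0
    for c in wanted:
        values_read += len(columns[c])  # untouched columns are never read
        total += sum(columns[c])
    return total, values_read


row_total, row_read = scan_row_layout(rows, WANTED)
col_total, col_read = scan_columnar_layout(columns, WANTED)

print(row_read, col_read)  # prints "100000 3000"
```

Both scans return the same sum, but the columnar one reads 3% of the values. A query that needs most of the 100 columns would not see that benefit.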

On Sun, Jan 20, 2013 at 10:07 PM, Tomer Shiran <[email protected]> wrote:

> Drill is being developed with the flexibility to support different data
> sources, so Cassandra support should not be a problem. Is that something
> you would be interested in building?
>
> The performance depends on the query. A query that involves a range scan
> would be very slow (assuming the default partitioner in Cassandra,
> RandomPartitioner), but point queries and queries that involve full table
> scans would provide reasonable performance. A full columnar layout would be
> faster for some queries (eg, queries that are very selective).
>
> BTW, Drill will support nested data, so JSON is not an issue.
>
>
> On Sun, Jan 20, 2013 at 8:37 PM, Brian O'Neill <[email protected]> wrote:
>
>> Last week, Brad Anderson came up and presented at the PhillyDB meetup.
>> http://www.slideshare.net/boorad/phillydb-talk-beyond-batch
>>
>> He gave us an overview of Drill, and I'm curious...
>>
>> Presently, we heavily use Storm + Cassandra.
>>
>> http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html
>>
>> We treat CRUD operations as events. Then within Storm we calculate
>> aggregate counts of entities flowing through the system by various
>> dimensions. That works well, but we still need an ad hoc reporting
>> capability, and a way to report on data in the system that is not
>> active (historical).
>>
>> Would it be possible to use the Drill engine against a Cassandra backend?
>> If so, what does that mean? (implementing some API?)
>>
>> I assume that performance would be terrible unless somehow the data is
>> stored using the columnar data format from the Dremel paper. Is that
>> accurate? Does anyone know if anyone has attempted a translation of
>> that format to Cassandra?
>>
>> Regardless, I'm very interested in getting involved and no stranger to
>> getting my hands dirty.
>> Let me know if you can provide any direction. (our entities are
>> currently stored in JSON in Cassandra)
>>
>> -brian
>>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>>
>
>
>
> --
> Tomer Shiran
> Director of Product Management | MapR Technologies | 650-804-8657
>

--
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657
