Re: Cluster per Application vs. Multi-Application Clusters

2012-08-22 Thread Hiller, Dean
Just an opinion here, as we are having to do this ourselves, loading tons of researchers' datasets into one cluster. We are going the path of one keyspace, as it makes it easier if you ever want to mine the data: you don't have to keep building different clients for another keyspace. We ended

Re: Cluster per Application vs. Multi-Application Clusters

2012-08-22 Thread Hiller, Dean
, Hiller, Dean dean.hil...@nrel.gov wrote: Just an opinion here, as we are having to do this ourselves, loading tons of researchers' datasets into one cluster. We are going the path of one keyspace, as it makes it easier if you ever want to mine the data: you don't have to keep building different

Re: Cassandra API Library.

2012-08-23 Thread Hiller, Dean
playOrm has a raw layer that if your columns are not defined ahead of time and SQL with no limitations on , =, =, etc. etc. as well as joins being added shortly BUT joins are for joining partitions so that your system can still scale to infinity. Also has an in-memory database as well for unit

new type of join just discovered on cassandra

2012-08-23 Thread Hiller, Dean
With playOrm we have been researching partitioning and joining partitions for OLTP applications which you typically partition per client anyways such that you can have infinite clients. Naturally, we have been looking at a lot of nested loop join, block nested loop join, sort merge join, and

Re: Cassandra API Library.

2012-08-23 Thread Hiller, Dean
action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited. On 8/23/12 9:19 AM, Hiller, Dean dean.hil...@nrel.gov wrote: playOrm has a raw layer that if your columns are not defined ahead of time and SQL with no limitations

can you use hostnames in the topology file?

2012-08-27 Thread Hiller, Dean
In the example, I see all ips being used, but our machines are on dhcp so I would prefer using hostnames for everything(plus if a machine goes down, I can bring it back online on another machine with a different ip but same hostname). If I use hostname, does the listen_address have to be

Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-27 Thread Hiller, Dean
In cassandra-env.sh, search for JMX_PORT; it is set to 7199 (i.e. fixed), so that solves your issue, correct? Dean From: Yang tedd...@gmail.com Reply-To: user@cassandra.apache.org

Re: Cassandra 1.1.4 RPM required

2012-08-28 Thread Hiller, Dean
You are probably inside a company, and my guess is the company has a proxy doing basic auth…try your company username/password or do it from home. Dean From: aaron morton aa...@thelastpickle.com Reply-To:

keyspace and column family creation…how to use ConsistencyLevel.ALL with creation?

2012-08-29 Thread Hiller, Dean
The playOrm test suite drops the keyspace and recreates it for tests to wipe out the in-memory or cassandra db. Today, we successfully ran our test suite on a 6 node cluster. The one issue I had though was I needed to sleep after keyspace creation and column family creation. BEFORE that I

Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-30 Thread Hiller, Dean
It seems to me you may want to revisit the design(but not 100% sure as I am not sure I understand the entire context) a bit as I could see having partitions and a few clients that poll in each partition so you can scale to infinity basically with no issues. If you are doing all this polling

Re: Cassandra and Apache Drill

2012-09-04 Thread Hiller, Dean
Many queries on a small portion of the data….sounds like playORM ;). As long as you partition your data with playOrm, you can do really fast queries into that data by partition using Scalable SQL (SQL with the addition of a partition clause in front as to what partitions you are querying). Joins

anyone know how to lookup non-continguous columns BUT for prefixes?

2012-09-04 Thread Hiller, Dean
I have a row that is an index, like so: Index row - value1.pk99, value1.pk20, value2.pk32, value2.pk7, value3.pk24, value4.pk54, value5.pk31. I would like to get all of the pks for value2, which are pk32 and pk7, and for value4, which is pk54. This is a trimmed down example of course. I am
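
To make the question above concrete, here is a minimal sketch in Python of the lookup being asked for. It is not a Cassandra or playOrm API; it only assumes that the column names in the index row are kept in sorted order, so that all columns sharing a prefix sit next to each other. The function name columns_with_prefix and the trailing "." in each prefix are assumptions made for illustration.

```python
from bisect import bisect_left


def columns_with_prefix(sorted_columns, prefix):
    """Return the contiguous run of column names that start with prefix.

    Assumes sorted_columns is already sorted, which is what makes a
    prefix lookup a single seek plus a short forward scan.
    """
    start = bisect_left(sorted_columns, prefix)
    end = start
    while end < len(sorted_columns) and sorted_columns[end].startswith(prefix):
        end += 1
    return sorted_columns[start:end]


# The trimmed-down index row from the message, sorted by column name.
index_row = sorted([
    "value1.pk99", "value1.pk20", "value2.pk32", "value2.pk7",
    "value3.pk24", "value4.pk54", "value5.pk31",
])

# Non-contiguous prefixes: one slice per prefix.
wanted = ["value2.", "value4."]
pks = {
    prefix: [col.split(".", 1)[1] for col in columns_with_prefix(index_row, prefix)]
    for prefix in wanted
}
```

Each prefix costs one seek into the sorted names, so asking for several non-contiguous prefixes in this sketch amounts to several small slices rather than one scan of the whole row.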

Re: are asynchronous schema updates possible ?

2012-09-04 Thread Hiller, Dean
+1 What kinds of problems? Thanks, Dean From: Илья Шипицин chipits...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, September 4, 2012 1:12 PM To:

playOrm now supports N-level joins on cassandra (no limitations on where clause)

2012-09-04 Thread Hiller, Dean
There is no = or limitations. Joins are in beta and currently can only do inner joins at this time….Also, queries return a Cursor so you can page as well and keep the cursor in a web server session if needed for paging. It also looks like joins may be faster with cassandra/playOrm vs.

playOrm S-SQL comparison with CQL

2012-09-06 Thread Hiller, Dean
Someone asked so I wrote the difference here https://github.com/deanhiller/playorm/wiki/Fast-Scalable-Queries playOrm queries are geared for a different problem than CQL is geared for. Summary is basically playOrm uses significantly fewer resources as it only queries the partitions it is

cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable SQL) query of PARTITIONS n(:partition) SELECT n FROM TABLE as n WHERE n.numShares = :low and n.pricePerShare = :price It ran in 60ms So basically playOrm is going to support millions of rows per partition. This is

Re: cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
...@gmail.com wrote: Try to get Cassandra running the TPH-C benchmarks and beat oracle :) On Fri, Sep 7, 2012 at 10:01 AM, Hiller, Dean dean.hil...@nrel.gov wrote: So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable SQL) query of PARTITIONS n(:partition) SELECT n FROM TABLE as n

any way to prefer just 3 column families for partial row caching

2012-09-10 Thread Hiller, Dean
We have 3 tables for all the indexing we do, called IntegerIndexing, DecimalIndexing, and StringIndexing. playOrm would prefer that only these rows are cached, as every row in those tables is an index. Customers/Clients of playOrm tend to always hit the same index rows over and over as they are using the

thoughts on this feature request

2012-09-12 Thread Hiller, Dean
Using wide rows for indexing is extremely common. I was wondering if we could get some type of command like so for index rows: Remove value1.pkX AND Add value2.pkX, such that if value1.pkX is NOT found, the whole row will be scanned for ANYvalue.pkX and that value removed instead. This would
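
The requested semantics can be sketched client-side in Python. This is only an illustration of the behavior described in the message, not an existing Cassandra command; the index row is modeled as a plain set of "value.pk" column names, and the function name move_index_entry is hypothetical.

```python
def move_index_entry(row, old_value, new_value, pk):
    """Replace old_value.pk with new_value.pk in an index row.

    row is a set of "value.pk" column names (an assumption standing in
    for a wide index row). If the expected old column is missing, fall
    back to scanning the whole row for any column that ends in this pk
    and remove that instead, as the feature request describes.
    """
    old_column = f"{old_value}.{pk}"
    if old_column in row:
        row.remove(old_column)
    else:
        # Slow path: the index held a stale value for this pk.
        stale = [col for col in row if col.rsplit(".", 1)[-1] == pk]
        for col in stale:
            row.remove(col)
    row.add(f"{new_value}.{pk}")
    return row


# Fast path: the expected old column exists.
fast = move_index_entry({"value1.pk5", "value3.pk9"}, "value1", "value2", "pk5")

# Slow path: the row holds value7.pk5, not the expected value1.pk5.
slow = move_index_entry({"value7.pk5", "value3.pk9"}, "value1", "value2", "pk5")
```

The fallback scan is the expensive part, which is presumably why the request is for a server-side command rather than a client-side read followed by a write.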

Re: Data Model

2012-09-14 Thread Hiller, Dean
playOrm uses EXACTLY that pattern where @OneToMany becomes student.rowkeyStudent1 student.rowkeyStudent2 and the other fields are fixed. It is a common pattern in noSQL. Dean From: aaron morton aa...@thelastpickle.com Reply-To:

Re: Composite Column Query Modeling

2012-09-14 Thread Hiller, Dean
There is another trick here. On the playOrm open source project, we need to do a sparse query for a join and so we send out 100 async requests and cache up the java Future objects and return the first needed result back without waiting for the others. With the S-SQL in playOrm, we have the IN
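
The trick described above (fire off all the requests, hold on to the futures, and hand back the first needed result without waiting for the rest) can be sketched with Python's concurrent.futures. This is not playOrm's Java code; fetch_row is a hypothetical stand-in for a remote read, and the pool size is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_row(key):
    # Hypothetical stand-in for one asynchronous read against the cluster.
    return f"row-{key}"


def submit_all(keys, executor):
    """Start every read immediately and return the futures, in key order."""
    return [executor.submit(fetch_row, key) for key in keys]


with ThreadPoolExecutor(max_workers=16) as executor:
    # All 100 requests are in flight before any result is consumed.
    futures = submit_all(range(100), executor)

    # Block only on the first result the caller needs; the remaining
    # 99 futures keep completing in the background.
    first = futures[0].result()
```

Later results are obtained by calling result() on the cached futures as the caller pages forward, so latency for the first row does not depend on the slowest of the 100 reads.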

Re: Astyanax - build

2012-09-14 Thread Hiller, Dean
I didn't need to compile it. It is up in the maven repositories as well http://mvnrepository.com/artifact/com.netflix.astyanax/astyanax Or are you trying to see how it works? (We use the same client on playORM open source project…it works like a charm). Dean On 9/14/12 10:28 AM, A J

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
I wanted to clarify where that statement on wide rows comes from…. Realize some people make the claim that if you don't have 1000's of columns in some rows in cassandra you are doing something wrong. This is not true, BUT it comes from the fact that people are setting up indexes. This

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
Until Aaron replies, here are my thoughts on the relational piece… If everything in my model fits into a relational database, if my data is structured, would it still be a good idea to use Cassandra? Why? The playOrm project explores exactly this issue……A query on 1,000,000 rows in a

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
Cassandra is fully aware of all tables created with playOrm and you can still use DataStax enterprise features to get real time analytics. PlayOrm is a layer on top of cassandra, and as with any layer it makes a developer more productive at a slight cost of performance, just like hibernate on top

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
be aware of the tables I create with playOrm, just of the column families this framework uses to store the data, right? 2012/9/18 Hiller, Dean dean.hil...@nrel.gov Until Aaron replies, here are my thoughts on the relational piece… If everything in my model fits

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
Yes, this scenario can occur(even with quorum writes/reads as you are dealing with different rows) as one write may be complete and the other not while someone else is reading from the cluster. Generally though, you can do read repair when you read it in ;). Ie. See if things are inconsistent

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
{ ListStudents - These students are saved one per column in the courses row } We sometimes do this with playOrm and don't even bother with the S-SQL it has which also means you don't need to worry about partitioning in that case. Later, Dean On 9/19/12 6:46 AM, Hiller, Dean dean.hil...@nrel.gov wrote

higher layer library makes things faster?

2012-09-19 Thread Hiller, Dean
So there is this interesting case where a higher layer library makes things faster. This is counter-intuitive, as every abstraction usually makes things slower in exchange for an increase in productivity. It would be cool if more and more libraries supported something to help with this scenario I

Re: higher layer library makes things faster?

2012-09-19 Thread Hiller, Dean
, jef...@gmail.com jef...@gmail.com wrote: Actually it's not uncommon at all. Any caching implemented on a higher level will generally improve speed at a cost in memory. Beware common wisdom, it's seldom very wise Sent from my Verizon Wireless BlackBerry -Original Message- From: Hiller, Dean

Re: Correct model

2012-09-19 Thread Hiller, Dean
Thinking out loud and I think a bit towards playOrm's model though you don't need to use playOrm for this. 1. I would probably have a User with the requests either embedded in or the Foreign keys to the requests…either is fine as long as you get the user, get ALL FK's, and make one request to

Re: Correct model

2012-09-19 Thread Hiller, Dean
@cassandra.apache.org user@cassandra.apache.org Subject: Re: Correct model 2012/9/19 Hiller, Dean dean.hil...@nrel.gov Thinking out loud and I think a bit towards playOrm's model though you don't need to use playOrm for this. 1. I would

Re: Correct model

2012-09-19 Thread Hiller, Dean
ldap and know no one's username is really going to change so username is our primary key. Later, Dean On 9/19/12 2:33 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Uhm, unless I am mistaken, a NEW request implies a new UUID so you can just write it to both the index to the request row

any ways to have compaction use less disk space?

2012-09-20 Thread Hiller, Dean
While diskspace is cheap, nodes are not that cheap, and usually systems have a 1T limit on each node which means we would love to really not add more nodes until we hit 70% disk space instead of the normal 50% that we have read about due to compaction. Is there any way to use less disk space

Re: Correct model

2012-09-23 Thread Hiller, Dean
But the only advantage in this solution is to split data among partitions? You need to split data among partitions or your query won't scale as more and more data is added to table. Having the partition means you are querying a lot less rows. What do you mean here by current partition? He

found major difference in CQL vs Scalable SQL(PlayOrm) and question

2012-09-23 Thread Hiller, Dean
I have been digging more and more into CQL vs. PlayOrm S-SQL and found a major difference that is quite interesting(thought you might be interested plus I have a question). CQL uses a composite row key with the prefix so now any other tables that want to reference that entity have references to

Re: compression

2012-09-23 Thread Hiller, Dean
As well as your unlimited column names may all have the same prefix, right? Like accounts.rowkey56, accounts.rowkey78, etc. etc. so the accounts gets a ton of compression then. Later, Dean From: Tyler Hobbs ty...@datastax.com Reply-To:

Re: Correct model

2012-09-24 Thread Hiller, Dean
@cassandra.apache.org Subject: Re: Correct model 2012/9/23 Hiller, Dean dean.hil...@nrel.gov You need to split data among partitions or your query won't scale as more and more data is added to table. Having the partition means you are querying

Re: Correct model

2012-09-24 Thread Hiller, Dean
@cassandra.apache.org user@cassandra.apache.org Date: Monday, September 24, 2012 11:07 AM To: user@cassandra.apache.org Subject: Re: Correct model 2012/9/24 Hiller, Dean

Re: Correct model

2012-09-24 Thread Hiller, Dean
secondary indexes I need to update index values manually, right? I got confused when you said PlayOrm indexes the columns you choose. How do I choose and what exactly it means? Best regards, Marcelo Valle. 2012/9/24 Hiller, Dean dean.hil...@nrel.gov Oh, ok, you were

Re: Correct model

2012-09-25 Thread Hiller, Dean
need the best performance in the world when reading, but I need to assure scalability and have a simple model to maintain. I liked the playOrm concept regarding this. I have more doubts, but I will ask them at Stack Overflow from now on. 2012/9/24 Hiller, Dean dean.hil...@nrel.gov

Re: Correct model

2012-09-25 Thread Hiller, Dean
on. 2012/9/24 Hiller, Dean dean.hil...@nrel.gov PlayOrm will automatically create a CF to index my CF? It creates 3 CF's for all indices, IntegerIndice, DecimalIndice, and StringIndice such that the ad-hoc tool

Re: Correct model

2012-09-25 Thread Hiller, Dean
as well for the 1.2.x line. Later, Dean On 9/25/12 6:36 AM, Hiller, Dean dean.hil...@nrel.gov wrote: If you need anything added/fixed, just let PlayOrm know. PlayOrm has been able to quickly add so far…that may change as more and more requests come but so far PlayOrm seems to have managed to keep up

is this a cassandra bug?

2012-09-25 Thread Hiller, Dean
This is cassandra 1.1.4 Describe shows DecimalType and I test setting comparator TO the DecimalType and it fails (Realize I have never touched this column family until now except for posting data which succeeded) [default@unknown] use databus; Authenticated to keyspace: databus

Re: is this a cassandra bug?

2012-09-25 Thread Hiller, Dean
never saw anything client side…in fact, the client READ back the data fine so I am a bit confused here…1.1.4…I tested this on a single node after seeing it in our 6 node cluster with the same results. Thanks, Dean On 9/25/12 2:13 PM, Hiller, Dean dean.hil...@nrel.gov wrote: This is cassandra

any ideas on what these mean

2012-09-26 Thread Hiller, Dean
We were consistently getting this exception over and over as we put data into the system. A reboot caused it to go away but we don't want to be rebooting in the future…. 1. When does this occur? 2. Is it affecting my data put? (I have seen other weird validation exceptions where my data

Re: is this a cassandra bug?

2012-09-26 Thread Hiller, Dean
bump On 9/25/12 2:40 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Hmmm, is rowkey validation asynchronous to the actual sending of the data to cassandra? I seem to be able to put an invalid type and GET that invalid data back just fine even though my key type was an int and the key comparator

1000's of column families

2012-09-26 Thread Hiller, Dean
We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When using the tools they are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CF's as you want, correct? Even though I am going to have tons of fun with

is node tool row count always way off?

2012-09-26 Thread Hiller, Dean
With nodetool cfstats, what is the row count estimate usually off by (what percentage? Or what absolute number?) We have a CF with 4 rows that prints this out…. Column Family: bacnet11700AnalogInput8 SSTable count: 3 Space used (live): 13526

Re: Once again, super columns or composites?

2012-09-27 Thread Hiller, Dean
Can you describe your use-case in detail as it might be easier to explain a model with composite names. Later, Dean From: Edward Kibardin infa...@gmail.com Reply-To: user@cassandra.apache.org

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
with as many CF's as you want, does anyone know what that limit would be for 16G of RAM or something I could calculate with? Thanks, Dean On 9/27/12 2:37 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Sep 27, 2012 at 12:13 AM, Hiller, Dean dean.hil...@nrel.gov wrote: We are streaming data

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
a single CF with partitions in this case? Wouldn't it be the same thing? I am asking because I might learn a new modeling technique with the answer. []s 2012/9/26 Hiller, Dean dean.hil...@nrel.gov We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
it? Of course it is probably much harder than it might appear... :D Best regards, Marcelo Valle. 2012/9/27 Hiller, Dean dean.hil...@nrel.gov We have 1000's of different building devices and we stream data from these devices. The format and data from each one

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
in one location), though we will see how implementing it goes. How much overhead per column family in RAM? So far we have around 4000 Cfs with no issue that I see yet. Dean On 9/27/12 11:10 AM, Aaron Turner synfina...@gmail.com wrote: On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean dean.hil

Re: 1000's of column families

2012-09-28 Thread Hiller, Dean
the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. 2012/9/27 Hiller, Dean dean.hil...@nrel.gov

Re: Help for creating a custom partitioner

2012-10-01 Thread Hiller, Dean
I would be surprised if random partitioner hurt your performance. In general, doing performance tests on a 6 node cluster with PlayOrm Scalable SQL, even joins queries ended up faster as the parallel disks of reading all the rows was way faster than reading from a single machine(remember, one

Re: Rebalancing cluster

2012-10-01 Thread Hiller, Dean
You should check the cassandra.yaml file. There is an initial_token in that file that you should have set. The comment above that property reads # You should always specify InitialToken when setting up a production # cluster for the first time, and often when adding capacity later. # The

Re: Advice on correct storage configuration

2012-10-01 Thread Hiller, Dean
What is really going to matter is what the application is trying to read. That is really the critical piece of context. Without knowing what the application needs to read, it is very hard to design. One example from a previous post that was a great question was… 1. I need to get the last 100

Re: 1000's of column families

2012-10-01 Thread Hiller, Dean
these problems. Flavio Il 2012/09/27 16:11 PM, Hiller, Dean ha scritto: We have 1000's of different building devices and we stream data from these devices. The format and data from each one varies so one device has temperature at timeX with some other variables, another device has CO2 percentage

read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
I know there is a 10 day limit if you have a node out of the cluster where you better be running read-repair or you end up with forgotten deletes, but what about on a clean cluster with all nodes always available? Shouldn't the deletes eventually take place or does one have to keep running

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
to run repair once per/gc_grace period. You won't see empty/deleted rows go away until they're compacted away. On Mon, Oct 1, 2012 at 6:32 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I know there is a 10 day limit if you have a node out of the cluster where you better be running read-repair

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
Oh, and I have been reading Aaron Morton's article here http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/ On 10/1/12 12:46 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Thanks, (actually knew it was configurable) BUT what I don't get is why I have to run a repair. IF all nodes became

1000's of CF's. virtual CFs do NOT work…map/reduce

2012-10-02 Thread Hiller, Dean
So basically, with moving towards the 1000's of CF all being put in one CF, our performance is going to tank on map/reduce, correct? I mean, from what I remember we could do map/reduce on a single CF, but by stuffing 1000's of virtual Cf's into one CF, our map/reduce will have to read in all 999

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, to address your question, read my last post but to summarize, yes, there is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT when doing map/reduce. Doing map/reduce, you will now have HUGE overhead in reading a whole slew of rows you don't care about as you can't
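
The trade-off described above can be illustrated with a toy model in Python. This is a sketch only: a dict stands in for the single physical column family, and the ":" separator, the device names, and the helper names are all assumptions, not playOrm's actual key format. The point it shows is that a full scan (as a map/reduce job over the CF would do) has to read every row in order to find the rows of one virtual CF.

```python
def virtual_key(virtual_cf, row_key):
    """Prefix the row key with the virtual CF name (separator is an assumption)."""
    return f"{virtual_cf}:{row_key}"


# One physical "column family" holding three virtual CFs of three rows each.
store = {}
for cf in ("deviceA", "deviceB", "deviceC"):
    for i in range(3):
        store[virtual_key(cf, i)] = {"value": i}


def full_scan(store, virtual_cf):
    """Scan every row, keeping only those that belong to virtual_cf.

    Returns the matching rows and the number of rows that had to be read.
    """
    prefix = f"{virtual_cf}:"
    rows_read = 0
    matches = {}
    for key, columns in store.items():
        rows_read += 1
        if key.startswith(prefix):
            matches[key] = columns
    return matches, rows_read


matches, rows_read = full_scan(store, "deviceB")
```

Here three rows are wanted but all nine are read; with 1000's of virtual CFs in one physical CF, the same ratio applies to a job that only cares about one of them.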

Re: Read latency issue

2012-10-02 Thread Hiller, Dean
Interesting results. With PlayOrm, we did a 6 node test of reading 100 rows from 1,000,000 using PlayOrm Scalable SQL. It only took 60ms. Maybe we have better hardware though??? We are using 7200 RPM drives so nothing fancy on the disk side of things. More nodes puts at a higher throughput

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited. On 10/2/12 9:00 AM, Ben Hood 0x6e6...@gmail.com wrote: Dean, On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Ben

Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Hiller, Dean
be my only missing piece (well, that and the PlayOrm virtual CF feature but I can add that within a week probably though I am on vacation this Thursday to monday). Later, Dean On 10/2/12 6:35 AM, Hiller, Dean dean.hil...@nrel.gov wrote: So basically, with moving towards the 1000's of CF all being

easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
If I understand -pr correctly… 1. -pr forces only the current node's sstables to be fixed (so I run it on each node once) 2. Can I run nodetool repair -pr on just 1/RF of my nodes if I pick the correct nodes? 3. Without -pr, it will fix all the stuff on the current node AND the nodes with

Re: easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
GREAT answer, thanks and one last question… So, I suspect I can expect those rows to finally go away when queried from cassandra-cli once GCGraceSeconds has passed then? Or will they always be there forever and ever and ever(this can't be true, right). Thanks, Dean On 10/2/12 9:34 AM, Sylvain

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, Brian, By the way, PlayOrm offers a NoSqlTypedSession that is different than the ORM half of PlayOrm dealing in raw stuff that does indexing(so you can do Scalable SQL on data that has no ORM on top of it). That is what we use for our 1000's of CF's as we don't know the format of any of

Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Hiller, Dean
Can you just use netstat and dig into the process id and do a ps -ef | grep pid to clear up all the confusion. Doing so you can tell which process communicates with which process(I am assuming you are on linux….on MAC or windows it is different commands). Then, just paste all that in the email

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Because the data for an index is not all together(ie. Need a multi get to get the data). It is not contiguous. The prefix in a partition they keep the data so all data for a prefix from what I understand is contiguous. QUESTION: What I don't get in the comment is I assume you are referring to

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
@cassandra.apache.org user@cassandra.apache.org Subject: Re: 1000's of column families Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: Because the data for an index is not all together(ie. Need a multi get to get the data). It is not contiguous

Re: Simple data model for 1 simple range query?

2012-10-03 Thread Hiller, Dean
Is timeframe/date your composite key? Where timeframe is the first time of a partition of time (ie. If you partition by month, it is the very first time of that month). If so, then, yes, it will be very fast. The smaller your partitions are, the smaller your indexes are as well(ie. B-trees
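
As a small illustration of the partitioning scheme described above, the "first time of that month" can be computed from any timestamp and used as the leading part of the key. This is a sketch in Python under the assumption of month-sized partitions; the function names and the tuple layout of the composite key are hypothetical.

```python
from datetime import datetime


def month_partition(ts):
    """Return the very first instant of the month that ts falls in."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def composite_key(ts):
    """Pair the partition start (timeframe) with the exact timestamp (date)."""
    return (month_partition(ts), ts)


event_time = datetime(2012, 10, 3, 14, 30)
key = composite_key(event_time)
```

A range query for a given month then only touches the index for that one partition, which is why smaller partitions mean smaller indexes to walk.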

1000's of CF's. PlayOrm solves the cassandra limit on #ColFamily

2012-10-03 Thread Hiller, Dean
Okay, so it only took me two solid days not a week. PlayOrm in master branch now supports virtual CF's or virtual tables in ONE CF, so you can have 1000's or millions of virtual CF's in one CF now. It works with all the Scalable-SQL, works with the joins, and works with the PlayOrm command

Re: Query over secondary indexes

2012-10-09 Thread Hiller, Dean
approach. -Vivek On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Another option may be PlayOrm for you and its scalable-SQL. We queried one million rows for 100 results in just 60ms. (and it does joins). Query CL =QUORUM. Dean From

Re: Query over secondary indexes

2012-10-09 Thread Hiller, Dean
of having like different dimensions of partitioning PlayOrm does plan on supporting CQL as well but it is not in yet. Later, Dean On 10/9/12 7:51 AM, Hiller, Dean dean.hil...@nrel.gov wrote: If I understand CQL correctly, behind the scenes in wide rows, there is a B-tree. Even when doing the indexing

Re: Upgrading hardware on a node in a cluster

2012-10-10 Thread Hiller, Dean
Well, you could use amazon VPC in which case you DO pick the IP yourself ;)….it makes life a bit easier. Dean From: Martin Koch m...@issuu.com Reply-To: user@cassandra.apache.org

Re: 1000's of CF's.

2012-10-10 Thread Hiller, Dean
, Sergey. On 04.10.2012 0:10, Hiller, Dean wrote: Okay, so it only took me two solid days not a week. PlayOrm in master branch now supports virtual CF's or virtual tables in ONE CF, so you can have 1000's or millions of virtual CF's in one CF now. It works with all the Scalable-SQL, works

CQL Sets and Maps

2012-10-11 Thread Hiller, Dean
I was reading Brian's post http://mail-archives.apache.org/mod_mbox/cassandra-dev/201210.mbox/%3ccajhhpg20rrcajqjdnf8sf7wnhblo6j+aofksgbxyxwcoocg...@mail.gmail.com%3E In which he asks Any insight into why CQL puts that in column name? Where does it store the metadata related to compound key

Re: [problem with OOM in nodes]

2012-10-11 Thread Hiller, Dean
Splitting one report to multiple rows is uncomfortably WHY? Reading from N disks is way faster than reading from 1 disk. I think in terms of PlayOrm and then explain the model you can use so I think in objects first Report { String uniqueId String reportName; //may be indexable and query

Re: unnecessary tombstone's transmission during repair process

2012-10-12 Thread Hiller, Dean
+1 I want to see how this plays out as well. Anyone know the answer? Dean From: Alexey Zotov azo...@griddynamics.com Reply-To: user@cassandra.apache.org Date: Friday,

Re: Using Cassandra to store binary files?

2012-10-16 Thread Hiller, Dean
Astyanax provides a streaming file feature and was written by netflix who is storing probably a huge amount of files with that feature. I was going to use that feature for one product but I never got around to creating the product…..but I still use astyanax under the hood of PlayOrm (we kind

Re: Using Cassandra to store binary files?

2012-10-16 Thread Hiller, Dean
Yes, astyanax stores the file in many rows so it reads from many disks giving you a performance advantage vs. storing each file in one row….well at least from my understanding so read performance should be really really good in that case. Dean From: Michael Kjellman

Re: Using Cassandra to store binary files?

2012-10-16 Thread Hiller, Dean
, Michael Kjellman mkjell...@barracuda.com wrote: Ah, so they just wrote chunking into Astyanax? Do they create an index somewhere so they know how to reassemble the file on the way out? On 10/16/12 10:36 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Yes, astyanax stores the file in many rows so it reads

Re: Astyanax empty column check

2012-10-17 Thread Hiller, Dean
What specifically are you trying to achieve? The business requirement might help as there are other ways of solving it such that you do not need to know the difference. Dean From: Xu Renjie xrjxrjxrj...@gmail.com Reply-To:

Re: how to get column type?

2012-10-18 Thread Hiller, Dean
This is specifically why Cassandra and even PlayOrm are going the direction of partial schemas. Everything in cassandra in raw form is just bytes. If you don't tell it the types, it doesn't know how to translate it. PlayOrm and other ORM layers are the same way though in these noSQL ORMs you

Re: tombstones and their data

2012-10-22 Thread Hiller, Dean
My understanding is any time from that node. Another node may have a different existing value and tombstone vs. that existing data(most recent timestamp wins). Ie. The data is not needed on that node so compaction should be getting rid of it, but I never confirmed this….I hope you get

Re: What does ReadRepair exactly do?

2012-10-24 Thread Hiller, Dean
Keep in mind, returning the older version is usually fine. Just imagine if your user clicked write 1 ms before, then the new version might be returned. If he gets the older version and refreshes the page, he gets the newer version. Same with an automated program as well….in general it is okay

Re: What does ReadRepair exactly do?

2012-10-24 Thread Hiller, Dean
if you write the code in a way that works, this concept works out great in most cases(in some cases, you need to think a bit differently and solve it other ways). I hope that clears it up Later, Dean On 10/24/12 8:02 AM, shankarpnsn shankarp...@gmail.com wrote: Hiller, Dean wrote in general

Re: What does ReadRepair exactly do?

2012-10-24 Thread Hiller, Dean
@cassandra.apache.org Subject: Re: What does ReadRepair exactly do? And we don't send read request to all of the three replicas (R1, R2, R3) if CL=QUORUM; just 2 of them depending on proximity On Wed, Oct 24, 2012 at 10:20 PM, Hiller, Dean dean.hil
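
The "just 2 of them" figure above follows from the usual quorum arithmetic, sketched here in Python. The function names are hypothetical; the formula (a majority of the replication factor) is the standard definition of a quorum, and the overlap check is the familiar R + W > N rule.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
needed = quorum(rf)  # replicas contacted for a QUORUM read with three replicas
overlap = reads_see_latest_write(rf, needed, needed)
```

With three replicas a quorum is two, so a QUORUM read can be served by any two live replicas, and a QUORUM read after a QUORUM write always includes at least one replica that has the new value.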

Re: What does ReadRepair exactly do?

2012-10-24 Thread Hiller, Dean
Thanks Zhang. But, this again seems a little strange thing to do, since one (say R2) of the 2 close replicas (say R1,R2) might be down, resulting in a read failure while there are still enough number of replicas (R1 and R3) live to satisfy a read. He means in the case where all 3 nodes are

Re: What does ReadRepair exactly do?

2012-10-25 Thread Hiller, Dean
Kind of an interesting question I think you are saying if a client read resolved only the two nodes as said in Aaron's email back to the client and read -repair was kicked off because of the inconsistent values and the write did not complete yet and I guess you would have two nodes go down to

Re: High bandwidth usage between datacenters for cluster

2012-10-25 Thread Hiller, Dean
Use the datacenter replication strategy and try it with that so you tell cassandra all your data centers, racks, etc. Dean From: Bryce Godfrey bryce.godf...@azaleos.com Reply-To: user@cassandra.apache.org

Re: ColumnFamilyInputFormat - error when column name is UUID

2012-10-29 Thread Hiller, Dean
Hmm, this brings the question of what uuid libraries are others using? I know this one generates type 1 UUIDs with two longs so it is 16 bytes. http://johannburkard.de/software/uuid/ Thanks, Dean From: Marcelo Elias Del Valle mvall...@gmail.com Reply-To:
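
The "two longs, 16 bytes" point can be checked with Python's standard uuid module. This is a sketch for illustration only and says nothing about the Java library linked in the message; it simply shows that a type 1 (time-based) UUID is a 128-bit value that splits into two 64-bit halves.

```python
import uuid

# Generate a type 1 (time-based) UUID.
u = uuid.uuid1()

# The raw form is 16 bytes, which is what a UUID column name or key stores.
raw = u.bytes

# The same 128 bits viewed as two 64-bit halves ("two longs").
most_significant = u.int >> 64
least_significant = u.int & ((1 << 64) - 1)
```

Reassembling the two halves gives back the original 128-bit value, so a library that exposes two longs and one that exposes 16 bytes are describing the same data.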

Re: ColumnFamilyInputFormat - error when column name is UUID

2012-10-30 Thread Hiller, Dean
/Universally_unique_identifier The only problem with type 1 UUIDs is they are not opaque? I know there is one kind of UUID that can generate two equal values if you generate them at the same millisecond, but I guess I was confusing them... Best regards, Marcelo Valle. 2012/10/29 Hiller, Dean dean.hil

Re: Benifits by adding nodes to the cluster

2012-10-30 Thread Hiller, Dean
1. High availability 2. You can hold much much more data 3. Better performance 4. You can do disaster recovery live-live datacenters (if you configure cassandra) On 10/29/12 4:02 PM, Andrey Ilinykh ailin...@gmail.com wrote: This is how cassandra scales. More nodes means better performance.

logging servers? any interesting in one for cassandra?

2012-11-01 Thread Hiller, Dean
2 questions 1. What are people using for logging servers for their web tier logging? 2. Would anyone be interested in a new logging server(any programming language) for web tier to log to your existing cassandra(it uses up disk space in proportion to number of web servers and just has a

documentation on PlayOrm released

2012-11-07 Thread Hiller, Dean
The first set of documentation on PlayOrm is now released. It is also still growing as we have a dedicated person working on more documentation. Check it out when you have a chance. Later, Dean

Re: documentation on PlayOrm released

2012-11-07 Thread Hiller, Dean
My bad. It is on the github PlayOrm wiki. The specific link is https://github.com/deanhiller/playorm/wiki Later, Dean
