Thanks Aaron. I kept this free of specific use cases so as to focus on the higher-level description; in hindsight that might not have been a good idea. Generally, though, I think I got a better intuition from the various answers. Thanks!
--
Dotan, @jondot <http://twitter.com/jondot>

On Sun, Nov 20, 2011 at 11:52 PM, Aaron Turner <synfina...@gmail.com> wrote:
> Sounds like you need to figure out what your product is going to do and what technology will best fit those requirements. I know you're worried about being agile and all that, but scaling requires you to use the right tool for the job. Worry about new requirements when they rear their ugly head rather than about a dozen "what if" scenarios.
>
> You can scale MySQL and the like, as well as Cassandra, MongoDB, etc., to 10-200M "users", depending on what you're asking your datastore to do. You haven't really defined that at all, other than some comments about wanting to run map/reduce jobs.
>
> What you should really be doing is figuring out what kind of data you need to store and what your needs are (access patterns, availability, ACID compliance, etc.), and then figuring out which technology is the best fit. There are tons of "Cassandra vs X" comparisons for every NoSQL DB in existence.
>
> Other than that, map/reduce on Cassandra is job-based rather than suited to interactive queries, so if interactive querying is important, Cassandra probably isn't a good fit. You did mention time-series data too, and that's a sweet spot for Cassandra and not something I would personally put in a document-based datastore like MongoDB.
>
> Good luck.
> -Aaron
>
> On Sun, Nov 20, 2011 at 1:24 PM, Dotan N. <dip...@gmail.com> wrote:
> > Jahangir, thanks! However, as I noted, we may very well need to scale to 200M users or "entities" within a short amount of time (say a year or two), and to 10M within a few months.
> >
> > On Sun, Nov 20, 2011 at 11:14 PM, Jahangir Mohammed <md.jahangi...@gmail.com> wrote:
> >> IMHO, you should start with something very simple, such as an RDBMS, and meanwhile get a handle on Cassandra or another NoSQL technology.
> >> Start out simple, but always be aware of the next thing you will have in your stack. Working with a new technology is time-consuming if you are in the phase of quickly prototyping something geared towards a VC demo. In most cases, you won't need NoSQL for a while unless there is a very strong case for it.
> >>
> >> Thanks,
> >> Jahangir
> >>
> >> On Nov 20, 2011 4:04 PM, "Dotan N." <dip...@gmail.com> wrote:
> >>> Thanks David.
> >>> Stephen: thanks for the tip. We can run a recommended configuration, so that wouldn't be an issue; I should clarify that my questions are about the complexity of development.
> >>> After digesting David's answer, my follow-up questions would be:
> >>> - How would you typically process data in a Cassandra cluster? Via one-off, coded offline jobs?
> >>> - How easy is map/reduce on existing data? (I just looked at Brisk, but it may be unrelated; in any case, not much has been written about it.)
> >>> - How would you do analytics over a Cassandra cluster?
> >>> - Given the common time-series examples, how would you recommend aggregating (sum, avg, facet) and providing statistics over the collected data? For example, if it were some kind of logs and you'd like to group by certain fields, or provide a histogram over them.
> >>> Thanks!
> >>>
> >>> On Sun, Nov 20, 2011 at 10:32 PM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
> >>>> If your startup is bootstrapping, then Cassandra is sometimes too heavy to start with.
> >>>>
> >>>> I.e., it needs to be fed RAM... you're not going to seriously run it on less than 1GB per node, and that level of RAM commitment can be too much while bootstrapping.
> >>>>
> >>>> If your startup has enough cash to pay for 3-5 nodes of the recommended spec (see the wiki) running 24/7, then Cassandra is a good fit...
> >>>>
> >>>> A friend of mine is bootstrapping a startup and had to drop back to MySQL while he finds his pain points and customers. He knows he will end up jumping back to Cassandra when he gets enough customers (or a VC), but for now the running costs are too much to pay out of his own pocket. Note that the JDBC driver and CQL will make jumping back easy for him (he still tests with C*... he just runs against MySQL at present... nuts, eh!)
> >>>>
> >>>> - Stephen
> >>>>
> >>>> On 20 Nov 2011 19:07, "Dotan N." <dip...@gmail.com> wrote:
> >>>>> Hi all,
> >>>>> My question may be more philosophical than technically related to Cassandra, but please bear with me.
> >>>>> Given that a young startup may not fully know its product in the early stages, but that it is definitely aiming at ~200M users, would Cassandra be the right way to go?
> >>>>> That is, the requirement is for a large data store that can move swiftly with product changes and requirements.
> >>>>> Given that in Cassandra one thinks hard about the queries and then builds a model to suit them best, I was thinking of this situation as problematic.
> >>>>> So here are some questions:
> >>>>> - Would it be wiser to start with a more agile data store (such as MongoDB) and then move on to Cassandra once the product itself solidifies?
> >>>>> - Given that we start with Cassandra from the get-go, what is a common (and, in terms of development, quick) way or practice to change data and schemas as the product evolves?
> >>>>> - Is it even smart to start with Cassandra? Would only startups whose core business is big data start with it from the get-go?
> >>>>> - How would you do map/reduce with Cassandra? How agile is that? (For example, can you run map/reduce _very_ frequently?)
> >>>>> Thanks!
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic