Re: data agility

Jahangir Mohammed Sun, 20 Nov 2011 13:15:12 -0800

IMHO, you should start with something very simple, such as an RDBMS, and
meanwhile get a handle on Cassandra or another NoSQL technology. Start out
simple, but always be aware of the next thing you will have in your stack.
Working with a new technology is time-consuming if you are in the phase of
prototyping something fast, geared towards a VC demo. In most cases, you
won't need NoSQL for a while unless there is a very strong case for it.
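
One way to keep the next thing in the stack in mind while starting simple is
to hide storage behind a small interface. A minimal sketch, assuming a
made-up UserStore interface and SQLite standing in for the simple RDBMS (the
class names and the users table are hypothetical, only for illustration):

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class UserStore(ABC):
    """Hypothetical storage interface; application code talks only to this."""

    @abstractmethod
    def put(self, user_id: str, name: str) -> None: ...

    @abstractmethod
    def get(self, user_id: str) -> Optional[str]: ...


class SqliteUserStore(UserStore):
    """Simple RDBMS-backed implementation to prototype against."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(user_id TEXT PRIMARY KEY, name TEXT)"
        )

    def put(self, user_id: str, name: str) -> None:
        # The connection as a context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO users (user_id, name) VALUES (?, ?)",
                (user_id, name),
            )

    def get(self, user_id: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


# A Cassandra-backed UserStore could be added later without touching callers.
store: UserStore = SqliteUserStore()
store.put("u1", "Dotan")
print(store.get("u1"))
```

Keeping the interface limited to key-based reads and writes makes a later
move to a store like Cassandra less painful, since joins and ad hoc queries
never leak into the callers.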


Thanks,
Jahangir
On Nov 20, 2011 4:04 PM, "Dotan N." <dip...@gmail.com> wrote:

> Thanks David.
> Stephen: thanks for the tip, we can run a recommended configuration, so
> that wouldn't be an issue. I guess I can narrow my questions down to the
> complexity of development.
>
> After digesting David's answer, I guess my follow-up questions would be:
> - how would you typically process data in a Cassandra cluster? via
> one-off coded offline jobs?
> - how easy is map/reduce on existing data? (I just looked at Brisk, but it
> may be unrelated; in any case, not too much is written about it)
> - how would you do analytics over a Cassandra cluster?
> - given the common examples of time series, how would you recommend
> aggregating (sum, avg, facet) and providing statistics over the collected
> data? for example, if it were some kind of logs and you'd like to group by
> certain fields in them, or provide a histogram over them.
>
> Thanks!
>
>
> --
> Dotan, @jondot <http://twitter.com/jondot>
>
>
>
> On Sun, Nov 20, 2011 at 10:32 PM, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
>
>> if your startup is bootstrapping then cassandra is sometimes too heavy to
>> start with.
>>
>> i.e. it needs to be fed ram... you're not going to seriously run it in
>> less than 1gb per node... that level of ram commitment can be too much
>> while bootstrapping.
>>
>> if your startup has enough cash to pay for 3-5 recommended spec (see
>> wiki) nodes to be up 24/7 then cassandra is a good fit...
>>
>> a friend of mine is bootstrapping a startup and had to drop back to mysql
>> while he finds his pain points and customers... he knows he will end up
>> jumping back to cassandra when he gets enough customers (or a VC) but for
>> now the running costs are too much to pay from his own pocket... note that
>> the jdbc driver and cql will make jumping back easy for him (as he still
>> tests with c*... just runs at present against mysql.... nuts eh!)
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>> On 20 Nov 2011 19:07, "Dotan N." <dip...@gmail.com> wrote:
>>
>>> Hi all,
>>> my question may be more philosophical than related technically
>>> to Cassandra, but please bear with me.
>>>
>>> Given that a young startup may not fully know its product at the early
>>> stages, but that it definitely aims for ~200M users,
>>> would Cassandra be the right way to go?
>>>
>>> That is, the requirement is for a large data store that can move swiftly
>>> with product changes and requirements.
>>>
>>> Given that in Cassandra one thinks hard about the queries, and then
>>> builds a model to suit them best, I was thinking of
>>> this situation as problematic.
>>>
>>> So here are some questions:
>>>
>>> - would it be wiser to start with a more agile data store (such as
>>> mongodb) and then progress onto Cassandra, when the product itself
>>> solidifies?
>>> - given that we start with Cassandra from the get go, what is a common
>>> (and quick in terms of development) way or practice to change data, change
>>> schemas, as the product evolves?
>>> - is it even smart to start with Cassandra? would only startups whose
>>> core business is big data start with it from the get go?
>>> - how would you do map/reduce with Cassandra? how agile is that? (for
>>> example, can you run map/reduce _very_ frequently?)
>>>
>>> Thanks!
>>>
>>> --
>>> Dotan, @jondot <http://twitter.com/jondot>
>>>
>>>
>
