Thanks for this Derek - looking into it now.

"Why spend $600 each for Mac Minis which you can't add SATA drives
to?" - you got me... I am not a hardware guy ;o)

Please allow me one more stupid newbie question...

Suppose I have a 500G tab file with 250M rows, and I'm looking to load
it into an HBase table, do scanner-based parsing of fields into new
fields within each row, generate several Lucene indexes (generation
only), run a few custom MR jobs, etc.  Is a cluster of 6 servers (Xeon
850, 4G RAM, 500G SATA) on a 1Gb switch too few for a "proof of
solution" / development environment, and just a waste of time and money
to set up?  (I need to produce a working proof of solution: not high
performance, but the ability to predict performance as the cluster
grows, so that development can move away from the current MySQL-based
solution.)
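For what it's worth, here is a rough back-of-envelope for that 6-node setup, assuming HDFS's default replication factor of 3 and ignoring HBase store-file overhead (both are my assumptions, not figures from this thread):

```python
# Rough capacity estimate for the proposed 6-node cluster.
# Assumption: HDFS replication factor of 3 (the default); actual HBase
# on-disk size vs. raw TSV will differ, so treat this as a ballpark only.

raw_tsv_gb = 500          # the 500G tab file
replication = 3           # HDFS default replication factor (assumed)
nodes = 6
disk_per_node_gb = 500    # one 500G SATA drive per node

table_footprint_gb = raw_tsv_gb * replication   # space for one replicated copy
cluster_capacity_gb = nodes * disk_per_node_gb  # total raw disk in the cluster
headroom_gb = cluster_capacity_gb - table_footprint_gb  # left for MR spill, indexes, logs

print(table_footprint_gb, cluster_capacity_gb, headroom_gb)  # 1500 3000 1500
```

On those assumptions the replicated table alone eats about half the cluster's disk, leaving roughly 1.5T for derived fields, Lucene indexes, and MapReduce intermediates: tight, but plausibly enough for a proof of solution rather than production.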

Thanks,

Tim



On Tue, Jan 6, 2009 at 9:56 AM, Derek Pappas <[email protected]> wrote:
> Our company looked at the cost of moving data in and out of the cloud
> for our application and concluded that it was prohibitive. Instead we
> use a co-lo which costs $850/month (15 amps, assuming 1.5 amps/server,
> and 20 Mbps). We hammer on the internet connection, keeping it pegged at 20 Mbps.
> We also store a lot of data. So we ruled out EC2 and built a small cluster
> with a lot of 1TB drives.
>
> One option is to buy 1U servers that are a couple of years old.
> You can pick up dual Xeon Dell 850's for ~$250 if you look around.
> These have SATA controllers and you can buy Seagate 1TB drives
> from Frys or Newegg for ~$100.
>
> We converted Dell 850's with SCSI controllers (pulled 'em) and switched the
> BIOS HDD controller setting from SCSI to SATA. Then we installed 2TB Seagate
> SATA drives. Cost $200 (we got the 4 year old Dell 850s for free).
>
> Then buy a 1Gb switch for $140.
>
> Add faster machines to the mix if you need them.
>
> Then install CentOS 5.2 from the DVD in no time flat, and you can then get
> Hadoop installed.
> Hint: make sure the older 1U servers have DVD-ROMs and not CD-ROMs to reduce
> the install time.
>
> Better yet create a local mirror and use a boot disk to configure each
> machine via
> the local mirror.
>
> Why spend $600 each for Mac Minis which you can't add SATA drives to?
>
> -Derek
>
>
> On Jan 6, 2009, at 12:16 AM, tim robertson wrote:
>
>> Andy is quite right, and a 10-node (large instance) cluster costs $100
>> a day + traffic.  Fine for a couple of days' testing, but for a month I
>> can't pay this, as it is currently personal research.
>>
>> For those of us testing and researching HBase: can anyone suggest a
>> hosted alternative to EC2 that is better for running small clusters
>> (say 200-400G when in tab file format)?
>>
>> What do people use for their personal dev environments?  Please
>> forgive this stupid question, but should I invest in, say, 5 Mac Minis
>> (3G memory and 250G HD)?  If the research proves successful, I would
>> of course look at moving into production on proper hardware.
>>
>> Cheers,
>>
>> Tim
>>
>>
>>
>> On Tue, Jan 6, 2009 at 1:25 AM, Andrew Purtell <[email protected]> wrote:
>>>>
>>>> From: stack
>>>> If someone can confirm that Michael Gottesmans's AMI
>>>> works, lets post its location prominently on the
>>>> hbase wiki someplace.
>>>
>>> It's 0.2. Wouldn't recommend it.
>>>
>>> AMIs need updating, unless they're a base system containing
>>> scripts that grab and install the latest and greatest from a
>>> stable URL. Maybe Maven or Ivy + Ant could help with that.
>>> Even then, there are OS vendor package updates, etc...
>>>
>>> Incidentally I did an estimate once of what it would cost me
>>> to tinker with HBase on EC2. Came to ~$30K USD/month. That's
>>> not spare change.
>>>
>>>  - Andy
>>>
>>>
>>>
>>>
>>>
>
> Best Regards,
>
> Derek Pappas
> depappas at yahoo d0t com
>
>
>
>
>
