So many questions, why stop there?

First question... What would cause the name node to have a GC issue?
Second question... You're streaming 1PB a day. Is this a single stream of data?
Are you writing this to one file before processing, or are you processing the 
data directly on the ingestion stream?

Are you also filtering the data so that you are not saving all of the data?

This sounds more like a homework assignment than a real-world problem.

I guess people don't race cars against trains or have two trains traveling in 
different directions anymore... :-)
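
For what it's worth, here is a quick back-of-the-envelope check on the two figures in the quoted message (1 PB per day and 1 GBPS). It is only a sketch: it assumes decimal units (1 PB = 10^15 bytes), and since "GBPS" could mean gigabytes or gigabits per second, it tries both readings. The helper names are made up for illustration.

```python
# Back-of-the-envelope check; assumes decimal units (1 PB = 10**15 bytes).
SECONDS_PER_DAY = 24 * 60 * 60

PB = 10**15
GB = 10**9
TB = 10**12


def required_rate(bytes_per_day_total):
    """Sustained bytes/sec needed to move the given volume in one day."""
    return bytes_per_day_total / SECONDS_PER_DAY


def bytes_per_day(rate_bytes_per_sec):
    """Volume accumulated in one day at a constant bytes/sec rate."""
    return rate_bytes_per_sec * SECONDS_PER_DAY


rate_for_1pb = required_rate(PB)            # bytes/sec to sustain 1 PB/day
daily_at_1_gigabyte = bytes_per_day(GB)     # if "GBPS" means gigabytes/sec
daily_at_1_gigabit = bytes_per_day(GB / 8)  # if "GBPS" means gigabits/sec

print(f"1 PB/day needs about {rate_for_1pb / GB:.2f} GB/s sustained")
print(f"1 GB/s yields about {daily_at_1_gigabyte / TB:.1f} TB/day")
print(f"1 Gb/s yields about {daily_at_1_gigabit / TB:.1f} TB/day")
```

Under either reading, a single 1 GBPS publisher falls well short of 1 PB per day, so it would help to know whether that rate is per application or the aggregate across all of them.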


Sent from a remote device. Please excuse any typos...

Mike Segel

On Aug 10, 2011, at 12:07 PM, jagaran das <[email protected]> wrote:

> To be precise, the projected data is around 1 PB.
> But the publishing rate is also around 1GBPS.
> 
> Please suggest.
> 
> 
> ________________________________
> From: jagaran das <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Wednesday, 10 August 2011 12:58 AM
> Subject: Namenode Scalability
> 
> In my current project we are planning to send streams of data to the 
> Namenode (20-node cluster).
> Data volume would be around 1 PB per day.
> But there are applications which can publish data at 1GBPS.
> 
> A few queries:
> 
> 1. Can a single Namenode handle such high-speed writes? Or does it become 
> unresponsive when a GC cycle kicks in?
> 2. Can we have multiple federated Namenodes sharing the same slaves, so 
> that we can distribute the writes accordingly?
> 3. Can multiple HBase region servers help us?
> 
> Please suggest how we can design the streaming part to handle this scale of 
> data.
> 
> Regards,
> Jagaran Das 
