Dear Konstantin Shvachko,

Thanks for your reply.

I decided to try Hadoop for our application anyway. I have successfully 
connected 22 nodes, and it is working fine so far. I had one issue, though: 
the master node generated 5 log files of 125GB each in 5 days, which crashed 
the server. I have since changed the log4j properties and fixed it.

I have a few more queries and would really appreciate it if you could answer 
them.

1. If I set the replication level to 3 or 4, will I be able to access the files 
even if one or two slave nodes go down?

2. If one of the slave nodes runs out of disk space, will Hadoop perform any 
defragmentation process by moving some data blocks onto other nodes?

3. Can I run multiple master nodes with the same set of slaves and then, by 
modifying the code, have the master nodes communicate with each other about 
the data chunks stored on the slave nodes? This way we could run multiple 
master nodes and provide options for load balancing and clustering. You would 
be the right person to suggest an approach, and I can work on it.
 
Please let me know your thoughts.

Thanks and Best
Jugs

-----Original Message-----
    From: "Konstantin Shvachko"<[EMAIL PROTECTED]>
    Sent: 8/10/06 11:27:13 PM
    To: "[email protected]"<[email protected]>
    Subject: Re: Some queries about stability and reliability
    
    Hi Jagadeesh,
    
    >I am very much new to Hadoop and would like to know some details about the
    >reliability and stability. I am developing flickr kind of an application
    >for storing and sharing movies and would like to use Hadoop as my storage
    >backend. I am planning to put in at least 100 nodes and would like to know
    >more about the product. I will appreciate if you could answer some of my
    >queries.
    >  
    >
    This is a very interesting application for Hadoop.
    Did you have any progress with the system?
    
    >1. Is the product matured enough for using in an application like this?
    >  
    >
    Yes.
    
    >2. Has somebody tested it using at least 100 nodes?
    >  
    >
    Yes, there are even larger installations.
    
    >3. Can I have multiple master nodes in Hadoop to do load balancing and
    >fail-overs?
    >  
    >
    Not yet.
    
    >4. What is the maximum number of simultaneous connections possible in
    >Hadoop?
    >  
    >
    Hadoop is designed to support, and actually supports, a high volume of 
    simultaneous connections.
    E.g., on a 100-node cluster an extensive map-reduce job can generate 400 
    concurrent connections.
    
    Creation time and date are not implemented for DFS files.
    Do you have a good application for ctime?
    
    Thank you,
    
    --Konstantin
