Dear Konstantin Shvachko,
Thanks for your reply.
I decided to try Hadoop for our application anyway. I have successfully
connected 22 nodes and it is working fine so far. I did have one issue: the
master node generated 5 log files of 125GB each in 5 days, which crashed the
server. I have since changed the log4j properties and fixed it.
I have a few more queries and would really appreciate it if you could answer
them.
1. If I set the replication level to 3 or 4, will I be able to access the files
even if one or two slave nodes go down?
2. If one of the slave nodes runs out of disk space, will Hadoop perform any
defragmentation process by moving some data blocks onto other nodes?
3. Can I run multiple master nodes with the same set of slaves and then, by
modifying the code, have the master nodes communicate with each other about
which data chunks are stored on which slave nodes? This way we could run
multiple master nodes and provide options for load balancing and clustering.
You would be the right person to suggest an approach, and I can work on it.
Please let me know your thoughts...
Thanks and Best
Jugs
-----Original Message-----
From: "Konstantin Shvachko"<[EMAIL PROTECTED]>
Sent: 8/10/06 11:27:13 PM
To: "[email protected]"<[email protected]>
Subject: Re: Some queries about stability and reliability
Hi Jagadeesh,
>I am very new to Hadoop and would like to know some details about its
>reliability and stability. I am developing a Flickr-like application for
>storing and sharing movies and would like to use Hadoop as my storage
>backend. I am planning to put in at least 100 nodes and would like to know
>more about the product. I would appreciate it if you could answer some of my
>queries.
>
>
This is a very interesting application for Hadoop.
Have you made any progress with the system?
>1. Is the product mature enough to use in an application like this?
>
>
Yes.
>2. Has somebody tested it using at least 100 nodes?
>
>
Yes, there are even larger installations.
>3. Can I have multiple master nodes in Hadoop to do load balancing and
>fail-overs?
>
>
Not yet.
>4. What is the maximum number of simultaneous connections possible in
>Hadoop?
>
>
Hadoop is designed to support, and in practice does support, a high volume of
simultaneous connections.
E.g., on a 100-node cluster an extensive map-reduce job can generate 400
concurrent connections.
Creation time and date are not implemented for DFS files.
Do you have a good application for ctime?
Thank you,
--Konstantin