On Wednesday, September 17, 2014 03:55:56 PM James wrote:
> J. Roeleveld <joost <at> antarean.org> writes:
> > > Distributed File Systems (DFS):
> > 
> > > Local (Device) File Systems LFS:
> > Is my understanding correct that the top list all require one of
> > the bottom list?
> > E.g. the "clustering" FSs only ensure the files on the LFSs are
> > duplicated/spread over the various nodes?
> > 
> > I would normally expect the clustering FS to be either the full layer
> > or a clustered block-device where an FS can be placed on top.
> 
> I have not performed these installations yet. My research indicates
> that first you put the Local FS on the drive, just like any installation
> of Linux. Then you put the distributed FS on top of this. Some DFS might
> not require an LFS, but FhGFS does and so does HDFS. I will not actually
> be able to accurately answer your questions until I start to build
> up the 3-system cluster (a week or 2 away is my best guess).

Playing around with clusters is on my list, but due to other activities having 
a higher priority, I haven't had much time yet.
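
For what it's worth, the LFS-under-DFS layering is easy to see in HDFS: the
datanode just stores its blocks as ordinary files on whatever local
filesystem you point it at. A rough sketch of the relevant part of
hdfs-site.xml (the paths are hypothetical, and I'm assuming the disks are
already formatted as ext4 and mounted):

    <!-- hdfs-site.xml: point HDFS at directories on an existing local FS -->
    <configuration>
      <property>
        <!-- where the datanode keeps its block files (an ext4 mount here) -->
        <name>dfs.datanode.data.dir</name>
        <value>/mnt/ext4-disk1/hdfs/data</value>
      </property>
      <property>
        <!-- where the namenode keeps its metadata -->
        <name>dfs.namenode.name.dir</name>
        <value>/mnt/ext4-disk1/hdfs/name</value>
      </property>
    </configuration>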

> > Otherwise it seems more like a network filesystem with caching
> > options (See AFS).
> 
> OK, I'll add AFS. You may be correct on this one, or AFS might be both.

Personally, I would read up on these and see how they work. Then, based 
on that, decide if they are likely to assist in the specific situation you are 
interested in.
AFS, NFS, CIFS,... can be used for clusters, but, apart from NFS, I wouldn't 
expect much performance out of them.
If you need it to be fault-tolerant and not rely too heavily on a single point 
of failure, I wouldn't be using any of these. Only AFS, from my original 
investigation, showed some fault-tolerance, but it needed too many 
resources (disk-space) on the clients.

> > I am also interested in these filesystems, but for a slightly different
> > scenario:
> Ok, so I'm the "test-dummy-crash-victim". I'd be honored to have you,
> Alan, Neil, Mic, etc etc back-seat-drive on this adventure! (The more
> I read, the more it's time for bourbon, bash, and a bit of cursing
> to get started...)

Good luck, and even though I'd love to join in with the testing, I simply do 
not have the time to keep up. I would probably just slow you down.

> > - 2 servers in remote locations (different offices)
> > - 1 of these has all the files stored (server A) at the main office
> > - The other (server B - remote office) needs to "offer" all files
> > from server A. When server B needs to supply a file, it needs to
> > check if the local copy is still the "valid" version.
> > If yes, supply the local copy; otherwise download it
> > from server A. When a file is changed, server A needs to be updated.
> > While server B is sharing a file, the file needs to be locked on
> > server A, preventing simultaneous updates.
> 
> OOch, file locking (precious tells me that is always tricky).

I need it to be locked on server A while server B has a proper write-lock, to 
avoid 2 modifications competing with each other.
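
In other words, something like this rough Python sketch (the path is
hypothetical, and I'm assuming server A's store is mounted on B over
NFSv4, which holds POSIX locks on the server side):

    import fcntl, os

    # Hypothetical file on an NFSv4 mount of server A's store.
    fd = os.open("/mnt/serverA/shared/report.odt", os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # exclusive write-lock, blocks until granted
        os.write(fd, b"new contents")   # edit while holding the lock
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release so the other site can edit
        os.close(fd)

Whether plain NFS locking is robust enough over a flaky WAN link is, of
course, part of the problem.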

> (psst, systemd is causing fits for the clustering geniuses;
> some are espousing a variety of cgroup gymnastics for phantom kills)

phantom kills?

> Spark is fault tolerant, regardless of node/memory/drive failures
> above the fault tolerance that a file system configuration may support.
> In fact, files lost can be 'regenerated', but it is computationally
> expensive.

Too much for me.

> You have to get your file system(s) set up. Then install
> mesos-0.20.0 and then spark. I have mesos mostly ready. I should
> have spark in alpha-beta this weekend. I'm fairly clueless on the
> DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
> for testing the 3-system cluster.

That, or a 4th node acting like a NAS sharing the filesystem over NFS.
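
If you go the NAS route, plain NFS keeps it simple. A minimal /etc/exports
sketch on that 4th node (the path and subnet are made up):

    # /etc/exports on the NAS node: share the store with the cluster subnet
    /srv/cluster  192.168.1.0/24(rw,sync,no_subtree_check)

The 3 cluster nodes then just mount that export instead of each running
their own DFS layer.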

> > I prefer not to supply the same amount of storage at server B as
> > server A has. The remote location generally only needs access to 5% of
> > the total amount of files stored on server A. But not always the same 5%.
> > Does anyone know of a filesystem that can handle this?
> 
> So in clustering, from what I have read, there are all kinds of files
> passed around between the nodes and the master(s). Many are critical
> files not part of the application or scientific calculations.
> So in time, I think in a clustering environment, all you seek is
> very possible, but it's a hunch, gut feeling, not fact. I'd put
> RAID mirrors underneath that system, if it makes sense, for now,
> or just dd the stuff with a script or something kludgy (Alan is the
> king of kludge....)

Hmm... mirroring between servers. Always an option, except it will not work 
for me in this case:
1) The remote location will have a domestic ADSL line. I'll be lucky if it has 
a 500 kbps uplink.
2) Server A currently has around 7 TB of data that also needs to be 
available on the remote site.

With an 8 Mbps downlink, waiting for a file to be copied to the remote site 
is acceptable. After modifications, the new version can be copied back to 
server A slowly during network-idle-time or when server A actually needs it. 
If there is constant mirroring between A and B, the 500 kbps (if I am lucky) 
will be insufficient.
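
Back-of-the-envelope (using my 500 kbps / 8 Mbps guesses, and a 10 MB
file purely as an example), the numbers make the point:

    # initial mirror: 7 TB pushed over a 500 kbps uplink
    print((7e12 * 8) / 500e3 / 86400 / 365)  # ~3.5 years

    # on-demand fetch: a 10 MB file over the 8 Mbps downlink
    print((10e6 * 8) / 8e6)                  # ~10 seconds

So fetching on demand is fine; keeping a full mirror is hopeless.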

> On Gentoo Planet one of the devs has "Consul" in his overlays. Read
> up on that for ideas that may be relevant to what you need.

Assuming the following is the website:
http://www.consul.io/intro/vs/

Then this seems more a tool to replace Nagios, Puppet and similar. It 
doesn't have any magic inside to actually distribute a filesystem in a way 
that, when a file is "cached" at the local site, you don't have to wait for it 
to download from the remote site, and any changes to the file are copied 
to the master store automagically. What I need is something intelligent 
enough to invalidate local copies only when the master copy got changed, 
to distribute write-locks to ensure edits can occur via only 1 server at a 
time, and to guarantee that every user always gets the latest version, 
regardless of where/when it was last edited.
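
To make that concrete, the behaviour I'm after on server B looks roughly
like this Python sketch (the paths are hypothetical, and I'm assuming
server A's store is reachable via some mount, e.g. NFS):

    import os, shutil

    MASTER = "/mnt/serverA/shared"  # hypothetical view of server A's store
    CACHE = "/var/cache/serverB"    # hypothetical local cache on server B

    def fetch(relpath):
        """Serve a file from the local cache, refreshing it only when
        the master copy changed (compared here by mtime and size)."""
        src = os.path.join(MASTER, relpath)
        dst = os.path.join(CACHE, relpath)
        try:
            s, d = os.stat(src), os.stat(dst)
            stale = (s.st_mtime, s.st_size) != (d.st_mtime, d.st_size)
        except FileNotFoundError:
            stale = True              # no local copy yet
        if stale:                     # only now pay for the slow WAN copy
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)    # copy2 preserves the master's mtime
        return dst

A real solution would also need the write-lock side and push changed
files back to A, but this is the caching half of it.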

--
Joost 

