Re: [Gluster-devel] Infiniband help

2007-10-19 Thread Nathan Allen Stratton
On Fri, 19 Oct 2007, Anand Avati wrote:

 Nathan,
   if you have IPoIB working, using ib-verbs should be straightforward. Just
 use the IPoIB IP addresses and transport-type as ib-verbs/{client,server}
 and things should just work.

 ib-sdp was a 'stopgap' solution when the ib-verbs driver was not yet
 implemented. You need ib_sdp kernel module for it to work. Also make sure
 you have 'ib_uverbs' kernel module for ib-verbs to work. If still things
 don't work, run both the server and client with -LDEBUG and attach the logs.

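(For reference, a minimal pair of stanzas along those lines might look like
the sketch below; the posix export path is a placeholder, and the IP and
subvolume name are borrowed from the client log further down.)

volume share
  type storage/posix
  option directory /data/export          # placeholder path
end-volume

volume server
  type protocol/server
  option transport-type ib-verbs/server
  option auth.ip.share.allow *
  subvolumes share
end-volume

volume client-share
  type protocol/client
  option transport-type ib-verbs/client
  option remote-host 192.168.0.12        # IPoIB address of the server
  option remote-subvolume share
end-volume
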
Server:

2007-10-19 20:36:42 D [client-protocol.c:4294:client_protocol_reconnect]
brick-a-ns: attempting reconnect
2007-10-19 20:36:42 D [ib-verbs-client.c:188:ib_verbs_client_connect]
brick-a-ns: connection on 5 success, attempting to handshake
2007-10-19 20:36:42 E [ib-verbs.c:762:ib_verbs_handshake]
transport/ib-verbs: brick-a-ns: could not send IB handshake data
2007-10-19 20:36:42 E [ib-verbs-client.c:197:ib_verbs_client_connect]
brick-a-ns: ib_verbs_handshake failed
2007-10-19 20:36:42 D [client-protocol.c:4294:client_protocol_reconnect]
brick-b-ns: attempting reconnect
2007-10-19 20:36:42 D [ib-verbs-client.c:188:ib_verbs_client_connect]
brick-b-ns: connection on 5 success, attempting to handshake
2007-10-19 20:36:42 E [ib-verbs.c:762:ib_verbs_handshake]
transport/ib-verbs: brick-b-ns: could not send IB handshake data
2007-10-19 20:36:42 E [ib-verbs-client.c:197:ib_verbs_client_connect]
brick-b-ns: ib_verbs_handshake failed
2007-10-19 20:36:42 D [client-protocol.c:4294:client_protocol_reconnect]
brick-a: attempting reconnect
2007-10-19 20:36:42 D [ib-verbs-client.c:188:ib_verbs_client_connect]
brick-a: connection on 5 success, attempting to handshake
2007-10-19 20:36:42 E [ib-verbs.c:762:ib_verbs_handshake]
transport/ib-verbs: brick-a: could not send IB handshake data
2007-10-19 20:36:42 E [ib-verbs-client.c:197:ib_verbs_client_connect]
brick-a: ib_verbs_handshake failed
2007-10-19 20:36:42 D [client-protocol.c:4294:client_protocol_reconnect]
mirror-a: attempting reconnect
2007-10-19 20:36:42 D [ib-verbs-client.c:188:ib_verbs_client_connect]
mirror-a: connection on 5 success, attempting to handshake
2007-10-19 20:36:42 E [ib-verbs.c:762:ib_verbs_handshake]
transport/ib-verbs: mirror-a: could not send IB handshake data
2007-10-19 20:36:42 E [ib-verbs-client.c:197:ib_verbs_client_connect]
mirror-a: ib_verbs_handshake failed
2007-10-19 20:36:42 D [client-protocol.c:4294:client_protocol_reconnect]
brick-b: attempting reconnect
2007-10-19 20:36:42 D [ib-verbs-client.c:188:ib_verbs_client_connect]
brick-b: connection on 5 success, attempting to handshake
2007-10-19 20:36:42 E [ib-verbs.c:762:ib_verbs_handshake]
transport/ib-verbs: brick-b: could not send IB handshake data
2007-10-19 20:36:42 E [ib-verbs-client.c:197:ib_verbs_client_connect]
brick-b: ib_verbs_handshake failed
2007-10-19 20:36:42 D [client-protocol.c:4294:client_protocol_reconnect]
mirror-c: attempting reconnect
2007-10-19 20:36:42 D [ib-verbs-client.c:188:ib_verbs_client_connect]
mirror-c: connection on 5 success, attempting to handshake
2007-10-19 20:36:42 E [ib-verbs.c:762:ib_verbs_handshake]
transport/ib-verbs: mirror-c: could not send IB handshake data
2007-10-19 20:36:42 E [ib-verbs-client.c:197:ib_verbs_client_connect]
mirror-c: ib_verbs_handshake failed

Client:

2007-10-19 20:42:56 D [glusterfs.c:138:get_spec_fp] glusterfs: loading
spec from /usr/local/etc/glusterfs/client.vol
2007-10-19 20:42:56 W [fuse-bridge.c:2100:fuse_transport_notify]
glusterfs-fuse: Ignoring notify event 4
2007-10-19 20:42:56 D [spec.y:116:new_section] libglusterfs/parser: New
node for 'share'
2007-10-19 20:42:56 D [spec.y:132:section_type] libglusterfs/parser:
Type:share:protocol/client
2007-10-19 20:42:56 D [xlator.c:102:xlator_set_type] libglusterfs/xlator:
attempt to load type protocol/client
2007-10-19 20:42:56 D [xlator.c:109:xlator_set_type] libglusterfs/xlator:
attempt to load file
/usr/local/lib/glusterfs/1.3.6/xlator/protocol/client.so
2007-10-19 20:42:56 D [spec.y:152:section_option] libglusterfs/parser:
Option:share:transport-type:ib-verbs/client
2007-10-19 20:42:56 D [spec.y:152:section_option] libglusterfs/parser:
Option:share:remote-host:192.168.0.12
2007-10-19 20:42:56 D [spec.y:152:section_option] libglusterfs/parser:
Option:share:remote-subvolume:share
2007-10-19 20:42:56 D [spec.y:216:section_end] libglusterfs/parser:
end:share
2007-10-19 20:42:56 D [spec.y:116:new_section] libglusterfs/parser: New
node for 'writeback'
2007-10-19 20:42:56 D [spec.y:132:section_type] libglusterfs/parser:
Type:writeback:performance/write-behind
2007-10-19 20:42:56 D [xlator.c:102:xlator_set_type] libglusterfs/xlator:
attempt to load type performance/write-behind
2007-10-19 20:42:56 D [xlator.c:109:xlator_set_type] libglusterfs/xlator:
attempt to load file
/usr/local/lib/glusterfs/1.3.6/xlator/performance/write-behind.so
2007-10-19 20:42:56 W [xlator.c:156:xlator_set_type] libglusterfs/xlator:
dlsym(notify) on

[Gluster-devel] Infiniband

2007-10-16 Thread Nathan Allen Stratton

I am trying to set up InfiniBand on a Fedora Core 6 box. I found:

http://gluster-tmp.enix.org/docs/index.php/Configuring_Infiniband

However, it looks to be outdated, since the first SVN URL no longer works.
I also can't find gen2 on the OpenFabrics site. Does anyone have any
pointers on setting up InfiniBand on FC6?

It looks like my cards load just fine:

ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing :06:00.0

-Nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] infiniband failover to tcp

2007-09-26 Thread Nathan Allen Stratton
On Wed, 26 Sep 2007, Anand Avati wrote:

 Mickey,
  This will be possible with the HA xlator which does HA failover over its
 subvolumes, and thus can be configured to failover across servers AFR'd at
 the 'backend' (server side) or just failover two network links to the same
 server.

When is this scheduled to be in TLA?

-nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] afr, striping, and nufa scheduler

2007-09-25 Thread Nathan Allen Stratton
On Tue, 25 Sep 2007, Krishna Srinivas wrote:

 Hi August,

 Actually the read scheduler is not yet done, it is on our roadmap to
 implement it. Now, AFR does all the read operations from its first
 child.

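(In volfile terms, the "first child" is simply the first name on the AFR
volume's subvolumes line; a sketch with hypothetical volume names:)

volume mirror
  type cluster/afr
  option replicate *:2
  # all reads are served from site-a because it is listed first
  subvolumes site-a site-b
end-volume
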
So if sites A and B are AFR'd and site A is listed first, both sites A and B
will try to do all reads from A?

-Nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


[Gluster-devel] TLA 493 error

2007-09-25 Thread Nathan Allen Stratton

stripe.c: In function stripe_writev:
stripe.c:2858: error: expected expression before else
stripe.c:3144: error: expected declaration or statement at end of input
stripe.c:3140: warning: unused variable mops
stripe.c:3101: warning: unused variable fops
make[4]: *** [stripe.o] Error 1
make[4]: Leaving directory
`/usr/local/src/glusterfs/xlators/cluster/stripe/src'
make[3]: *** [install-recursive] Error 1
make[3]: Leaving directory
`/usr/local/src/glusterfs/xlators/cluster/stripe'
make[2]: *** [install-recursive] Error 1
make[2]: Leaving directory `/usr/local/src/glusterfs/xlators/cluster'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/src/glusterfs/xlators'
make: *** [install-recursive] Error 1



Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


[Gluster-devel] Gluster ready for prime time with this setup?

2007-09-20 Thread Nathan Allen Stratton

Image of the setup is at:

http://share.robotics.net/gluster.png

The setup is as follows: eight servers each in the NYC and LAX datacenters.
I have not decided on RAID 5 or 6, but each server has 8 750 GB disks
connected to a 3ware controller, with each box connected to a switch over
3 bonded Gig-E interfaces. The boxes boot off a SAN over a Fibre Channel
interface and each box is used as a storage server, client, and Xen host.

The requirement is for three share exports: two that are unify shares for
NYC and LAX, and a third that is an AFR between NYC and LAX. The idea is that
we will store things that stay local in, say, /share-nyc or /share-lax and
store files that need to be replicated between sites in /share, which is an
AFR of the two sites.

High availability is important; the plan is to always use servers in pairs,
server 0 for the brick and server 1 for the mirror of that data using AFR.
Each server pair forms a block, with 4 blocks (8 servers) forming each
city's share unify. Data that needs to be in both NYC and LAX would actually
be stored 4 times, twice in each city. To keep reads local, I was planning
on using nufa, forcing each city to prefer local reads.

Would this work? Is Gluster ready for something like this? One thing I have
not figured out is how to deal with the namespace; any ideas on that? Also,
what issues will I run into with AFR between two cities 3,000 miles apart?
That latency is obviously going to be 60 ms more than between local
servers; does this cause problems?

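(A rough sketch of one such server pair and its city-level unify, with
hypothetical volume names; the nufa.local-volume-name option is an
assumption about how the "prefer local" part would be expressed:)

volume block-1
  type cluster/afr
  option replicate *:2
  subvolumes server0-brick server1-mirror   # one server pair
end-volume

volume share-nyc
  type cluster/unify
  option namespace nyc-ns                   # namespace volume, defined separately
  option scheduler nufa
  option nufa.local-volume-name block-1     # schedule new files onto the local block
  subvolumes block-1 block-2 block-3 block-4
end-volume
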

Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster ready for prime time with this setup?

2007-09-20 Thread Nathan Allen Stratton
On Fri, 21 Sep 2007, Michael Fincham wrote:

 Hi Nathan,

 Are you planning on booting Xen VMs off GlusterFS? I've been playing with
 this for about a month and come up against some issues... In short, it
 seems to work fine if AFR is not involved.

No, the Xen dom0 and domUs boot off the SAN.

-nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-459

2007-08-17 Thread Nathan Allen Stratton

Crash with latest.

(gdb) bt
#0  0x2b309a40 in afr_close (frame=0x2aaab8002a10, this=value
optimized out, fd=0x2aaab40031a0)
at afr.c:2395
#1  0x2b51b951 in unify_close (frame=0x2aaab80041f0,
this=0x60b8d0, fd=0x2aaab40031a0) at unify.c:2421
#2  0x2b723c65 in iot_close_wrapper (frame=0x2aaab4005c00,
this=0x60c2a0, fd=0x2aaab40031a0)
at io-threads.c:185
#3  0x2aad91ca in call_resume (stub=0x2aaab8001960) at
call-stub.c:2715
#4  0x2b724f3d in iot_worker (arg=value optimized out) at
io-threads.c:1055
#5  0x003f610062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x003f604ce86d in clone () from /lib64/libc.so.6
#7  0x in ?? ()
(gdb)



Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


[Gluster-devel] 802.3ad

2007-08-16 Thread Nathan Allen Stratton

What sort of performance numbers are people seeing with gig ethernet, and
more specifically 802.3ad (link aggregation) of gig ethernet interfaces
while using Gluster?


Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] afr fault recovery

2007-08-10 Thread Nathan Allen Stratton
On Fri, 10 Aug 2007, Krishna Srinivas wrote:

 Hi Nathan,

 Give this option in client/protocol volume definition:
 option transport-timeout 30

Thanks, I was not using that option, so yes, that was my issue. I have it
set to 2 seconds now. It works as indicated, but my applications still die
because they are not expecting to lose the disk for 2 seconds.

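(For the archives, the option sits directly inside each protocol/client
volume; the host and volume names below are placeholders:)

volume brick-a
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.11
  option remote-subvolume brick-a
  option transport-timeout 2      # seconds before the server is considered dead
end-volume
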

___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-07 Thread Nathan Allen Stratton

On Tue, 7 Aug 2007, Anand Avati wrote:

 Nathan,
 this (and a couple more) bugs are fixed in patch-436. thanks for reporting.

I am running patch-439 and things are much better: I can shut down a server
and when it comes back up I do not need to restart glusterfsd on the other
two boxes. However, when one server is shut down, the two active servers
still have a timeout delay of about 4 minutes during which you can't read or
write to the share. Are there any plans to support a server going down
without losing access to the rest of the unify/afr? I think my application
could deal with a delay of a few seconds, but several minutes just breaks
everything.



Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-07 Thread Nathan Allen Stratton

Any downside in lowering this to 1 sec?



Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com

On Wed, 8 Aug 2007, Anand Avati wrote:

 Nathan,
  sorry about my last mail. that option should go into (all of) your
 protocol/client volumes. All the options of all the translators can be
 currently found in doc/translator-options.txt of the source tree.

 avati

 2007/8/8, Nathan Allen Stratton [EMAIL PROTECTED]:
 
 
  Where do I put this option? Is it in the client config, server config?
  Under the AFR or the Unify or both? Also, is there a way to see all
  options? I looked on the wiki and found some info, but have not found a
  page that shows all options for all the translators.
 
 
  
  Nathan Stratton CTO, Voila IP Communications
  nathan at robotics.net  nathan at voilaip.com
  http://www.robotics.net http://www.voilaip.com
 
  On Tue, 7 Aug 2007, Anand Avati wrote:
 
   Nathan,
you can fine tune this delay by 'option transport-timeout 30' which
  sets
   the response timeout to 30 seconds, or any value you want (in seconds).
   Please let us know if that made a difference for you.
  
   thanks,
   avati
  
  
   2007/8/7, Nathan Allen Stratton [EMAIL PROTECTED]:
   
   
On Tue, 7 Aug 2007, Anand Avati wrote:
   
 Nathan,
 this (and a couple more) bugs are fixed in patch-436. thanks for
reporting.
   
I am running patch-439, things are much better, I can shutdown a
  server
and when it comes back up I do not need to restart glusterd on the
  other
two boxes. However, when one server is shutdown the two active servers
still have a timeout delay of about 4 min where you can't read or
  write to
the share. Is there any plans to support a server going down without
losing access to the rest of the unity/afr? I think my application
  could
deal with a delay of a few sec, but several minutes just breaks
everything.
   
   

Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com
   
  
  
  
   --
   It always takes longer than you expect, even when you take into account
   Hofstadter's Law.
  
   -- Hofstadter's Law
  
 



 --
 It always takes longer than you expect, even when you take into account
 Hofstadter's Law.

 -- Hofstadter's Law



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-07 Thread Nathan Allen Stratton

I am using patch-439 and things are working just fine with the timeout.
The only issue is lots and lots of errors in the log files of the two boxes
that are not shut down. The errors continue at a rate of about 100 a second
even after the box that was shut down is brought back up.


2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video0_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video0_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video3_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video3_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video3_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video3_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video3_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video3_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video2_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video2_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video2_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1
op_errno=77
2007-08-07 16:24:01 E [afr.c:1818:afr_writev_cbk] block-a-ds-afr:
(path=/vs0_video2_16_2007-08-07.rs child=mirror-a-ds) op_ret=-1 op_errno=7


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


[Gluster-devel] afr fault recovery

2007-08-06 Thread Nathan Allen Stratton

I am running TLA 430 with 3 nodes:
vs0 ns brick-a mirror-c
vs1 ns brick-b mirror-a
vs2 ns brick-c mirror-b

I afr replicate the ns bricks *:3 into block-ns-afr, and replicate each
brick-(a-c) with its mirror-(a-c) *:2 into block-(a-c)-afr.

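(A reconstructed sketch of the namespace AFR described above -- not the
actual contents of the files linked below:)

volume block-ns-afr
  type cluster/afr
  option replicate *:3
  subvolumes brick-a-ns brick-b-ns brick-c-ns
end-volume
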
http://share.robotics.net/client.vol (same for vs0 - vs2)
http://share.robotics.net/vs0_server.vol
http://share.robotics.net/vs1_server.vol
http://share.robotics.net/vs2_server.vol

If I do a shutdown on any node the whole thing locks up. If I leave it
down, it does not recover. If I then bring it back up, it does not
recover. I have to restart glusterfsd on all boxes.

-Nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] afr fault recovery

2007-08-06 Thread Nathan Allen Stratton
On Mon, 6 Aug 2007, Krishna Srinivas wrote:

 Hi Nathan,

 Can you confirm that you are not killing the server to which
 the client connects?

 According to the spec, the client connects to the server on 127.0.0.1:6996.
 Which server are you running here (vs0/vs1/vs2)? What is the IP address of the
 server you are killing?

Each server also has a client, which is why I used loopback. So if I
kill vs0, then the server and client for vs0 are down. However, the clients
on vs1 and vs2, which also point to their own localhost, are still up. They
should still be working, but they are locked.

-Nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] afr and self-heal issues

2007-08-04 Thread Nathan Allen Stratton
On Sat, 4 Aug 2007, Krishna Srinivas wrote:

  So if I wanted auto self-heal, I should just set up a cron to do that?

 No, just don't worry about it; whenever the file is opened by an
 application, it will be self-healed. However if you are worried
 that the server which has the latest version might go down, you
 can run the above find command to make sure everything is
 in sync.

Right, so we will have a large number of files and some may not be accessed
frequently. Yes, I want to make sure everything is in sync.

-Nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-03 Thread Nathan Allen Stratton
On Fri, 3 Aug 2007, Anand Avati wrote:

 Nathan,
  we were checking this issue in our labs. the issue seems to be that one
 extra connect() is tried on the dead server, and the 'freezing' just happens
 to be the block till the first connect() times out. So, after a minute or
 so, once the first connect() times out, things proceed smoothly. We are
 looking into why the first connect() is blocking.

I did some more testing with this: I shut down the server and after a bit it
crashed.

(gdb) bt
#0  dict_destroy (this=0x622420) at dict.c:244
#1  0x2b93002b in server_reply_proc (data=value optimized out)
at server-protocol.c:255
#2  0x003f610062f7 in start_thread () from /lib64/libpthread.so.0
#3  0x003f604ce86d in clone () from /lib64/libc.so.6
#4  0x in ?? ()
(gdb)

After about 5 more minutes the second box crashed:

(gdb) bt
#0  0x003f60430065 in raise () from /lib64/libc.so.6
#1  0x003f60431b00 in abort () from /lib64/libc.so.6
#2  0x003f6046825b in __libc_message () from /lib64/libc.so.6
#3  0x003f6046f504 in _int_free () from /lib64/libc.so.6
#4  0x003f60472b2c in free () from /lib64/libc.so.6
#5  0x2aace140 in dict_destroy (this=0x2aaab00012b0) at dict.c:250
#6  0x2b93002b in server_reply_proc (data=value optimized out) at 
server-protocol.c:255
#7  0x003f610062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x003f604ce86d in clone () from /lib64/libc.so.6
#9  0x in ?? ()
(gdb)



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


[Gluster-devel] afr and self-heal issues

2007-08-03 Thread Nathan Allen Stratton

My setup is 3 servers each with 3 volumes:

vs0 ns brick-a mirror-c
vs1 ns brick-b mirror-a
vs2 ns brick-c mirror-b

I afr replicate the ns bricks *:3 into block-ns-afr, and replicate each
brick-(a-c) with its mirror-(a-c) *:2 into block-(a-c)-afr.

I then unify block-(a-c)-afr into share-unify, with option namespace
block-ns-afr.

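(Again only a sketch, not the actual file: the unify layer described above
would look roughly like this, using the rr scheduler mentioned below:)

volume share-unify
  type cluster/unify
  option namespace block-ns-afr
  option scheduler rr
  subvolumes block-a-afr block-b-afr block-c-afr
end-volume
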
If a server goes down, things lock up and then crash (a known issue that the
Gluster guys are working on). If I leave one server (vs0) off and restart
the crashed servers, I can write to my share unify. I would expect files
going to block-b-afr to land in vs1 brick-b and vs2 mirror-b, and that is
exactly what happens. Unify is using the rr scheduler and, as expected, files
are also sent to block-c-afr. Server vs2 brick-c gets the block-c-afr
files, but the odd part is, so does vs1 mirror-a.

Why would that happen? block-c-afr is made up of vs2 brick-c and vs0
(server that is down) mirror-c.

Also, when I bring back up vs0, I would expect ns to be brought back up to
date with the others since it was part of the afr *:3, but it is not. I
also would expect that files part of block-c-afr that are in vs2 brick-c
would also be copied to vs0 mirror-c, but that also does not happen.

Also, I was playing around with stripe; does it work in the latest code?
If I edit my configs and comment out my unify and replace it with stripe, I
only get what looks like unify, but without the namespace requirements.
I.e., no matter what I put for block-size, my files are still their normal
300 or so megs. Is the issue that I am using it server-side rather than
client-side?

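(For comparison, a stripe declaration normally carries a block-size pattern;
a client-side sketch with hypothetical names:)

volume share-stripe
  type cluster/stripe
  option block-size *:1MB            # stripe all files in 1 MB chunks
  subvolumes block-a block-b block-c
end-volume
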
Any ideas?

Full configs are at:

http://share.robotics.net/client.vol
http://share.robotics.net/vs0_server.vol
http://share.robotics.net/vs1_server.vol
http://share.robotics.net/vs2_server.vol




Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-03 Thread Nathan Allen Stratton
On Fri, 3 Aug 2007, Anand Avati wrote:

 Is your 'tla tree-id' pre patch-422?

Yes, -416, I will upgrade and give it a try.

-Nathan


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-03 Thread Nathan Allen Stratton
 On Fri, 3 Aug 2007, Anand Avati wrote:

 Is your 'tla tree-id' pre patch-422?

Crash from 425 when one server was down; also, when it comes back up there
is no rebuild of afr or namespace.

[EMAIL PROTECTED]/glusterfs--mainline--2.5--patch-425

(gdb) bt
#0  dict_iovec_len (dict=0x2aaab401e130) at dict.c:546
#1  0x2aad496d in gf_block_iovec_len (blk=value optimized out)
at protocol.c:384
#2  0x2b92f411 in generic_reply (frame=value optimized out,
type=2, op=GF_FOP_WRITE, params=0x2aaab401e130) at server-protocol.c:182
#3  0x2b92ffa2 in server_reply_proc (data=value optimized out)
at server-protocol.c:248
#4  0x003f610062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x003f604ce86d in clone () from /lib64/libc.so.6
#6  0x in ?? ()
(gdb)



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd crash with glusterfs--mainline--2.5--patch-410

2007-08-02 Thread Nathan Allen Stratton
On Fri, 3 Aug 2007, Anand Avati wrote:

 Nathan,
 patch-414 should fix these issues. I think the patch covers all possible
 cases which hits this bug. Please upgrade to it.

Also crashes with 414:

(vs0)

-
got signal (11), printing backtrace
-
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/lib64/libc.so.6[0x3f604300c0]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/server.so[0x2bb328a4]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/server.so[0x2bb32d63]
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92bfd5]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/unify.so(unify_writev_cbk+0x19)[0x2b71e3b9]
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/afr.so[0x2b50de59]
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/client.so[0x2b305b27]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/afr.so[0x2b5100c8]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/unify.so(unify_writev+0xf7)[0x2b722097]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92c873]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/libglusterfs.so.0[0x2aad7e24]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/libglusterfs.so.0(call_resume+0x6a)[0x2aad829a]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92dded]
2007-08-02 14:56:39 D [tcp-client.c:70:tcp_connect] mirror-a-ds: socket fd
= 3
/lib64/libpthread.so.0[0x3f610062f7]
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed
2007-08-02 14:56:39 D [tcp-client.c:88:tcp_connect] mirror-a-ds: finalized
on port `1023'
/lib64/libc.so.6(clone+0x6d)[0x3f604ce86d]
2007-08-02 14:56:39 D [tcp-client.c:109:tcp_connect] mirror-a-ds:
defaulting remote-port to 6996
-
2007-08-02 14:56:39 E [server-protocol.c:197:generic_reply] server:
transport_writev failed


(vs1)

2007-08-02 14:56:43 D [tcp-client.c:142:tcp_connect] mirror-c-ds: connect
on 7 in progress (non-blocking)

-
got signal (11), printing backtrace
-
/lib64/libc.so.6[0x3f604300c0]
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/server.so[0x2bb328a4]
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/server.so[0x2bb32d63]
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92bfd5]
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/unify.so(unify_writev_cbk+0x19)[0x2b71e3b9]
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/afr.so[0x2b50de59]
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/client.so[0x2b305b27]
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/afr.so[0x2b5100c8]
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/unify.so(unify_writev+0xf7)[0x2b722097]
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92c873]
/usr/local/lib/libglusterfs.so.0[0x2aad7e24]
/usr/local/lib/libglusterfs.so.0(call_resume+0x6a)[0x2aad829a]
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92dded]
/lib64/libpthread.so.0[0x3f610062f7]
/lib64/libc.so.6(clone+0x6d)[0x3f604ce86d]
-


(vs2)
2007-08-02 14:56:42 E [server-protocol.c:197:generic_reply] server:
transport_writev failed

-
got signal (11), printing backtrace
-
2007-08-02 14:56:42 W [client-protocol.c:4279:client_protocol_cleanup]
brick-b-ds: forced unwinding frame type(0) op(13)
2007-08-02 14:56:42 E [afr.c:1757:afr_readv_cbk] block-b-ds-afr:
(path=/vs2_video1_13_2007-08-02.rs child=brick-b-ds) op_ret=-1
op_errno=107
/lib64/libc.so.6[0x3f604300c0]
2007-08-02 14:56:42 E [tcp.c:118:tcp_except] server: shutdown () - error:
Transport endpoint is not connected
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/server.so[0x2bb328a4]
/usr/local/lib/glusterfs/1.3.0/xlator/protocol/server.so[0x2bb32d63]
2007-08-02 14:56:42 W [client-protocol.c:4279:client_protocol_cleanup]
brick-b-ds: forced unwinding frame type(0) op(13)
/usr/local/lib/glusterfs/1.3.0/xlator/performance/io-threads.so[0x2b92bfd5]
2007-08-02 14:56:42 E [afr.c:1757:afr_readv_cbk] block-b-ds-afr:
(path=/vs2_video1_13_2007-08-02.rs child=brick-b-ds) op_ret=-1
op_errno=107
/usr/local/lib/glusterfs/1.3.0/xlator/cluster/unify.so(unify_writev_cbk+0x19)[0x2b71e3b9]
2007-08-02 

[Gluster-devel] Client lockup

2007-07-31 Thread Nathan Allen Stratton

I am running glusterfs-1.3.pre6 with 3 bricks and 3 clients (on the same
boxes). I am running AFR and unify so that I have 2 copies of everything. If
I go to one of the 3 servers and kill glusterfsd, everything works fine: I
still see all the data and clients are happy. If, however, I reboot one of
the servers, the clients hang and I have to kill all the servers and clients
and restart them.

http://share.robotics.net/client.vol (same on all boxes)
http://share.robotics.net/server0.vol
http://share.robotics.net/server1.vol
http://share.robotics.net/server2.vol


Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Client lockup

2007-07-31 Thread Nathan Allen Stratton
On Wed, 1 Aug 2007, Anand Avati wrote:

 Nathan,
  some bugs have been fixed in the source repository which solves the issues
 you have faced. You could either checkout the source from the repository (
 http://www.gluster.org/download.php has instructions) or you could wait for
 the next release which is going to happen very shortly.

Will it also fix this problem:

http://share.robotics.net/glusterfsd.log



Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Client lockup

2007-07-31 Thread Nathan Allen Stratton

On Wed, 1 Aug 2007, Anand Avati wrote:

 Nathan,
  some bugs have been fixed in the source repository which solves the issues
 you have faced. You could either checkout the source from the repository (
 http://www.gluster.org/download.php has instructions) or you could wait for
 the next release which is going to happen very shortly.

Hmm, I still have the issue when one of the servers is shut down. I also ran
into a new bug that happens under load:

2007-07-31 18:07:51 E [afr.c:2168:afr_close_setxattr_cbk] block-a-ds-afr:
(path=/vs1_video1_18_2007-07-31.rs child=mirror-a-ds) op_ret=-1 op_errno=2
2007-07-31 18:07:51 E [afr.c:2168:afr_close_setxattr_cbk] block-a-ds-afr:
(path=/vs1_video1_18_2007-07-31.rs child=brick-a-ds) op_ret=-1 op_errno=2
2007-07-31 18:07:51 E [afr.c:2168:afr_close_setxattr_cbk] block-c-ds-afr:
(path=/vs1_video3_18_2007-07-31.rs child=mirror-c-ds) op_ret=-1 op_errno=2
2007-07-31 18:07:51 E [afr.c:2168:afr_close_setxattr_cbk] block-c-ds-afr:
(path=/vs1_video3_18_2007-07-31.rs child=brick-c-ds) op_ret=-1 op_errno=2
2007-07-31 18:07:51 E [afr.c:2168:afr_close_setxattr_cbk] block-b-ds-afr:
(path=/vs1_video2_18_2007-07-31.rs child=brick-b-ds) op_ret=-1 op_errno=2
2007-07-31 18:07:51 E [afr.c:2168:afr_close_setxattr_cbk] block-b-ds-afr:
(path=/vs1_video2_18_2007-07-31.rs child=mirror-b-ds) op_ret=-1 op_errno=2

-
got signal (6), printing backtrace
-
/lib64/libc.so.6[0x3f604300c0]
/lib64/libc.so.6(gsignal+0x35)[0x3f60430065]
/lib64/libc.so.6(abort+0x110)[0x3f60431b00]
/lib64/libc.so.6[0x3f6046825b]
/lib64/libc.so.6[0x3f6046f504]
/lib64/libc.so.6(cfree+0x8c)[0x3f60472b2c]
/usr/local/lib/libglusterfs.so.0(gf_block_unserialize_transport+0x3af)[0x2aad550f]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/client.so(notify+0x2a1)[0x2aef0ea1]
/usr/local/lib/libglusterfs.so.0(sys_epoll_iteration+0xc2)[0x2aad6212]
[glusterfsd](main+0x372)[0x401842]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3f6041d8a4]
[glusterfsd][0x401269]
-



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Client lockup

2007-07-31 Thread Nathan Allen Stratton
On Wed, 1 Aug 2007, Anand Avati wrote:

 Nathan,
  some bugs have been fixed in the source repository which solves the issues
 you have faced. You could either checkout the source from the repository (
 http://www.gluster.org/download.php has instructions) or you could wait for
 the next release which is going to happen very shortly.

Not sure if this helps, but here are cores from the latest in the repository:

-
got signal (11), printing backtrace
-
/lib64/libc.so.6[0x3f604300c0]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/server.so[0x2b9268a4]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/server.so[0x2b926d63]
/usr/local/lib/glusterfs/1.3.pre6/xlator/performance/io-threads.so[0x2b71ffd5]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so(unify_writev_cbk+0x19)[0x2b5123b9]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so[0x2b303f16]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/client.so[0x2b0f9a19]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so[0x2b306a67]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so(unify_writev+0xf7)[0x2b5163e7]
/usr/local/lib/glusterfs/1.3.pre6/xlator/performance/io-threads.so[0x2b720873]
/usr/local/lib/libglusterfs.so.0[0x2aad7d64]
/usr/local/lib/libglusterfs.so.0(call_resume+0x6a)[0x2aad81da]
/usr/local/lib/glusterfs/1.3.pre6/xlator/performance/io-threads.so[0x2b721ded]
/lib64/libpthread.so.0[0x3f610062f7]
/lib64/libc.so.6(clone+0x6d)[0x3f604ce86d]
-

2007-07-31 18:40:29 C [tcp.c:81:tcp_disconnect] brick-c-ns: connection
disconnected
2007-07-31 18:40:29 E [protocol.c:251:gf_block_unserialize_transport]
libglusterfs/protocol: EOF from peer (192.168.0.62:1023)
2007-07-31 18:40:29 C [tcp.c:81:tcp_disconnect] server: connection
disconnected
2007-07-31 18:40:29 E [protocol.c:251:gf_block_unserialize_transport]
libglusterfs/protocol: EOF from peer (192.168.0.62:6996)
2007-07-31 18:40:29 C [tcp.c:81:tcp_disconnect] brick-c-ds: connection
disconnected
2007-07-31 18:40:29 E [protocol.c:251:gf_block_unserialize_transport]
libglusterfs/protocol: EOF from peer (192.168.0.62:6996)
2007-07-31 18:40:29 C [tcp.c:81:tcp_disconnect] mirror-b-ds: connection
disconnected
2007-07-31 18:40:29 E [afr.c:1816:afr_writev_cbk] block-b-ds-afr:
(path=/vs1_video0_18_2007-07-31.rs child=mirror-b-ds) op_ret=-1
op_errno=77

-
got signal (11), printing backtrace
-
/lib64/libc.so.6[0x3f604300c0]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/server.so[0x2b9268a4]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/server.so[0x2b926d63]
/usr/local/lib/glusterfs/1.3.pre6/xlator/performance/io-threads.so[0x2b71ffd5]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so(unify_writev_cbk+0x19)[0x2b5123b9]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so[0x2b303f16]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/client.so[0x2aef3a19]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/afr.so[0x2b306a67]
/usr/local/lib/glusterfs/1.3.pre6/xlator/cluster/unify.so(unify_writev+0xf7)[0x2b5163e7]
/usr/local/lib/glusterfs/1.3.pre6/xlator/performance/io-threads.so[0x2b720873]
/usr/local/lib/libglusterfs.so.0[0x2aad7d64]
/usr/local/lib/libglusterfs.so.0(call_resume+0x6a)[0x2aad81da]
/usr/local/lib/glusterfs/1.3.pre6/xlator/performance/io-threads.so[0x2b721ded]
/lib64/libpthread.so.0[0x3f610062f7]
/lib64/libc.so.6(clone+0x6d)[0x3f604ce86d]
-
2007-07-31 18:40:29 E [protocol.c:251:gf_block_unserialize_transport]
libglusterfs/protocol: EOF from peer (10.10.0.61:6996)
2007-07-31 18:40:29 C [tcp.c:81:tcp_disconnect] share: connection
disconnected


got signal (6), printing backtrace
-
/lib64/libc.so.6[0x3f604300c0]
/lib64/libc.so.6(gsignal+0x35)[0x3f60430065]
/lib64/libc.so.6(abort+0x110)[0x3f60431b00]
/lib64/libc.so.6[0x3f6046825b]
/lib64/libc.so.6[0x3f6046f504]
/lib64/libc.so.6(cfree+0x8c)[0x3f60472b2c]
/usr/local/lib/libglusterfs.so.0(gf_block_unserialize_transport+0x3af)[0x2aad550f]
/usr/local/lib/glusterfs/1.3.pre6/xlator/protocol/client.so(notify+0x2a1)[0x2aef0ea1]
/usr/local/lib/libglusterfs.so.0(sys_epoll_iteration+0xc2)[0x2aad6212]
[glusterfsd](main+0x372)[0x401842]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3f6041d8a4]
[glusterfsd][0x401269]
-
2007-07-31 18:40:31 C [tcp.c:81:tcp_disconnect] share: connection
disconnected



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GlusterFS Example

2007-07-30 Thread Nathan Allen Stratton
On Mon, 30 Jul 2007, Matt Paine wrote:

 Hi Nathan,

  http://www.gluster.org/docs/index.php/GlusterFS_High_Availability_Storage_with_GlusterFS
 
  If anyone hasthe time for a few questions:

 I will do my best :)

Thanks!

  1) In the past, my server configuration had only the local volume as in
  http://www.gluster.org/docs/index.php/GlusterFS_Configuration_Example_for_Four_Bricks
  Your config looks like each server has its volume and all the others.

 This setup (i'm assuming you are talking about the high availability
 storage with glusterfs article) is specifically for high availability,

I am referencing the first link:
http://www.gluster.org/docs/index.php/GlusterFS_High_Availability_Storage_with_GlusterFS

I understand that the second link is talking about doing everything on one
server; my question is why, in the first link, which looks like it is talking
about 3 physical boxes, each config file lists all the volumes
(mailspool-santa2-ds, santa3-ds), whereas in the second link a server.vol file
only has the local volume information.

 if a brick goes down then there needs to be redundancy built in to
 provide a fallback option. If it was a simple unify solution, and a
 brick goes down, then you will be missing files (hence, not highly
 available at all unless you can guarantee 100% uptime of all bricks
 under all circumstances - which as far as I know no one offers :)

 In that specific example, it has been chosen to provide every brick with
   a copy of the files. In this case two bricks could fail and the
 cluster would still be up and running :). Extending this to 10 bricks,
 then you could potentially have 9 bricks fail and it would still be
 working.

 Point being, GlusterFS is highly configurable, you can set it up any way
 you like, more redundancy (afr), or more space (unify) or both. Your choice.


  2) Why do you use loopback IP vs the public IP in the first example?

 The first example introduces itself with this paragraph

 8-
 In this article, I will show you how to configure GlusterFS to simulate
 a cluster with:

  * one single mount point
  * 4 storage nodes serving the files

 I said simulate since we will deploy such a cluster on only one
 computer, but you can just as well set it up over physically different
 computers by changing the IP addresses.
 -8

 It's only a simulation, run on one computer. If you wish to use the
 example over an actual cluster of computers, you will need to change the
 IP numbers accordingly.

Understood, however in the first link the example has the first brick as a
127 address and the other 2 bricks as 192.168.252 addresses. My guess is
that it was a typo, since when I looked at the page this morning the 172
address was corrected to be 192.168.242.2, consistent with the rest of the
example.

  3) Can I change option replicate *:3 to *:2 to just have two copies? I
  like having an extra copy, but in my application having 3 for 3 nodes or 4
  for 4 nodes is a bit overkill.

 Yes. As I said its highly configurable. You can even choose to replicate
 only certain files. E.g.

 option replicate *.tmp:1,*.html:4,*.php:4,*.db:1,*:2

 In this case any .tmp or .db files will not get replicated (will sit on
 the first afr brick only), all html and php files will be mirrored
 across the first four bricks of afr, and all other files will be
 mirrored across the first two bricks. (note all on one line, note comma
 delimited, note also no spaces before or after the comma).

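(In volfile form, that pattern sits on the AFR volume itself; the brick names
here are hypothetical:)

volume afr1
  type cluster/afr
  option replicate *.tmp:1,*.html:4,*.php:4,*.db:1,*:2
  subvolumes brick1 brick2 brick3 brick4
end-volume
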
Right, but is it possible to just tell the config that you want to store 2
copies of everything without telling each system where the redundant
copies should be stored? I.e., I am looking for something like RAID 6,
where I have, say, 8 bricks with two copies of every file. If I just enter
*:2 then the spare copies will just be on the first two bricks, and since
each of the 8 bricks has the same amount of storage I will run out of space
on the first two before the others, versus having the redundant copies
distributed over the 8 bricks.

  4) In my application the server is also the client. In my current config,
  when one out of 3 servers is down I can no longer write, even though I am
  telling my config to write to the local brick. Is there any way to fix
  this? If a node is down I understand how I would not have access to the
  files, but I don't want to take down the full cluster.

 I'm not sure, sorry. Maybe one of the devs can clarify that. I know
 there was an issue with an earlier release to do with AFR. When the first
 brick was down then afr was no longer available.

Ah... That could be it.

  P.S. Our setup consists of 3 servers each with a 4 port video card. They
  continually capture video from the 4 inputs and save them in /share. Each
  box also acts as a video server accepting connections and reading any file
  in /share. I don't care much about redundancy, but I do care that /share
  has files from all servers, or all servers that are up 

Re: [Gluster-devel] GlusterFS Example

2007-07-30 Thread Nathan Allen Stratton
On Tue, 31 Jul 2007, Matt Paine wrote:

 Absolutely, except there is a bit more work to do. AFR works by
 starting at the first brick and any replicating goes to the second,
 third, etc. as described above. But these are normal gluster bricks, so
 there is nothing stopping you from creating an AFR for the first brick,
 an AFR for the second, and just a posix brick for the rest (if that's
 what you want of course). Let me try to explain.

Wow, you should put this on the wiki; this is a very clear example. I can
see how this can work, but does it need to be that hard to set up? How hard
would it be to add functionality to AFR where you only need to specify the
number of copies you want and the scheduler takes care of the rest?

 Server one has one brick: S1
 Server two has one brick: S2
 Server three has one brick: S3
 Server four has one brick: S4
 Server five has one brick: S5

 Client unifies S1 and S2 (call him U1)
 Client unifies S3 and S4 (call him U2)
 Client then AFR's together U1 and U2 and S5

 In this example, if I save a file and it's *:3, it will appear
 * on U1 (either S1 or S2 depending on scheduler)
 * on U2 (either S3 or S4 depending on sched.)
 * on S5

 If I save a file as only *:1, it will appear only on U1 (either S1 or S2
 depending on scheduler).

 Ad nausium.


 Of course there is nothing stopping you from unifying three bricks, or
 even unifying an afr to an afr.


 i.e. (might need a monospaced font to see correctly...)

I use Pine (old text based mail reader) so it looks great!

                          A5
              +-----------------------+
              |                       |
              U1                      U2
       +-------------+           +---------+
       |             |           |         |
       A1            A2          A3        A4
   +---+---+   +---+---+---+   +---+   +---+---+
   |   |   |   |   |   |   |   |   |   |   |   |
  S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12


 Where Sxx = Server bricks
 Ax = AFR brick
 Ux = Unify brick



 So in this configuration (which you have already worked out, I'm sure),
 if you save something *:2, then it will appear in both U1 and U2,
 which means (depending on the spec from A[1-4], assume *:2) it will
 appear in either A1 or A2 (because of the unify), AND it will also
 appear in either A3 or A4, and so on.

 I think i've laboured the point far enough :)

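(In volfile terms, the first worked example above -- U1 unifying S1 and S2,
U2 unifying S3 and S4, and an AFR across U1, U2 and S5 -- might be declared
roughly as below; the ns1/ns2 namespace volumes are assumptions, since unify
needs a namespace:)

volume U1
  type cluster/unify
  option namespace ns1
  option scheduler rr
  subvolumes S1 S2
end-volume

volume U2
  type cluster/unify
  option namespace ns2
  option scheduler rr
  subvolumes S3 S4
end-volume

volume top
  type cluster/afr
  option replicate *:3        # a copy lands on U1, U2 and S5
  subvolumes U1 U2 S5
end-volume
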
Thanks, it was great!

 Which version of glusterfs are you using? tla, pre ?

 (that issue has been fixed for a little while now, so if you're using pre6
 you shouldn't have come across it)

Yep, that was it; I was using 1.2, and the latest 1.3 fixes the problem.



Nathan Stratton CTO, Voila IP Communications
nathan at robotics.net  nathan at voilaip.com
http://www.robotics.net http://www.voilaip.com


___
Gluster-devel mailing list
Gluster-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel