Hi Wade,
There seems to be an issue syncing the existing data in the volume
using the Xsync crawl.
(For background: when geo-rep is started it begins with a filesystem
crawl (Xsync), syncs all the existing data to the slave, and then the
session switches to CHANGELOG mode.)
We are looking into this.
Is there a specific reason for choosing a Stripe volume? Stripe volumes
have not been extensively tested with geo-rep.
Thanks,
Saravana
On 10/19/2015 08:24 AM, Wade Fitzpatrick wrote:
The relevant portions of the log appear to be as follows. Everything
seemed fairly normal (though quite slow) until
[2015-10-08 15:31:26.471216] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:34.39248] I
[syncdutils(/data/gluster1/static/brick1):220:finalize] <top>: exiting.
[2015-10-08 15:31:34.40934] I [repce(agent):92:service_loop]
RepceServer: terminating on reaching EOF.
[2015-10-08 15:31:34.41220] I [syncdutils(agent):220:finalize] <top>:
exiting.
[2015-10-08 15:31:35.615353] I [monitor(monitor):362:distribute]
<top>: slave bricks: [{'host': 'palace', 'dir':
'/data/gluster1/static/brick1'}, {'host': 'madonna', 'dir'
: '/data/gluster1/static/brick2'}]
[2015-10-08 15:31:35.616558] I [monitor(monitor):383:distribute]
<top>: worker specs: [('/data/gluster1/static/brick1',
'ssh://root@palace:gluster://localhost:static', 1)]
[2015-10-08 15:31:35.748434] I [monitor(monitor):221:monitor] Monitor:
------------------------------------------------------------
[2015-10-08 15:31:35.748775] I [monitor(monitor):222:monitor] Monitor:
starting gsyncd worker
[2015-10-08 15:31:35.837651] I [changelogagent(agent):75:__init__]
ChangelogAgent: Agent listining...
[2015-10-08 15:31:35.841150] I
[gsyncd(/data/gluster1/static/brick1):649:main_i] <top>: syncing:
gluster://localhost:static -> ssh://root@palace:gluster://localhost:static
[2015-10-08 15:31:38.543379] I
[master(/data/gluster1/static/brick1):83:gmaster_builder] <top>:
setting up xsync change detection mode
[2015-10-08 15:31:38.543802] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster: using
'tar over ssh' as the sync engine
[2015-10-08 15:31:38.544673] I
[master(/data/gluster1/static/brick1):83:gmaster_builder] <top>:
setting up xsync change detection mode
[2015-10-08 15:31:38.544924] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster: using
'tar over ssh' as the sync engine
[2015-10-08 15:31:38.546163] I
[master(/data/gluster1/static/brick1):83:gmaster_builder] <top>:
setting up xsync change detection mode
[2015-10-08 15:31:38.546406] I
[master(/data/gluster1/static/brick1):401:__init__] _GMaster: using
'tar over ssh' as the sync engine
[2015-10-08 15:31:38.548989] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549267] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
[2015-10-08 15:31:38.549467] I
[master(/data/gluster1/static/brick1):1220:register] _GMaster: xsync
temp directory:
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync
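[Editor's aside: the long percent-encoded component of that temp directory is just the slave URL; a quick decode with the standard library (an illustrative sketch, not part of gsyncd) confirms which session these workers belong to:]

```python
from urllib.parse import unquote

# Percent-encoded session directory component from the log lines above
encoded = "ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic"
print(unquote(encoded))  # ssh://root@palace:gluster://127.0.0.1:static
```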
[2015-10-08 15:31:38.549632] I
[resource(/data/gluster1/static/brick1):1432:service_loop] GLUSTER:
Register time: 1444278698
[2015-10-08 15:31:38.582277] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary
master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.584099] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl
interval: 60 seconds
[2015-10-08 15:31:38.587405] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.588735] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:31:38.590116] I
[master(/data/gluster1/static/brick1):530:crawlwrap] _GMaster: primary
master with volume id 3f9f810d-a988-4914-a5ca-5bd7b251a273 ...
[2015-10-08 15:31:38.591582] I
[master(/data/gluster1/static/brick1):539:crawlwrap] _GMaster: crawl
interval: 60 seconds
[2015-10-08 15:31:38.593844] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:31:38.594832] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444278018, 482251)
[2015-10-08 15:32:38.641908] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1
crawls, 0 turns
[2015-10-08 15:32:38.644370] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444278018, 482251)
[2015-10-08 15:32:39.646733] I
[master(/data/gluster1/static/brick1):1252:crawl] _GMaster: processing
xsync changelog
/var/lib/misc/glusterfsd/static/ssh%3A%2F%2Froot%40palace%3Agluster%3A%2F%2F127.0.0.1%3Astatic/5f45950b672b0d32fa97e00350eca862/xsync/XSYNC-CHANGELOG.1444278758
[2015-10-08 15:32:40.857084] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
ENTRY FAILED: ({'uid': 0, 'gfid':
'fc446c88-a5b7-468b-ac52-25b4225fe0cf', 'gid': 0, 'mode': 33188,
'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-1.html',
'op': 'MKNOD'}, 17, '02489235-13c5-4232-8d6d-c7843bc5249b')
[2015-10-08 15:32:40.858580] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
ENTRY FAILED: ({'uid': 0, 'gfid':
'e08813c5-055a-4354-94ec-f1b41a14b2a4', 'gid': 0, 'mode': 33188,
'entry':
'.gfid/e63d740c-ce17-4107-af35-d4030d2ae466/formguide-2015-10-08-gosford-2.html',
'op': 'MKNOD'}, 17, '0abae047-5816-4199-8203-fa8b974dfef5')
...
[2015-10-08 15:33:38.236779] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
ENTRY FAILED: ({'uid': 1000, 'gfid':
'a41a2ac7-8fec-46bd-a4cc-8d8794e5ee39', 'gid
': 1000, 'mode': 33206, 'entry':
'.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/1PYhnxMyMMcQo8ukuyMsqq.png',
'op': 'MKNOD'}, 17, 'e047db7d-f96c-496f-8a83-5db8e41859ca')
[2015-10-08 15:33:38.237443] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
ENTRY FAILED: ({'uid': 1000, 'gfid':
'507f77db-0dc0-4d7f-9eb3-8f56b3e01765', 'gid
': 1000, 'mode': 33206, 'entry':
'.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/17H7rpUIXGEQemM0wCoy6c.png',
'op': 'MKNOD'}, 17, 'ee7fa964-fc92-4008-b38a-e790fbbb1285')
[2015-10-08 15:33:38.238053] W
[master(/data/gluster1/static/brick1):803:log_failures] _GMaster:
ENTRY FAILED: ({'uid': 1000, 'gfid':
'6c495557-6808-4ff9-98de-39afbbeeac82', 'gid
': 1000, 'mode': 33206, 'entry':
'.gfid/553f4651-9c13-4788-89ea-67e1a6d5ab43/3T3VvUQH44my0Eosiieeok.png',
'op': 'MKNOD'}, 17, 'cc6a75c4-0817-497e-912b-4442fd19db83')
[2015-10-08 15:33:43.615427] W
[master(/data/gluster1/static/brick1):1010:process] _GMaster:
changelogs XSYNC-CHANGELOG.1444278758 could not be processed - moving
on...
[2015-10-08 15:33:43.616425] W
[master(/data/gluster1/static/brick1):1014:process] _GMaster: SKIPPED
GFID =
6c495557-6808-4ff9-98de-39afbbeeac82,16f94158-2f27-421b-9981-94d4197b2b3b,53d01d46-5724-4c77-846f-aacea7a3a447,9fbb536b-b7c6-41e1-8593-43e8a42b3fbe,1923ceff-d9a4-449e-b1c6-ce37c54d242c,3206332f-ed48-48d7-ad3f-cb82fbda0695,7696c570-edd5-481e-8cdc-3e...[truncated]
That type of entry repeats until
[2015-10-09 11:12:22.590574] I
[master(/data/gluster1/static/brick1):1249:crawl] _GMaster: finished
hybrid crawl syncing, stime: (1444349280, 617969)
[2015-10-09 11:13:22.650285] I
[master(/data/gluster1/static/brick1):552:crawlwrap] _GMaster: 1
crawls, 1 turns
[2015-10-09 11:13:22.653459] I
[master(/data/gluster1/static/brick1):1242:crawl] _GMaster: starting
hybrid crawl..., stime: (1444349280, 617969)
[2015-10-09 11:13:22.670430] W
[master(/data/gluster1/static/brick1):1366:Xcrawl] _GMaster: irregular
xtime for
./racesoap/nominations/processed/.processed.2015-10-13.T.Ballina.V1.nomination.1444346457.247.Thj1Ly:
ENOENT
and then there were no more logs until 2015-10-13.
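[Editor's aside: the bare 17 in each ENTRY FAILED tuple and the ENOENT in the irregular-xtime warning look like standard errno values; decoding them (an interpretive sketch, assuming gsyncd logs raw errnos here) suggests the failed entries already exist on the slave, while the xtime file had vanished by the time the crawl reached it:]

```python
import errno
import os

# Error codes seen in the geo-rep log excerpts above:
# 17 -> EEXIST ("File exists"), ENOENT -> 2 ("No such file or directory")
for code in (errno.EEXIST, errno.ENOENT):
    print(code, errno.errorcode[code], os.strerror(code))
```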
Thanks,
Wade.
On 16/10/2015 4:33 pm, Aravinda wrote:
Oh ok. I overlooked the status output. Please share the
geo-replication logs from "james" and "hilton" nodes.
regards
Aravinda
On 10/15/2015 05:55 PM, Wade Fitzpatrick wrote:
Well, I'm kind of worried about the 3 million failures listed in the
FAILURES column, the timestamp showing that syncing "stalled" 2 days
ago, and the fact that only half of the files have been transferred
to the remote volume.
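[Editor's aside: the "3 million" figure checks out; the two non-zero FAILURES counters in the status output quoted further down sum to just under three million:]

```python
# FAILURES counters per active master brick, taken from the
# `gluster volume geo-replication ... status detail` output below
failures = {"james": 1952064, "hilton": 1008035}
total = sum(failures.values())
print(total)  # 2960099
```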
On 15/10/2015 9:27 pm, Aravinda wrote:
Status looks good. Two master bricks are Active and participating
in syncing. Please let us know the issue you are observing.
regards
Aravinda
On 10/15/2015 11:40 AM, Wade Fitzpatrick wrote:
I have twice now tried to configure geo-replication of our
Stripe-Replicate volume to a remote Stripe volume but it always
seems to have issues.
root@james:~# gluster volume info
Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 5f446a10-651b-4ce0-a46b-69871f498dbc
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/geo-rep-meta/brick
Brick2: cupid:/data/gluster1/geo-rep-meta/brick
Brick3: hilton:/data/gluster1/geo-rep-meta/brick
Brick4: present:/data/gluster1/geo-rep-meta/brick
Options Reconfigured:
performance.readdir-ahead: on
Volume Name: static
Type: Striped-Replicate
Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/static/brick1
Brick2: cupid:/data/gluster1/static/brick2
Brick3: hilton:/data/gluster1/static/brick3
Brick4: present:/data/gluster1/static/brick4
Options Reconfigured:
auth.allow: 10.x.*
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
root@palace:~# gluster volume info
Volume Name: static
Type: Stripe
Volume ID: 3de935db-329b-4876-9ca4-a0f8d5f184c3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: palace:/data/gluster1/static/brick1
Brick2: madonna:/data/gluster1/static/brick2
Options Reconfigured:
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
root@james:~# gluster vol geo-rep static ssh://gluster-b1::static status detail

MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED            ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
james          static        /data/gluster1/static/brick1    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    2015-10-13 14:23:20    0        0       0       1952064     N/A                N/A                     N/A
hilton         static        /data/gluster1/static/brick3    root          ssh://gluster-b1::static    palace        Active     Changelog Crawl    N/A                    0        0       0       1008035     N/A                N/A                     N/A
present        static        /data/gluster1/static/brick4    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
cupid          static        /data/gluster1/static/brick2    root          ssh://gluster-b1::static    madonna       Passive    N/A                N/A                    N/A      N/A     N/A     N/A         N/A                N/A                     N/A
So just to clarify, data is striped over bricks 1 and 3; bricks 2
and 4 are the replica.
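[Editor's aside: that layout follows from how gluster groups bricks, if I read it correctly: replica sets are formed from consecutive bricks in the brick list, and the stripe then spans those sets. A small illustrative sketch (not gluster code) of that grouping for this 1 x 2 x 2 volume:]

```python
# Group a brick list into replica sets of consecutive bricks, as gluster
# does for a striped-replicate volume; the stripe spans the resulting sets.
def replica_sets(bricks, replica_count):
    return [bricks[i:i + replica_count] for i in range(0, len(bricks), replica_count)]

bricks = [
    "james:/data/gluster1/static/brick1",
    "cupid:/data/gluster1/static/brick2",
    "hilton:/data/gluster1/static/brick3",
    "present:/data/gluster1/static/brick4",
]
# Stripe subvolume 0: brick1 (james) replicated to brick2 (cupid)
# Stripe subvolume 1: brick3 (hilton) replicated to brick4 (present)
for i, rset in enumerate(replica_sets(bricks, 2)):
    print(i, rset)
```

This matches the status output: one worker per replica set is Active (james for brick1, hilton for brick3) while the replicas (cupid, present) stay Passive.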
Can someone help me diagnose the problem and find a solution?
Thanks in advance,
Wade.
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users