On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <[email protected]> wrote:
> My HBAs are LSISAS1068E, and the filesystem is XFS.
> I tried EXT4 and it did not help.
> I have created a striped volume on one server with two bricks, same
> issue, and I tried a replicated volume with just sharding enabled,
> same issue; as soon as I disable the sharding it works just fine.
> Neither sharding nor striping works for me.
> I did follow up with some threads on the mailing list and tried some
> of the fixes that worked for the others; none worked for me. :(

Is it possible the LSI has write-cache enabled?
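Something like this should show the write-cache state of the disks
behind that HBA (a sketch, assuming sdparm is installed and /dev/sdX
stands in for one of your brick disks):

  # query the Write Cache Enable (WCE) bit on the drive's caching page
  sdparm --get=WCE /dev/sdX
  # if it reports WCE 1, a persistent disable would be:
  sdparm --clear=WCE --save /dev/sdX

For plain SATA disks, hdparm -W /dev/sdX reports the same setting.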
> On 03/13/2016 06:54 PM, David Gossage wrote:
>
> On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <[email protected]> wrote:
>
>> Okay, so I have enabled shard in my test volume and it did not help.
>> Stupidly enough, I have enabled it in a production volume
>> "Distributed-Replicate" and it corrupted half of my VMs.
>> I have updated Gluster to the latest and nothing seems to have
>> changed in my situation.
>> Below is the info of my volume:
>
> I was pointing at the settings in that email as an example for fixing
> corruption. I wouldn't recommend enabling sharding if you haven't
> gotten the base working yet on that cluster. What HBAs are you using,
> and what is the filesystem layout for the bricks?
>
>> Number of Bricks: 3 x 2 = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfs001:/bricks/b001/vmware
>> Brick2: gfs002:/bricks/b004/vmware
>> Brick3: gfs001:/bricks/b002/vmware
>> Brick4: gfs002:/bricks/b005/vmware
>> Brick5: gfs001:/bricks/b003/vmware
>> Brick6: gfs002:/bricks/b006/vmware
>> Options Reconfigured:
>> performance.strict-write-ordering: on
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> network.remote-dio: enable
>> performance.stat-prefetch: disable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> cluster.eager-lock: enable
>> features.shard-block-size: 16MB
>> features.shard: on
>> performance.readdir-ahead: off
>>
>> On 03/12/2016 08:11 PM, David Gossage wrote:
>>
>> On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <[email protected]>
>> wrote:
>>
>>> Both servers have HBAs, no RAID, and I can set up a replicated or
>>> dispersed volume without any issues.
>>> The logs are clean, and when I tried to migrate a VM and got the
>>> error, nothing showed up in the logs.
>>> I tried mounting the volume on my laptop and it mounted fine, but
>>> if I use dd to create a data file it just hangs and I can't cancel
>>> it, and I can't unmount it or anything; I just have to reboot.
>>> The same servers have another volume on other bricks in a
>>> distributed replica, which works fine.
>>> I have even tried the same setup in a virtual environment (created
>>> two VMs, installed Gluster, and created a replicated striped
>>> volume) and again the same thing, data corruption.
>>
>> I'd look through the mail archives for a topic called "Shard in
>> Production", I think. The shard portion may not be relevant, but it
>> does discuss certain settings that had to be applied to avoid
>> corruption with VMs. You may also want to try disabling
>> performance.readdir-ahead.
>>
>>> On 03/12/2016 07:02 PM, David Gossage wrote:
>>>
>>> On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <[email protected]>
>>> wrote:
>>>
>>>> Thanks David,
>>>>
>>>> My settings are all defaults; I had just created the pool and
>>>> started it.
>>>> I have set the settings as you recommended and it seems to be the
>>>> same issue:
>>>>
>>>> Type: Striped-Replicate
>>>> Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gfs001:/bricks/t1/s
>>>> Brick2: gfs002:/bricks/t1/s
>>>> Brick3: gfs001:/bricks/t2/s
>>>> Brick4: gfs002:/bricks/t2/s
>>>> Options Reconfigured:
>>>> performance.stat-prefetch: off
>>>> network.remote-dio: on
>>>> cluster.eager-lock: enable
>>>> performance.io-cache: off
>>>> performance.read-ahead: off
>>>> performance.quick-read: off
>>>> performance.readdir-ahead: on
>>>
>>> Is there a RAID controller perhaps doing any caching?
>>>
>>> Are any errors reported in the Gluster logs during the migration
>>> process? Since the bricks aren't in use yet, have you tested making
>>> just mirrored bricks using different pairings of servers, two at a
>>> time, to see if the problem follows a certain machine or network
>>> port?
>>>
>>>> On 03/12/2016 03:25 PM, David Gossage wrote:
>>>>
>>>> On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <[email protected]>
>>>> wrote:
>>>>
>>>>> Dears,
>>>>>
>>>>> I have created a replicated striped volume with two bricks and
>>>>> two servers, but I can't use it: when I mount it in ESXi and try
>>>>> to migrate a VM to it, the data gets corrupted.
>>>>> Does anyone have any idea why this is happening?
>>>>>
>>>>> Dell 2950 x2
>>>>> Seagate 15k 600GB
>>>>> CentOS 7.2
>>>>> Gluster 3.7.8
>>>>>
>>>>> Appreciate your help.
>>>>
>>>> Most reports of this I have seen end up being settings related.
>>>> Post your gluster volume info. Below are what I have seen as the
>>>> most commonly recommended settings.
>>>> I'd hazard a guess you may have the read-ahead cache or prefetch
>>>> on.
>>>>
>>>> quick-read=off
>>>> read-ahead=off
>>>> io-cache=off
>>>> stat-prefetch=off
>>>> eager-lock=enable
>>>> remote-dio=on
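Applying those to an existing volume is just a series of volume set
calls; something like the following should do it, assuming the volume
is named "vmware" to match the brick paths above:

  gluster volume set vmware performance.quick-read off
  gluster volume set vmware performance.read-ahead off
  gluster volume set vmware performance.io-cache off
  gluster volume set vmware performance.stat-prefetch off
  gluster volume set vmware cluster.eager-lock enable
  gluster volume set vmware network.remote-dio enable

Afterwards, "gluster volume info" should list them under Options
Reconfigured.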
>>>>> Mahdi Adnan
>>>>> System Admin
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> [email protected]
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
