Re: [Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check

2015-11-24 Thread Mark Fasheh
Hi Junxiao,

On Tue, Nov 03, 2015 at 03:12:35PM +0800, Junxiao Bi wrote:
> Hi Gang,
> 
> This is not like a right patch.
> First, online file check only checks inode's block number, valid flag,
> fs generation value, and meta ecc. I never see a real corruption
> happened only on this field, if these fields are corrupted, that means
> something bad may happen on other place. So fix this field may not help
> and even cause corruption more hard.

I agree that these are rather uncommon, we might even consider removing the
VALID_FL fixup. I definitely don't think we're ready for anything more
complicated than this though either. We kind of have to start somewhere too.


> Second, the repair way is wrong. In
> ocfs2_filecheck_repair_inode_block(), if these fields in disk don't
> match the ones in memory, the ones in memory are used to update the disk
> fields. The question is how do you know these field in memory are
> right(they may be the real corrupted ones)?

Your second point (and the last part of your 1st point) makes a good
argument for why this shouldn't happen automatically. Some of these
corruptions might require a human to look at the log and decide what to do.
Especially as you point out, where we might not know where the source of the
corruption is. And if the human can't figure it out, then it's probably time
to unmount and fsck.

Thanks,
--Mark

--
Mark Fasheh

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 3/4] ocfs2: create/remove sysfile for online file check

2015-11-24 Thread Mark Fasheh
On Wed, Oct 28, 2015 at 02:26:00PM +0800, Gang He wrote:
> Create online file check sysfile when ocfs2 mount,
> remove the related sysfile when ocfs2 umount.
> 
> Signed-off-by: Gang He 
Reviewed-by: Mark Fasheh 
--Mark

--
Mark Fasheh

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Mark Fasheh
On Tue, Nov 03, 2015 at 04:20:27PM +0800, Junxiao Bi wrote:
> Hi Gang,
> 
> On 11/03/2015 03:54 PM, Gang He wrote:
> > Hi Junxiao,
> > 
> > Thank for your reviewing.
> > Current design, we use a sysfile as a interface to check/fix a file (via 
> > pass a ino number).
> > But, this operation is manually triggered by user, instead of automatically 
> >  fix in the kernel.
> > Why?
> > 1) we should let users make this decision, since some users do not want to 
> > fix when encountering a file system corruption, maybe they want to keep the 
> > file system unchanged for a further investigation.
> If user don't want this, they should not use error=continue option, let
> fs go after a corruption is very dangerous.

Maybe we need another errors=XXX flag (maybe errors=fix)?

You both make good points, here's what I gather from the conversation:

 - Some customers would be sad if they have to manually fix corruptions.
   This takes effort on their part, and if the FS can handle it
   automatically, it should.

 - There are valid concerns that automatically fixing things is a change in
   behavior that might not be welcome, or worse might lead to unforseeable
   circumstances.

 - I will add that fixing things automatically implies checking them
   automatically which could introduce some performance impact depending on
   how much checking we're doing.

So if the user wants errors to be fixed automatically, they could mount with
errros=fix, and everyone else would have no change in behavior unless they
wanted to make use of the new feature.


> > 2) frankly speaking, this feature will probably bring a second corruption 
> > if there is some error in the code, I do not suggest to use automatically 
> > fix by default in the first version.
> I think if this feature could bring more corruption, then this should be
> fixed first.

Btw, I am pretty sure that Gang is referring to the feature being new and
thus more likely to have problems. There is nothing I see in here that is
file system corrupting.
--Mark


--
Mark Fasheh

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 1/4] ocfs2: export ocfs2_kset for online file check

2015-11-24 Thread Mark Fasheh
On Wed, Oct 28, 2015 at 02:25:58PM +0800, Gang He wrote:
> Export ocfs2_kset object from ocfs2_stackglue kernel module,
> then online file check code will create the related sysfiles
> under ocfs2_kset object.
> 
> Signed-off-by: Gang He 
Reviewed-by: Mark Fasheh 

--
Mark Fasheh

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Srinivas Eeda
On 11/24/2015 01:46 PM, Mark Fasheh wrote:
> On Tue, Nov 03, 2015 at 04:20:27PM +0800, Junxiao Bi wrote:
>> Hi Gang,
>>
>> On 11/03/2015 03:54 PM, Gang He wrote:
>>> Hi Junxiao,
>>>
>>> Thank for your reviewing.
>>> Current design, we use a sysfile as a interface to check/fix a file (via 
>>> pass a ino number).
>>> But, this operation is manually triggered by user, instead of automatically 
>>>  fix in the kernel.
>>> Why?
>>> 1) we should let users make this decision, since some users do not want to 
>>> fix when encountering a file system corruption, maybe they want to keep the 
>>> file system unchanged for a further investigation.
>> If user don't want this, they should not use error=continue option, let
>> fs go after a corruption is very dangerous.
> Maybe we need another errors=XXX flag (maybe errors=fix)?
Great idea Mark! I think adding errors=fix would be a good way to 
address both concerns :) It gives some control if anyone is 
uncomfortable of things getting checked/fixed automatically.

>
> You both make good points, here's what I gather from the conversation:
>
>   - Some customers would be sad if they have to manually fix corruptions.
> This takes effort on their part, and if the FS can handle it
> automatically, it should.
>
>   - There are valid concerns that automatically fixing things is a change in
> behavior that might not be welcome, or worse might lead to unforseeable
> circumstances.
>
>   - I will add that fixing things automatically implies checking them
> automatically which could introduce some performance impact depending on
> how much checking we're doing.
>
> So if the user wants errors to be fixed automatically, they could mount with
> errros=fix, and everyone else would have no change in behavior unless they
> wanted to make use of the new feature.
>
>
>>> 2) frankly speaking, this feature will probably bring a second corruption 
>>> if there is some error in the code, I do not suggest to use automatically 
>>> fix by default in the first version.
>> I think if this feature could bring more corruption, then this should be
>> fixed first.
> Btw, I am pretty sure that Gang is referring to the feature being new and
> thus more likely to have problems. There is nothing I see in here that is
> file system corrupting.
>   --Mark
>
>
> --
> Mark Fasheh
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH 1/1] ocfs2: return non-zero st_blocks for inline data

2015-11-24 Thread John Haxby
Some versions of tar assume that files with st_blocks == 0 do not
contain any data and will skip reading them entirely. See also
commit 9206c561554c ("ext4: return non-zero st_blocks for inline data").

Signed-off-by: John Haxby 
---
 fs/ocfs2/file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 0e5b451..d631279 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1302,6 +1302,14 @@ int ocfs2_getattr(struct vfsmount *mnt,
}
 
generic_fillattr(inode, stat);
+   /*
+* If there is inline data in the inode, the inode will normally not
+* have data blocks allocated (it may have an external xattr block).
+* Report at least one sector for such files, so tools like tar, rsync,
+* others don't incorrectly think the file is completely sparse.
+*/
+   if (unlikely(OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL))
+   stat->blocks += (stat->size + 511)>>9;
 
/* We set the blksize from the cluster size for performance */
stat->blksize = osb->s_clustersize;
-- 
2.5.0


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH 0/1] ocfs2: return non-zero st_blocks for inline data [resend2]

2015-11-24 Thread John Haxby
Hello All,

[Really sorry about this and I hope you're not getting fed up of
 multiple copies of this message but the list on oss.oracle.com really
 doesn't like me.]

Some programs, and programmers, assume that if a file is occupying
zero blocks (st_blocks == 0) then it contains no data and there's no
point in reading it.  Posix doesn't actually say anything about this,
but it seems to be something a lot of people expect. Indeed, ext4,
btrfs and ntfs-3d all seem to behave this way so that no one[1] has
any unpleasant surprises.

This patch is almost exactly the same as commit 9206c561554c ("ext4:
return non-zero st_blocks for inline data") although I couldn't bring
myself to include the typo in the comment :)

jch

[resend because rejected by list the first time.]
[1] tar, I'm looking at you, but you're not the only one.


John Haxby (1):
  ocfs2: return non-zero st_blocks for inline data

 fs/ocfs2/file.c | 8 
 1 file changed, 8 insertions(+)

-- 
2.5.0


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Mark Fasheh
On Wed, Oct 28, 2015 at 02:25:59PM +0800, Gang He wrote:
> Implement online file check sysfile interfaces, e.g.
> how to create the related sysfile according to device name,
> how to display/handle file check request from the sysfile.
> 
> Signed-off-by: Gang He 

FYI, This looks generally fine to me however we should address Junxiao's 
concerns
before it goes any further.
--Mark

--
Mark Fasheh

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix BUG due to uncleaned localalloc during mount

2015-11-24 Thread Joseph Qi
Hi Changkuo,
Yes, it's safe.
ocfs2_begin_local_alloc_recovery will firstly copy it out and then
assign to osb->local_alloc_copy, which will be queued later in
ocfs2_queue_recovery_completion. And ocfs2_complete_local_alloc_recovery
will sync the local alloc to main, which is the same thing you have
pointed out in fsck.
Local alloc bits have already been allocated from global bitmap and it
means there is no any conflicts with others. Even crash happens directly
after ocfs2_begin_local_alloc_recovery, the only consequence is some
bits will be lost, and for this case we cannot do any more during ocfs2
recovery.

On 2015/11/25 9:08, Shichangkuo wrote:
> Hi,Joseph Qi
> In this situation,  ocfs2_begin_local_alloc_recovery will be called, and 
> finally call ocfs2_clear_local_alloc, which direct clear la bitmap, like:
> 
> alloc->id1.bitmap1.i_total = 0;
> alloc->id1.bitmap1.i_used = 0;
> la->la_bm_off = 0;
> for(i = 0; i < le16_to_cpu(la->la_size); i++)
> la->la_bitmap[i] = 0;
> 
> It's different from fsck.ocfs2, also called function ocfs2_clear_local_alloc, 
> but this function does more than direct clear la bitmap.
> Global bitmap will also be cleared.
> So, does this patch data-safe?
> 
> 
> -邮件原件-
> 发件人: ocfs2-devel-boun...@oss.oracle.com 
> [mailto:ocfs2-devel-boun...@oss.oracle.com] 代表 Joseph Qi
> 发送时间: 2015年11月24日 21:38
> 收件人: Andrew Morton
> 抄送: Mark Fasheh; ocfs2-devel@oss.oracle.com
> 主题: [Ocfs2-devel] [PATCH] ocfs2: fix BUG due to uncleaned localalloc during 
> mount
> 
> Tariq has reported a BUG before and posted a fix at:
> https://oss.oracle.com/pipermail/ocfs2-devel/2015-April/010696.html
> 
> This is because during umount, localalloc shutdown relies on journal 
> shutdown. But during journal shutdown, it just stops commit thread without 
> checking its result. So it may happen that localalloc shutdown uncleaned 
> during I/O error and after that, journal then has been marked clean if I/O 
> restores.
> Then during mount, localalloc won't be recovered because of clean journal and 
> then trigger BUG when claiming clusters from localalloc.
> 
> In Tariq's fix, we have to run fsck offline and a separate fix to fsck is 
> needed because it currently does not support clearing out localalloc inode. 
> And my way to fix this issue is checking localalloc before actually loading 
> it during mount. And this is somewhat online.
> 
> Signed-off-by: Joseph Qi 
> ---
>  fs/ocfs2/localalloc.c | 19 ---  fs/ocfs2/localalloc.h |  2 +-
>  fs/ocfs2/super.c  | 17 ++---
>  3 files changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 
> 0a4457f..ceebaef 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -281,7 +281,7 @@ bail:
> return ret;
>  }
> 
> -int ocfs2_load_local_alloc(struct ocfs2_super *osb)
> +int ocfs2_load_local_alloc(struct ocfs2_super *osb, int check, int
> +*recovery)
>  {
> int status = 0;
> struct ocfs2_dinode *alloc = NULL;
> @@ -345,21 +345,26 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb)
> if (num_used
> || alloc->id1.bitmap1.i_used
> || alloc->id1.bitmap1.i_total
> -   || la->la_bm_off)
> +   || la->la_bm_off) {
> mlog(ML_ERROR, "Local alloc hasn't been recovered!\n"
>  "found = %u, set = %u, taken = %u, off = %u\n",
>  num_used, le32_to_cpu(alloc->id1.bitmap1.i_used),
>  le32_to_cpu(alloc->id1.bitmap1.i_total),
>  OCFS2_LOCAL_ALLOC(alloc)->la_bm_off);
> +   status = -EINVAL;
> +   *recovery = 1;
> +   goto bail;
> +   }
> 
> -   osb->local_alloc_bh = alloc_bh;
> -   osb->local_alloc_state = OCFS2_LA_ENABLED;
> +   if (!check) {
> +   osb->local_alloc_bh = alloc_bh;
> +   osb->local_alloc_state = OCFS2_LA_ENABLED;
> +   }
> 
>  bail:
> -   if (status < 0)
> +   if (status < 0 || check)
> brelse(alloc_bh);
> -   if (inode)
> -   iput(inode);
> +   iput(inode);
> 
> trace_ocfs2_load_local_alloc(osb->local_alloc_bits);
> 
> diff --git a/fs/ocfs2/localalloc.h b/fs/ocfs2/localalloc.h index 
> 44a7d1f..a913841 100644
> --- a/fs/ocfs2/localalloc.h
> +++ b/fs/ocfs2/localalloc.h
> @@ -26,7 +26,7 @@
>  #ifndef OCFS2_LOCALALLOC_H
>  #define OCFS2_LOCALALLOC_H
> 
> -int ocfs2_load_local_alloc(struct ocfs2_super *osb);
> +int ocfs2_load_local_alloc(struct ocfs2_super *osb, int check, int
> +*recovery);
> 
>  void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb);
> 
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 2de4c8a..4004b29 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -2428,6 +2428,7 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
> int status;
> int dirty;
> 

Re: [Ocfs2-devel] [PATCH] ocfs2: fix BUG due to uncleaned localalloc during mount

2015-11-24 Thread Shichangkuo
Hi,Joseph Qi
In this situation,  ocfs2_begin_local_alloc_recovery will be called, and 
finally call ocfs2_clear_local_alloc, which direct clear la bitmap, like:

alloc->id1.bitmap1.i_total = 0;
alloc->id1.bitmap1.i_used = 0;
la->la_bm_off = 0;
for(i = 0; i < le16_to_cpu(la->la_size); i++)
la->la_bitmap[i] = 0;

It's different from fsck.ocfs2, also called function ocfs2_clear_local_alloc, 
but this function does more than direct clear la bitmap.
Global bitmap will also be cleared.
So, does this patch data-safe?


-邮件原件-
发件人: ocfs2-devel-boun...@oss.oracle.com 
[mailto:ocfs2-devel-boun...@oss.oracle.com] 代表 Joseph Qi
发送时间: 2015年11月24日 21:38
收件人: Andrew Morton
抄送: Mark Fasheh; ocfs2-devel@oss.oracle.com
主题: [Ocfs2-devel] [PATCH] ocfs2: fix BUG due to uncleaned localalloc during 
mount

Tariq has reported a BUG before and posted a fix at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-April/010696.html

This is because during umount, localalloc shutdown relies on journal shutdown. 
But during journal shutdown, it just stops commit thread without checking its 
result. So it may happen that localalloc shutdown uncleaned during I/O error 
and after that, journal then has been marked clean if I/O restores.
Then during mount, localalloc won't be recovered because of clean journal and 
then trigger BUG when claiming clusters from localalloc.

In Tariq's fix, we have to run fsck offline and a separate fix to fsck is 
needed because it currently does not support clearing out localalloc inode. And 
my way to fix this issue is checking localalloc before actually loading it 
during mount. And this is somewhat online.

Signed-off-by: Joseph Qi 
---
 fs/ocfs2/localalloc.c | 19 ---  fs/ocfs2/localalloc.h |  2 +-
 fs/ocfs2/super.c  | 17 ++---
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 
0a4457f..ceebaef 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -281,7 +281,7 @@ bail:
return ret;
 }

-int ocfs2_load_local_alloc(struct ocfs2_super *osb)
+int ocfs2_load_local_alloc(struct ocfs2_super *osb, int check, int
+*recovery)
 {
int status = 0;
struct ocfs2_dinode *alloc = NULL;
@@ -345,21 +345,26 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb)
if (num_used
|| alloc->id1.bitmap1.i_used
|| alloc->id1.bitmap1.i_total
-   || la->la_bm_off)
+   || la->la_bm_off) {
mlog(ML_ERROR, "Local alloc hasn't been recovered!\n"
 "found = %u, set = %u, taken = %u, off = %u\n",
 num_used, le32_to_cpu(alloc->id1.bitmap1.i_used),
 le32_to_cpu(alloc->id1.bitmap1.i_total),
 OCFS2_LOCAL_ALLOC(alloc)->la_bm_off);
+   status = -EINVAL;
+   *recovery = 1;
+   goto bail;
+   }

-   osb->local_alloc_bh = alloc_bh;
-   osb->local_alloc_state = OCFS2_LA_ENABLED;
+   if (!check) {
+   osb->local_alloc_bh = alloc_bh;
+   osb->local_alloc_state = OCFS2_LA_ENABLED;
+   }

 bail:
-   if (status < 0)
+   if (status < 0 || check)
brelse(alloc_bh);
-   if (inode)
-   iput(inode);
+   iput(inode);

trace_ocfs2_load_local_alloc(osb->local_alloc_bits);

diff --git a/fs/ocfs2/localalloc.h b/fs/ocfs2/localalloc.h index 
44a7d1f..a913841 100644
--- a/fs/ocfs2/localalloc.h
+++ b/fs/ocfs2/localalloc.h
@@ -26,7 +26,7 @@
 #ifndef OCFS2_LOCALALLOC_H
 #define OCFS2_LOCALALLOC_H

-int ocfs2_load_local_alloc(struct ocfs2_super *osb);
+int ocfs2_load_local_alloc(struct ocfs2_super *osb, int check, int
+*recovery);

 void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb);

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 2de4c8a..4004b29 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -2428,6 +2428,7 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
int status;
int dirty;
int local;
+   int la_dirty = 0, recovery = 0;
struct ocfs2_dinode *local_alloc = NULL; /* only used if we
  * recover
  * ourselves. */
@@ -2449,6 +2450,16 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
 * recover anything. Otherwise, journal_load will do that
 * dirty work for us :) */
if (!dirty) {
+   /* It may happen that local alloc is unclean shutdown, but
+* journal has been marked clean, so check it here and do
+* recovery if needed */
+   status = ocfs2_load_local_alloc(osb, 1, );
+   if (recovery) {
+   printk(KERN_NOTICE "ocfs2: local alloc needs recovery "
+   "on device (%s).\n", 

Re: [Ocfs2-devel] [PATCH 1/1] ocfs2: return non-zero st_blocks for inline data

2015-11-24 Thread Gang He
Looks good for me.

Thanks
Gang


>>> 
> Some versions of tar assume that files with st_blocks == 0 do not
> contain any data and will skip reading them entirely. See also
> commit 9206c561554c ("ext4: return non-zero st_blocks for inline data").
> 
> Signed-off-by: John Haxby 
> ---
>  fs/ocfs2/file.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 0e5b451..d631279 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1302,6 +1302,14 @@ int ocfs2_getattr(struct vfsmount *mnt,
>   }
>  
>   generic_fillattr(inode, stat);
> + /*
> +  * If there is inline data in the inode, the inode will normally not
> +  * have data blocks allocated (it may have an external xattr block).
> +  * Report at least one sector for such files, so tools like tar, rsync,
> +  * others don't incorrectly think the file is completely sparse.
> +  */
> + if (unlikely(OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL))
> + stat->blocks += (stat->size + 511)>>9;
>  
>   /* We set the blksize from the cluster size for performance */
>   stat->blksize = osb->s_clustersize;
> -- 
> 2.5.0
> 
> 
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com 
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Gang He
Hi Mark and Junxiao,


>>> 
> On Tue, Nov 03, 2015 at 04:20:27PM +0800, Junxiao Bi wrote:
>> Hi Gang,
>> 
>> On 11/03/2015 03:54 PM, Gang He wrote:
>> > Hi Junxiao,
>> > 
>> > Thank for your reviewing.
>> > Current design, we use a sysfile as a interface to check/fix a file (via 
> pass a ino number).
>> > But, this operation is manually triggered by user, instead of 
>> > automatically 
>  fix in the kernel.
>> > Why?
>> > 1) we should let users make this decision, since some users do not want to 
> fix when encountering a file system corruption, maybe they want to keep the 
> file system unchanged for a further investigation.
>> If user don't want this, they should not use error=continue option, let
>> fs go after a corruption is very dangerous.
> 
> Maybe we need another errors=XXX flag (maybe errors=fix)?
> 
> You both make good points, here's what I gather from the conversation:
> 
>  - Some customers would be sad if they have to manually fix corruptions.
>This takes effort on their part, and if the FS can handle it
>automatically, it should.
> 
>  - There are valid concerns that automatically fixing things is a change in
>behavior that might not be welcome, or worse might lead to unforseeable
>circumstances.
> 
>  - I will add that fixing things automatically implies checking them
>automatically which could introduce some performance impact depending on
>how much checking we're doing.
> 
> So if the user wants errors to be fixed automatically, they could mount with
> errros=fix, and everyone else would have no change in behavior unless they
> wanted to make use of the new feature.
That is what I want to say, add a mount option to let users to decide. Here, I 
want to split "error=fix"
mount option  task out from online file check feature, I think this part should 
be a independent feature.
We can implement this feature after online file check is done, I want to split 
the feature into some more 
detailed features, implement them one by one. Do you agree this point?

> 
> 
>> > 2) frankly speaking, this feature will probably bring a second corruption 
> if there is some error in the code, I do not suggest to use automatically fix 
> by default in the first version.
>> I think if this feature could bring more corruption, then this should be
>> fixed first.
> 
> Btw, I am pretty sure that Gang is referring to the feature being new and
> thus more likely to have problems. There is nothing I see in here that is
> file system corrupting.
>   --Mark
> 
> 
> --
> Mark Fasheh


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check

2015-11-24 Thread Junxiao Bi
Hi Mark,

On 11/25/2015 06:16 AM, Mark Fasheh wrote:
> Hi Junxiao,
> 
> On Tue, Nov 03, 2015 at 03:12:35PM +0800, Junxiao Bi wrote:
>> Hi Gang,
>>
>> This is not like a right patch.
>> First, online file check only checks inode's block number, valid flag,
>> fs generation value, and meta ecc. I never see a real corruption
>> happened only on this field, if these fields are corrupted, that means
>> something bad may happen on other place. So fix this field may not help
>> and even cause corruption more hard.
> 
> I agree that these are rather uncommon, we might even consider removing the
> VALID_FL fixup. I definitely don't think we're ready for anything more
> complicated than this though either. We kind of have to start somewhere too.
> 
Yes, the fix is too simple, and just a start, I think we'd better wait
more useful parts done before merging it.
> 
>> Second, the repair way is wrong. In
>> ocfs2_filecheck_repair_inode_block(), if these fields in disk don't
>> match the ones in memory, the ones in memory are used to update the disk
>> fields. The question is how do you know these field in memory are
>> right(they may be the real corrupted ones)?
> 
> Your second point (and the last part of your 1st point) makes a good
> argument for why this shouldn't happen automatically. Some of these
> corruptions might require a human to look at the log and decide what to do.
> Especially as you point out, where we might not know where the source of the
> corruption is. And if the human can't figure it out, then it's probably time
> to unmount and fsck.
The point is that the fix way is wrong, just flush memory info to disk
is not right. I agree online fsck is good feature, but need carefully
design, it should not involve more corruptions. A rough idea from mine
is that maybe we need some "frezee" mechanism in fs, which can hung all
fs op and let fs stop at a safe area. After freeze fs, we can do some
fsck work on it and these works should not cost lots time. What's your idea?

Thanks,
Junxiao.

> 
> Thanks,
>   --Mark
> 
> --
> Mark Fasheh
> 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Junxiao Bi
On 11/25/2015 05:46 AM, Mark Fasheh wrote:
> On Tue, Nov 03, 2015 at 04:20:27PM +0800, Junxiao Bi wrote:
>> Hi Gang,
>>
>> On 11/03/2015 03:54 PM, Gang He wrote:
>>> Hi Junxiao,
>>>
>>> Thank for your reviewing.
>>> Current design, we use a sysfile as a interface to check/fix a file (via 
>>> pass a ino number).
>>> But, this operation is manually triggered by user, instead of automatically 
>>>  fix in the kernel.
>>> Why?
>>> 1) we should let users make this decision, since some users do not want to 
>>> fix when encountering a file system corruption, maybe they want to keep the 
>>> file system unchanged for a further investigation.
>> If user don't want this, they should not use error=continue option, let
>> fs go after a corruption is very dangerous.
> 
> Maybe we need another errors=XXX flag (maybe errors=fix)?
Sound great. This is a good option since user may have not enough
knowledge whether to fix the found issue.

Thanks,
Junxiao.
> 
> You both make good points, here's what I gather from the conversation:
> 
>  - Some customers would be sad if they have to manually fix corruptions.
>This takes effort on their part, and if the FS can handle it
>automatically, it should.
> 
>  - There are valid concerns that automatically fixing things is a change in
>behavior that might not be welcome, or worse might lead to unforseeable
>circumstances.
> 
>  - I will add that fixing things automatically implies checking them
>automatically which could introduce some performance impact depending on
>how much checking we're doing.
> 
> So if the user wants errors to be fixed automatically, they could mount with
> errros=fix, and everyone else would have no change in behavior unless they
> wanted to make use of the new feature.
> 
> 
>>> 2) frankly speaking, this feature will probably bring a second corruption 
>>> if there is some error in the code, I do not suggest to use automatically 
>>> fix by default in the first version.
>> I think if this feature could bring more corruption, then this should be
>> fixed first.
> 
> Btw, I am pretty sure that Gang is referring to the feature being new and
> thus more likely to have problems. There is nothing I see in here that is
> file system corrupting.
>   --Mark
> 
> 
> --
> Mark Fasheh
> 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Junxiao Bi
Hi Gang,

On 11/25/2015 11:29 AM, Gang He wrote:
> Hi Mark and Junxiao,
> 
> 

>> On Tue, Nov 03, 2015 at 04:20:27PM +0800, Junxiao Bi wrote:
>>> Hi Gang,
>>>
>>> On 11/03/2015 03:54 PM, Gang He wrote:
 Hi Junxiao,

 Thank for your reviewing.
 Current design, we use a sysfile as a interface to check/fix a file (via 
>> pass a ino number).
 But, this operation is manually triggered by user, instead of 
 automatically 
>>  fix in the kernel.
 Why?
 1) we should let users make this decision, since some users do not want to 
>> fix when encountering a file system corruption, maybe they want to keep the 
>> file system unchanged for a further investigation.
>>> If user don't want this, they should not use error=continue option, let
>>> fs go after a corruption is very dangerous.
>>
>> Maybe we need another errors=XXX flag (maybe errors=fix)?
>>
>> You both make good points, here's what I gather from the conversation:
>>
>>  - Some customers would be sad if they have to manually fix corruptions.
>>This takes effort on their part, and if the FS can handle it
>>automatically, it should.
>>
>>  - There are valid concerns that automatically fixing things is a change in
>>behavior that might not be welcome, or worse might lead to unforseeable
>>circumstances.
>>
>>  - I will add that fixing things automatically implies checking them
>>automatically which could introduce some performance impact depending on
>>how much checking we're doing.
>>
>> So if the user wants errors to be fixed automatically, they could mount with
>> errros=fix, and everyone else would have no change in behavior unless they
>> wanted to make use of the new feature.
> That is what I want to say, add a mount option to let users to decide. Here, 
> I want to split "error=fix"
> mount option  task out from online file check feature, I think this part 
> should be a independent feature.
> We can implement this feature after online file check is done, I want to 
> split the feature into some more 
> detailed features, implement them one by one. Do you agree this point?
With error=fix, when a possible corruption is found, online fsck will
start to check and fix things. So this doesn't looks like a independent
feature.

Thanks,
Junxiao.

> 
>>
>>
 2) frankly speaking, this feature will probably bring a second corruption 
>> if there is some error in the code, I do not suggest to use automatically 
>> fix 
>> by default in the first version.
>>> I think if this feature could bring more corruption, then this should be
>>> fixed first.
>>
>> Btw, I am pretty sure that Gang is referring to the feature being new and
>> thus more likely to have problems. There is nothing I see in here that is
>> file system corrupting.
>>  --Mark
>>
>>
>> --
>> Mark Fasheh
> 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check

2015-11-24 Thread Gang He
Hi Mark and Junxiao,


>>> 
> Hi Mark,
> 
> On 11/25/2015 06:16 AM, Mark Fasheh wrote:
>> Hi Junxiao,
>> 
>> On Tue, Nov 03, 2015 at 03:12:35PM +0800, Junxiao Bi wrote:
>>> Hi Gang,
>>>
>>> This is not like a right patch.
>>> First, online file check only checks inode's block number, valid flag,
>>> fs generation value, and meta ecc. I never see a real corruption
>>> happened only on this field, if these fields are corrupted, that means
>>> something bad may happen on other place. So fix this field may not help
>>> and even cause corruption more hard.
>> 
>> I agree that these are rather uncommon, we might even consider removing the
>> VALID_FL fixup. I definitely don't think we're ready for anything more
>> complicated than this though either. We kind of have to start somewhere too.
>> 
> Yes, the fix is too simple, and just a start, I think we'd better wait
> more useful parts done before merging it.
I agree, just remark VALID_FL flag to fix this field is too simple, we should 
delay this field fix before 
I have a flawless solution, I will remove these lines code in the first version 
patches. In the future submits,
I also hope your guys to help review the code carefully, shout out your 
comments when you doubt somewhere.



>> 
>>> Second, the repair way is wrong. In
>>> ocfs2_filecheck_repair_inode_block(), if these fields in disk don't
>>> match the ones in memory, the ones in memory are used to update the disk
>>> fields. The question is how do you know these field in memory are
>>> right(they may be the real corrupted ones)?
>> 
>> Your second point (and the last part of your 1st point) makes a good
>> argument for why this shouldn't happen automatically. Some of these
>> corruptions might require a human to look at the log and decide what to do.
>> Especially as you point out, where we might not know where the source of the
>> corruption is. And if the human can't figure it out, then it's probably time
>> to unmount and fsck.
> The point is that the fix way is wrong, just flush memory info to disk
> is not right. I agree online fsck is good feature, but need carefully
> design, it should not involve more corruptions. A rough idea from mine
> is that maybe we need some "frezee" mechanism in fs, which can hung all
> fs op and let fs stop at a safe area. After freeze fs, we can do some
> fsck work on it and these works should not cost lots time. What's your idea?
If we need to touch some global data structures, freezing fs can be considered 
when we can't
get any way in case using the locks.
If we only handle some independent problem, we just need to lock the related 
data structures. 

> 
> Thanks,
> Junxiao.
> 
>> 
>> Thanks,
>>  --Mark
>> 
>> --
>> Mark Fasheh
>> 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/4] ocfs2: sysfile interfaces for online file check

2015-11-24 Thread Gang He
Hi Junxiao,


>>> 
> Hi Gang,
> 
> On 11/25/2015 11:29 AM, Gang He wrote:
>> Hi Mark and Junxiao,
>> 
>> 
>
>>> On Tue, Nov 03, 2015 at 04:20:27PM +0800, Junxiao Bi wrote:
 Hi Gang,

 On 11/03/2015 03:54 PM, Gang He wrote:
> Hi Junxiao,
>
> Thank for your reviewing.
> Current design, we use a sysfile as a interface to check/fix a file (via 
>>> pass a ino number).
> But, this operation is manually triggered by user, instead of 
> automatically 
>>>  fix in the kernel.
> Why?
> 1) we should let users make this decision, since some users do not want 
> to 
>>> fix when encountering a file system corruption, maybe they want to keep the 
>>> file system unchanged for a further investigation.
 If user don't want this, they should not use error=continue option, let
 fs go after a corruption is very dangerous.
>>>
>>> Maybe we need another errors=XXX flag (maybe errors=fix)?
>>>
>>> You both make good points, here's what I gather from the conversation:
>>>
>>>  - Some customers would be sad if they have to manually fix corruptions.
>>>This takes effort on their part, and if the FS can handle it
>>>automatically, it should.
>>>
>>>  - There are valid concerns that automatically fixing things is a change in
>>>behavior that might not be welcome, or worse might lead to unforseeable
>>>circumstances.
>>>
>>>  - I will add that fixing things automatically implies checking them
>>>automatically which could introduce some performance impact depending on
>>>how much checking we're doing.
>>>
>>> So if the user wants errors to be fixed automatically, they could mount with
>>> errros=fix, and everyone else would have no change in behavior unless they
>>> wanted to make use of the new feature.
>> That is what I want to say, add a mount option to let users to decide. Here, 
> I want to split "error=fix"
>> mount option  task out from online file check feature, I think this part 
> should be a independent feature.
>> We can implement this feature after online file check is done, I want to 
> split the feature into some more 
>> detailed features, implement them one by one. Do you agree this point?
> With error=fix, when a possible corruption is found, online fsck will
> start to check and fix things. So this doesn't looks like a independent
> feature.
My means is, we can implement online file check by manually triage feature 
first, then
Add a mount option "error=fix" feature, the second feature can be implemented 
after
the first part is done. I want to split them into more detailed items, maybe it 
is more helpful
to be reviewed, but the whole feature ideas are very OK, just need to do one by 
one.  

> 
> Thanks,
> Junxiao.
> 
>> 
>>>
>>>
> 2) frankly speaking, this feature will probably bring a second corruption 
>>> if there is some error in the code, I do not suggest to use automatically 
> fix 
>>> by default in the first version.
 I think if this feature could bring more corruption, then this should be
 fixed first.
>>>
>>> Btw, I am pretty sure that Gang is referring to the feature being new and
>>> thus more likely to have problems. There is nothing I see in here that is
>>> file system corrupting.
>>> --Mark
>>>
>>>
>>> --
>>> Mark Fasheh
>> 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check

2015-11-24 Thread Junxiao Bi
On 11/25/2015 01:04 PM, Gang He wrote:
> Hi Mark and Junxiao,
> 
> 

>> Hi Mark,
>>
>> On 11/25/2015 06:16 AM, Mark Fasheh wrote:
>>> Hi Junxiao,
>>>
>>> On Tue, Nov 03, 2015 at 03:12:35PM +0800, Junxiao Bi wrote:
 Hi Gang,

 This is not like a right patch.
 First, online file check only checks inode's block number, valid flag,
 fs generation value, and meta ecc. I never see a real corruption
 happened only on this field, if these fields are corrupted, that means
 something bad may happen on other place. So fix this field may not help
 and even cause corruption more hard.
>>>
>>> I agree that these are rather uncommon, we might even consider removing the
>>> VALID_FL fixup. I definitely don't think we're ready for anything more
>>> complicated than this though either. We kind of have to start somewhere too.
>>>
>> Yes, the fix is too simple, and just a start, I think we'd better wait
>> more useful parts done before merging it.
> I agree, just remark VALID_FL flag to fix this field is too simple, we should 
> delay this field fix before 
> I have a flawless solution, I will remove these lines code in the first 
> version patches. In the future submits,
> I also hope your guys to help review the code carefully, shout out your 
> comments when you doubt somewhere.
Sure.

> 
> 
> 
>>>
 Second, the repair way is wrong. In
 ocfs2_filecheck_repair_inode_block(), if these fields in disk don't
 match the ones in memory, the ones in memory are used to update the disk
 fields. The question is how do you know these field in memory are
 right(they may be the real corrupted ones)?
>>>
>>> Your second point (and the last part of your 1st point) makes a good
>>> argument for why this shouldn't happen automatically. Some of these
>>> corruptions might require a human to look at the log and decide what to do.
>>> Especially as you point out, where we might not know where the source of the
>>> corruption is. And if the human can't figure it out, then it's probably time
>>> to unmount and fsck.
>> The point is that the fix way is wrong, just flush memory info to disk
>> is not right. I agree online fsck is good feature, but need carefully
>> design, it should not involve more corruptions. A rough idea from mine
>> is that maybe we need some "frezee" mechanism in fs, which can hung all
>> fs op and let fs stop at a safe area. After freeze fs, we can do some
>> fsck work on it and these works should not cost lots time. What's your idea?
> If we need to touch some global data structures, freezing fs can be 
> considered when we can't
> get any way in case using the locks.
> If we only handle some independent problem, we just need to lock the related 
> data structures. 
Hmm, I am not sure whether it's hard to decide an independent issue.

Thanks,
Junxiao.
> 
>>
>> Thanks,
>> Junxiao.
>>
>>>
>>> Thanks,
>>> --Mark
>>>
>>> --
>>> Mark Fasheh
>>>
> 


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] Long io response time doubt

2015-11-24 Thread Eric Ren

Sorry, forget to add the pieces of code flow...

On reading node:

 3)  dlm_ast-4278  =>  ocfs2dc-4277
 --

 3)   |  ocfs2_process_blocked_lock() {
 3)   |ocfs2_unblock_lock() {
 3)   0.116 us|  ocfs2_prepare_cancel_convert();
 3)   |  ocfs2_cancel_convert() {
 3)   |user_dlm_unlock() {
 3)   |  dlm_unlock() {
 3)   0.120 us|dlm_find_lockspace_local();
 3)   0.158 us|find_lkb();
 3)   |cancel_lock() {
 3)   |  validate_unlock_args() {
 3)   0.093 us|del_timeout();
 3)   0.782 us|  }
 3)   |  _cancel_lock() {
 3)   |send_common() {
 3)   0.189 us|  add_to_waiters();
 3)   |  create_message() {
 3)   |_create_message() {
 3)   |  dlm_lowcomms_get_buffer() {
 3)   0.156 us|nodeid2con();
 3)   1.680 us|  }
 3)   0.108 us|  dlm_our_nodeid();
 3)   2.821 us|}
 3)   3.319 us|  }
 3)   0.094 us|  send_args();
 3)   |  send_message() {
 3)   0.070 us|dlm_message_out();
 3)   9.485 us|dlm_lowcomms_commit_buffer();
 3) + 10.609 us   |  }
 3) + 16.054 us   |}
 3) + 16.632 us   |  }
 3)   0.156 us|  put_rsb();
 3) + 19.044 us   |}
 3)   |dlm_put_lkb() {
 3)   0.094 us|  __put_lkb();
 3)   0.632 us|}
 3)   0.074 us|dlm_put_lockspace();
 3) + 22.513 us   |  }
 3) + 23.028 us   |}
 3) + 23.727 us   |  }
 3) + 25.004 us   |}
 3)   |ocfs2_schedule_blocked_lock() {
 3)   0.073 us|  lockres_set_flags();
 3)   0.592 us|}
 3) + 26.852 us   |  }
 --
 3)  ocfs2dc-4277  =>  dlm_ast-4278
 --

 3)   |  process_asts() {
 3)   0.202 us|dlm_rem_lkb_callback();
 3)   0.081 us|dlm_rem_lkb_callback();
 3)   |fsdlm_lock_ast_wrapper() {
 3)   |  ocfs2_unlock_ast() {
 3)   0.099 us|ocfs2_get_inode_osb();
 3)   1.290 us|ocfs2_wake_downconvert_thread();
 3)   |lockres_clear_flags() {
 3)   8.539 us|  lockres_set_flags();
 3)   9.096 us|}
 3) + 12.055 us   |  }
 3) + 12.673 us   |}
 3)   |dlm_put_lkb() {
 3)   0.161 us|  __put_lkb();
 3)   0.718 us|}
 3) + 16.133 us   |  }


On writing node:

 3)  kworker-443   =>  ocfs2dc-4456
 --

 3)   |  ocfs2_process_blocked_lock() {
 3)   |ocfs2_unblock_lock() {
 3)   0.269 us|  ocfs2_prepare_cancel_convert();
 3)   |  ocfs2_cancel_convert() {
 3)   |user_dlm_unlock() {
 3)   |  dlm_unlock() {
 3)   0.321 us|dlm_find_lockspace_local();
 3)   0.286 us|find_lkb();
 3)   |cancel_lock() {
 3)   |  validate_unlock_args() {
 3)   0.122 us|del_timeout();
 3)   0.901 us|  }
 3)   |  _cancel_lock() {
 3)   |do_cancel() {
 3)   |  revert_lock() {
 3)   |move_lkb() {
 3)   0.155 us|  del_lkb();
 3)   0.243 us|  add_lkb();
 3)   1.778 us|}
 3)   2.577 us|  }
 3)   |  queue_cast() {
 3)   0.102 us|del_timeout();
 3)   |dlm_add_ast() {
 3)   0.165 us|  dlm_add_lkb_callback();
 3) + 14.492 us   |}
 3) + 16.381 us   |  }
 3) + 20.384 us   |}
 3)   |grant_pending_locks() {
 3)   |  grant_pending_convert() {
 3)   |can_be_granted() {
 3)   0.143 us|  _can_be_granted();
 3)   0.906 us|}
 3)   1.900 us|  }
 3)   2.738 us|}
 3) + 24.670 us   |  }
 3)   0.154 us|  put_rsb();
 3) + 28.068 us   |}
 3)   |dlm_put_lkb() {
 3)   0.163 us|  __put_lkb();
 3)   1.029 us|}
 3)   0.195 us|

Re: [Ocfs2-devel] Long io response time doubt

2015-11-24 Thread Eric Ren
Hi Joseph,

I use ftrace's function tracer to record some code flow. There's a 
question that makes me confused -
why does ocfs2_cancel_convert() be called here in ocfs2dc thread? In 
other words, what do we expect it
to do here?

ocfs2_unblock_lock(){
  ...
  if(lockres->l_flags & OCFS2_LOCK_BUSY){
 ...
 ocfs2_cancel_convert()
...
 }
}

 From what I understand, 
ocfs2_cancel_convert()->ocfs2_dlm_unlock()->user_dlm_unlock()->dlm_unlock(DLM_LKF_CANCEL)
 
puts
the lock back on the the grand queue at its old grant mode.  In my case, 
you know, read/write the same shared file from two nodes,
I think the up-conversion can only happen on the writing node - 
(PR->EX), while on the reading node, no up-conversion  is need, right?

But, the following output from writing and reading nodes, shows that 
ocfs2_cancel_convert() has been called on both nodes. why could
this happen in this scenario?

On 11/16/15 09:40, Joseph Qi wrote:
>> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
>> work? Does b) have something to do with a)? And what's the meaning of "evict 
>> inode"?
>> Actually, I can hardly understand the idea of b).
> You can go through the code flow:
> iput->iput_final->evict->evict_inode->ocfs2_evict_inode
> ->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint
>
> It happens that one node do not use the inode any longer (but not
> delete), and will free its related lockres.
OK, thanks~

Eric

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH] ocfs2: fix BUG due to uncleaned localalloc during mount

2015-11-24 Thread Joseph Qi
Tariq has reported a BUG before and posted a fix at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-April/010696.html

This is because during umount, localalloc shutdown relies on journal
shutdown. But during journal shutdown, it just stops commit thread
without checking its result. So it may happen that localalloc shutdown
uncleaned during I/O error and after that, journal then has been marked
clean if I/O restores.
Then during mount, localalloc won't be recovered because of clean
journal and then trigger BUG when claiming clusters from localalloc.

In Tariq's fix, we have to run fsck offline and a separate fix to fsck
is needed because it currently does not support clearing out localalloc
inode. And my way to fix this issue is checking localalloc before
actually loading it during mount. And this is somewhat online.

Signed-off-by: Joseph Qi 
---
 fs/ocfs2/localalloc.c | 19 ---
 fs/ocfs2/localalloc.h |  2 +-
 fs/ocfs2/super.c  | 17 ++---
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 0a4457f..ceebaef 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -281,7 +281,7 @@ bail:
return ret;
 }

-int ocfs2_load_local_alloc(struct ocfs2_super *osb)
+int ocfs2_load_local_alloc(struct ocfs2_super *osb, int check, int *recovery)
 {
int status = 0;
struct ocfs2_dinode *alloc = NULL;
@@ -345,21 +345,26 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb)
if (num_used
|| alloc->id1.bitmap1.i_used
|| alloc->id1.bitmap1.i_total
-   || la->la_bm_off)
+   || la->la_bm_off) {
mlog(ML_ERROR, "Local alloc hasn't been recovered!\n"
 "found = %u, set = %u, taken = %u, off = %u\n",
 num_used, le32_to_cpu(alloc->id1.bitmap1.i_used),
 le32_to_cpu(alloc->id1.bitmap1.i_total),
 OCFS2_LOCAL_ALLOC(alloc)->la_bm_off);
+   status = -EINVAL;
+   *recovery = 1;
+   goto bail;
+   }

-   osb->local_alloc_bh = alloc_bh;
-   osb->local_alloc_state = OCFS2_LA_ENABLED;
+   if (!check) {
+   osb->local_alloc_bh = alloc_bh;
+   osb->local_alloc_state = OCFS2_LA_ENABLED;
+   }

 bail:
-   if (status < 0)
+   if (status < 0 || check)
brelse(alloc_bh);
-   if (inode)
-   iput(inode);
+   iput(inode);

trace_ocfs2_load_local_alloc(osb->local_alloc_bits);

diff --git a/fs/ocfs2/localalloc.h b/fs/ocfs2/localalloc.h
index 44a7d1f..a913841 100644
--- a/fs/ocfs2/localalloc.h
+++ b/fs/ocfs2/localalloc.h
@@ -26,7 +26,7 @@
 #ifndef OCFS2_LOCALALLOC_H
 #define OCFS2_LOCALALLOC_H

-int ocfs2_load_local_alloc(struct ocfs2_super *osb);
+int ocfs2_load_local_alloc(struct ocfs2_super *osb, int check, int *recovery);

 void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb);

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 2de4c8a..4004b29 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -2428,6 +2428,7 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
int status;
int dirty;
int local;
+   int la_dirty = 0, recovery = 0;
struct ocfs2_dinode *local_alloc = NULL; /* only used if we
  * recover
  * ourselves. */
@@ -2449,6 +2450,16 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
 * recover anything. Otherwise, journal_load will do that
 * dirty work for us :) */
if (!dirty) {
+   /* It may happen that local alloc is unclean shutdown, but
+* journal has been marked clean, so check it here and do
+* recovery if needed */
+   status = ocfs2_load_local_alloc(osb, 1, );
+   if (recovery) {
+   printk(KERN_NOTICE "ocfs2: local alloc needs recovery "
+   "on device (%s).\n", osb->dev_str);
+   la_dirty = 1;
+   }
+
status = ocfs2_journal_wipe(osb->journal, 0);
if (status < 0) {
mlog_errno(status);
@@ -2477,7 +2488,7 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
JBD2_FEATURE_COMPAT_CHECKSUM, 0,
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);

-   if (dirty) {
+   if (dirty || la_dirty) {
/* recover my local alloc if we didn't unmount cleanly. */
status = ocfs2_begin_local_alloc_recovery(osb,
  osb->slot_num,
@@ -2490,13 +2501,13 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
 * ourselves as mounted. */
}

-   status = ocfs2_load_local_alloc(osb);
+