On Dec 28, 2011, at 12:34 AM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
>> The first run of fsck, using the journal, gives results that I would 
>> expect.  The second run seems to imply that the fixes made on the 
>> first run didn't actually get written to disk.  This is definitely an 
>> oddity.  I see that you're using geli, maybe there's some strange 
>> side-effect there.  No idea.  Report as a bug, this is definitely 
>> undesired behavior.
> 
> Not impossible, but I was seeing similar issues on two non-geli systems 
> as well, i.e. tons of errors fixed when doing a single-user 
> non-journalled fsck, but journalled fsck not fixing stuff. I'll try to 
> replicate on a test machine, as I already lost data on the last 
> (non-geli) machine this happened to.
> 
>> For the love of all that is good and holy, don't ever run fsck on a live 
>> filesystem.  It's going to report these kinds of problems!  It's 
>> normal; filesystem metadata updates stay cached in memory, and fsck 
>> bypasses that cache.  
> 
> OK. I expected fsck would be softupdates-aware in that way, but I 
> understand why it isn't.
> 
>>> - SU+J and fsck do not work correctly together to fix corruption on 
>>> boot, i.e. bgfsck isn't getting run when it should
>> 
>> The point of SUJ is to eliminate the need for bgfsck.  Effectively, 
>> they are exclusive ideas.  
> 
> This is surprising to me. It is my impression that under Linux at least, 
> ext3fs is checked against the journal, and gets a full e2fsck if it 
> finds it's still dirty. Additionally, there's a periodic fsck after 180 
> days continuous runtime or x number of mounts (see tune2fs -i and -c).  
> Is SU+J somehow implemented in such a way that this is unnecessary? What 
> does it do that the ext3fs people have missed?
> 

SUJ isn't like ext3 journaling; it doesn't do full metadata logging.  Instead, 
it's an extension of softupdates.  Softupdates (SU) is still responsible for 
ordering dependent writes to the disk to maintain consistency.  What SU can't 
handle is the Unix/POSIX idiom of unlinking a file from the namespace while 
keeping its inode active through reference counts.  After an unclean shutdown, 
you wind up with stale blocks allocated to orphaned inodes.  The point of 
bgfsck was to scan the filesystem for these allocations and free them, just as 
fsck does, but to do it in the background so that the boot could continue. 
SUJ is basically just an intent log for this case; it tells fsck where to find 
these allocations so that fsck doesn't have to do the lengthy full scan.  FWIW, 
this problem is present in almost any journaling implementation and is usually 
solved with intent records in a journal, not unlike SUJ.
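The unlink-while-open idiom is easy to see from the shell; a minimal sketch 
(the file name is arbitrary):

```shell
# Create a file and hold an open descriptor on it
echo "still reachable" > demo.txt
exec 3< demo.txt
# Remove the last name; the inode stays allocated while fd 3 is open
rm demo.txt
# The data is still readable through the descriptor
cat <&3
# Closing the descriptor drops the last reference; the inode is freed
exec 3<&-
```

If the machine crashed between the rm and the final close, the inode and its 
blocks would be left allocated with no name pointing at them, which is exactly 
the case bgfsck (and now the SUJ intent log) exists to clean up.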

So there's an assumption with SUJ+fsck that SU is keeping the filesystem 
consistent.  Maybe that's a bad assumption, and I'm not trying to discredit 
your report.  But the intention with SUJ is to eliminate the need for anything 
more than a cursory check of the superblocks and a processing of the SUJ intent 
log.  If either of these fails, fsck reverts to a traditional full scan.  In 
the same vein, ext3 and most other traditional journaling filesystems assume 
that the journal is correct and is preserving consistency; they likewise do 
nothing more than a cursory data-structure scan and a journal replay, reverting 
to a full scan if that fails (ZFS seems to be the exception here, since no 
actual fsck exists for it).
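For reference, you can still force a traditional full foreground check on a 
SU+J filesystem by hand.  A sketch, assuming the filesystem is unmounted and 
with the device name purely as an example:

```shell
# Disable the SU+J journal so fsck_ffs can't take the journaled fast path
tunefs -j disable /dev/ada0p2
# Force a full check even if the filesystem is marked clean
fsck_ffs -f -y /dev/ada0p2
# Re-enable the journal once the scan is done
tunefs -j enable /dev/ada0p2
```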

As for the 180-day forced scan on ext3, I have no public comment.  SU has 
matured nicely over the last 10+ years, and I'm happy with the progress that 
SUJ has made in the last 2-3 years.  If there are bugs, they need to be exposed 
and addressed ASAP.

Scott

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
