[zfs-discuss] osol monitoring question

2010-05-10 Thread Roy Sigurd Karlsbakk
Hi all

It seems that when using ZFS, the usual tools like vmstat, sar, top etc. are quite 
worthless, since ZFS I/O load is not reported as iowait etc. Are there any 
plans to rewrite the old performance monitoring tools, or the ZFS parts, to allow 
for standard monitoring tools? If not, what other tools exist that can do the 
same?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] osol monitoring question

2010-05-10 Thread Michael Schuster

On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote:

Hi all

It seems that if using zfs, the usual tools like vmstat, sar, top etc are quite 
worthless, since zfs i/o load is not reported as iowait etc. Are there any 
plans to rewrite the old performance monitoring tools or the zfs parts to allow 
for standard monitoring tools? If not, what other tools exist that can do the 
same?


zpool iostat for one.

Michael
--
michael.schus...@oracle.com http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] osol monitoring question

2010-05-10 Thread Roy Sigurd Karlsbakk
- Michael Schuster michael.schus...@oracle.com wrote:

 On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote:
  Hi all
 
  It seems that if using zfs, the usual tools like vmstat, sar, top
 etc are quite worthless, since zfs i/o load is not reported as iowait
 etc. Are there any plans to rewrite the old performance monitoring
 tools or the zfs parts to allow for standard monitoring tools? If not,
 what other tools exist that can do the same?
 
 zpool iostat for one.

I know that, and iostat, etc., but wouldn't it be rather consistent to integrate 
with the tools that have been used for the last two or three decades? wio 
shouldn't be reported as 0% when the disks are the bottleneck...

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for full stystem backup - equivelent of ufsdump/ufsrestore

2010-05-10 Thread Euan Thoms
erik.ableson said: Just a quick comment for the send/recv operations: adding 
-R makes it recursive, so you only need one line to send the rpool and all 
descendant filesystems. 

Yes, I know of the -R flag, but it doesn't seem to work with sending loose 
snapshots to the backup pool. It obviously works when piped to a file. Sorry, I 
can't remember what the error message was when I tried to 'send -R | receive 
backup-pool/rpool'; it does work if done individually, though.
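
For reference, a minimal sketch of the recursive replication form (pool, dataset 
and snapshot names here are illustrative, not taken from the original post):

  # snapshot the whole tree, then send a single replication stream
  zfs snapshot -r rpool@backup1
  zfs send -R rpool@backup1 | zfs receive -d -F backup-pool/rpool

The -R flag includes all descendant filesystems, their snapshots and properties; 
-d on the receiving side preserves the dataset hierarchy under the target, and 
-F may be needed if the target has been modified since a previous receive.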
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup stats per file system

2010-05-10 Thread Darren J Moffat

On 08/05/2010 21:45, P-O Yliniemi wrote:

I have noticed that dedup is discussed a lot in this list right now..

Starting to experiment with dedup=on, I feel it would be interesting in
knowing exactly how efficient dedup is. The problem is that I've found
no way of checking this per file system. I have turned dedup on for a
few file systems to try it out:


You can't because dedup is per pool not per filesystem.  Each file 
system gets to choose if it is opting in to the pool wide dedup.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Hard disk buffer at 100%

2010-05-10 Thread Emily Grettel

Hi Eric,

 

 Problem is the OP is mixing client 4k drives with 512b drives. 

 

How do you come to that assessment?

 

Here's what I have:


Ap_Id  Information
sata1/1::dsk/c7t1d0Mod: WDC WD10EADS-00L5B1 FRev: 01.01A01

sata1/2::dsk/c7t2d0Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01

sata1/3::dsk/c7t3d0Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01

sata1/4::dsk/c7t4d0Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01

sata1/5::dsk/c7t5d0Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01

sata2/1::dsk/c0t1d0Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01

 

They all seem to indicate the older 512b from the WDC site unless I'm not 
understanding their spec sheets.

 

 I doubt they're broken per se, they're just dramatically slower
 than their peers in this workload.

 

It does make sense though! My read speed (trying to copy 683GB across to 
another machine) is roughly 7-8Mbps where I used to get on average 30-40Mbps.

 

 As a replacement recommendation, we've been beating on the WD 1TB RE3

 

Cool, either the RE3 or black drives it is :-)

 

Thanks,

Em
 
  
_
View photos of singles in your area! Looking for a hot date?
http://clk.atdmt.com/NMN/go/150855801/direct/01/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirroring USB Drive with Laptop for Backup purposes

2010-05-10 Thread Matt Keenan

On 05/ 7/10 10:07 PM, Bill McGonigle wrote:

On 05/07/2010 11:08 AM, Edward Ned Harvey wrote:
I'm going to continue encouraging you to stay mainstream, because what 
people do the most is usually what's supported the best.


If I may be the contrarian, I hope Matt keeps experimenting with this, 
files bugs, and they get fixed.  His use case is very compelling - I 
know lots of SOHO folks who could really use a NAS where this 'just 
worked'.


The ZFS team has done well by thinking liberally about conventional 
assumptions.


-Bill



My plan indeed is to continue with this setup (going to upgrade to 138 
to resolve my reboot issue). This particular use case for me is 
definitely compelling; the simple fact that I can plug my USB drive into 
another laptop and boot into the exact same environment is reason enough 
for me to continue with this setup and see how things go.


Mind you, doing occasional zfs sends to another backup drive might be 
something I'll do as well :-)


cheers

Matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation

2010-05-10 Thread charles
Hi,

This thread refers to Solaris 10, but it was suggested that I post it here as 
ZFS developers may well be more likely to respond.

http://forums.sun.com/thread.jspa?threadID=5438393&messageID=10986502#10986502

Basically, after about 1000 ZFS filesystem creations, the creation time slows 
down to around 4 seconds and gets progressively worse.

This is not the case for a normal mkdir, which creates thousands of directories 
very quickly.

I wanted users' home directories (60,000 of them) all to be individual ZFS file 
systems, but there seems to be a bug/limitation due to the prohibitive creation 
time.
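
A minimal way to reproduce and measure this (the pool and dataset names are 
illustrative only) is a timed creation loop, e.g.:

  i=0
  while [ $i -lt 2000 ]; do
      ptime zfs create tank/home/user$i    # ptime prints real/user/sys per create
      i=`expr $i + 1`
  done

Watching the real time per create as the count grows makes the slowdown easy to see.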
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Daily snapshots as replacement for incremental backups

2010-05-10 Thread Gabriele Bulfon
Hello,
I have a situation where a zfs file server holding lots of graphic files cannot 
be backed up daily with a full backup.
My idea was initially to run a full backup on Sunday through the lto library on 
more dedicated tapes, then have an incremental backup run on daily tapes.
Brainstorming on this led me to the idea that I could actually stop thinking 
about incremental backups (which may always leave me with unsafe backups anyway, 
for some unlucky reason) and substitute that idea with daily snapshots.
Actually, the full disaster recovery is on the Sunday full backups (which can be 
safely taken away on Monday), while the daily solution would just be a safe 
place for daily errors by users (people who delete files by mistake, for 
example).
This can be done simply by running one snapshot per day during the night.
My idea is to have cron rotate snapshots during working days, so that I 
always have Mon, Tue, Wed, Thu, Fri and Sat snapshots, and have the cron shell 
script delete the oldest (that is, if I am about to take a Mon snapshot, I first 
delete the old Mon snapshot; this keeps the cycle going).
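
A minimal sketch of such a rotation (the dataset name and schedule are 
illustrative; adjust to taste), run nightly from cron:

  # crontab entry: run at 01:00 every night
  0 1 * * * /root/rotate-snap.sh

  # /root/rotate-snap.sh
  #!/bin/sh
  day=`date +%a`                                  # Mon, Tue, Wed, ...
  zfs destroy -r tank/graphics@$day 2>/dev/null   # drop last week's snapshot of the same name
  zfs snapshot -r tank/graphics@$day              # take tonight's snapshot
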
My questions are:
- is this a good and common solution?
- is there any zfs performance degradation caused by creating and deleting 
snapshots on a daily basis, maybe fragmenting the file system?

Thanx for any suggestion
Gabriele.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Daily snapshots as replacement for incremental backups

2010-05-10 Thread Erik Trimble

Gabriele Bulfon wrote:

Hello,
I have a situation where a zfs file server holding lots of graphic files cannot 
be backed up daily with a full backup.
My idea was initially to run a full backup on Sunday through the lto library on 
more dedicated tapes, then have an incremental backup run on daily tapes.
Brainstorming on this, led me to the idea that I could actually stop thinking 
about incremental backups (that may always lead me to unsafe backups anyway for 
some unlucky reason) and substitute the idea with daily snapshots.
Actually, the full disaster ricovery is on the Sunday full backups (that can be 
safely taken away on Monday), while the daily solution would be just a safe 
place for daily errors by users (people who delete files by mistake, for 
example).
This can be done simply running a snapshot per day during the night.
My idea is to have cron to rotate snapshots during working days, so that I 
always have Mon,Tue,Wen,Thu,Fri,Sat snapshots, and have the cron shell delete 
the oldest (actually, if I have to run a Mon snapshot, I will delete the old 
Mon snapshots, this should run the cycle).
My questions are:
- is this a good and common solution?
  
Yes, though of course you realize that snapshots are not a 
disaster-recovery mechanism.  They're not really backups, either, in the 
sense that they provide no security against larger-scale failures.



- is there any zfs performance degradation caused by creating and deleting 
snapshots on a daily basis, maybe fragmenting the file system?
  
No.  Well, that's not strictly true, but you won't run into any issues 
with snapshots until you have a very large number of them 
simultaneously. 1000s, or more.  Snapshots don't fragment the file 
system any more than deleting files does.  Taking snapshots is 
instantaneous, while deleting a snapshot can vary in time from virtually 
instantaneous to taking several hours (or more), if you have dedup 
turned on and have a large amount of data (and, don't have sufficient 
L2ARC or RAM to hold the dedup table).  In the latter case, it will 
impact performance, as the entire pool has to be scanned to allow for 
proper deletion of the deduped snapshot (i.e. it has to scan the entire 
pool to figure out which data is deduped, and what can be safely deleted).





Thanx for any suggestion
Gabriele.
  



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation

2010-05-10 Thread Tomas Ögren
On 10 May, 2010 - charles sent me these 0,8K bytes:

 Hi,
 
 This thread refers to Solaris 10, but it was suggested that I post it here as 
 ZFS developers may well be more likely to respond.
 
 http://forums.sun.com/thread.jspa?threadID=5438393messageID=10986502#10986502
 
 Basically after about ZFS 1000 filesystem creations the creation time slows 
 down to around 4 seconds, and gets progressively worse.
 
 This is not the case for normal mkdir which creates thousands of directories 
 very quickly.
 
 I wanted users home directories (60,000 of them) all to be individual ZFS 
 file systems, but there seems to be a bug/limitation due to the prohibitive 
 creation time.

If you're going to share them over nfs, you'll be looking at even worse
times.

In my experience, you don't want to go over 1-2k filesystems due to
various scalability problems, especially if you're doing NFS as well. It will
be slow to create and slow when (re)booting, but other than that it
might be ok.

Look into the zfs userquota/groupquota properties instead. That's what I did, and
it's partly because of these issues that userquota/groupquota got
implemented, I guess.
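
For example (the user, group and dataset names are illustrative), per-user quotas 
can be set and inspected on a single shared filesystem like this:

  zfs set userquota@alice=10G tank/home
  zfs set groupquota@staff=500G tank/home
  zfs get userquota@alice tank/home
  zfs userspace tank/home        # per-user space usage and quotas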

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup stats per file system

2010-05-10 Thread P-O Yliniemi

Darren J Moffat wrote on 2010-05-10 10:58:

On 08/05/2010 21:45, P-O Yliniemi wrote:

I have noticed that dedup is discussed a lot in this list right now..

Starting to experiment with dedup=on, I feel it would be interesting in
knowing exactly how efficient dedup is. The problem is that I've found
no way of checking this per file system. I have turned dedup on for a
few file systems to try it out:


You can't because dedup is per pool not per filesystem.  Each file 
system gets to choose if it is opting in to the pool wide dedup.


So dedup is operating at the pool level rather than the file system 
level; so if I have two file systems with dedup=on, they share the 
blocks and checksums pool-wide?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation

2010-05-10 Thread Roy Sigurd Karlsbakk
- charles ce...@cam.ac.uk wrote:

 Hi,
 
 This thread refers to Solaris 10, but it was suggested that I post it
 here as ZFS developers may well be more likely to respond.
 
 http://forums.sun.com/thread.jspa?threadID=5438393messageID=10986502#10986502
 
 Basically after about ZFS 1000 filesystem creations the creation time
 slows down to around 4 seconds, and gets progressively worse.
 
 This is not the case for normal mkdir which creates thousands of
 directories very quickly.
 
 I wanted users home directories (60,000 of them) all to be individual
 ZFS file systems, but there seems to be a bug/limitation due to the
 prohibitive creation time.

Is there a chance of you running out of memory here? If ZFS runs out of memory, 
it'll read indices from disk instead of keeping them in memory, something that 
can almost kill a system.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation

2010-05-10 Thread Roy Sigurd Karlsbakk

- Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

 - charles ce...@cam.ac.uk wrote:
 
  Hi,
  
  This thread refers to Solaris 10, but it was suggested that I post
 it
  here as ZFS developers may well be more likely to respond.
  
 
 http://forums.sun.com/thread.jspa?threadID=5438393messageID=10986502#10986502
  
  Basically after about ZFS 1000 filesystem creations the creation
 time
  slows down to around 4 seconds, and gets progressively worse.
  
  This is not the case for normal mkdir which creates thousands of
  directories very quickly.
  
  I wanted users home directories (60,000 of them) all to be
 individual
  ZFS file systems, but there seems to be a bug/limitation due to the
  prohibitive creation time.
 
 Is there a chance of you running out of memory here? If ZFS runs out
 of memory, it'll read indicies from disk instead of keeping them in
 memory, something that can almost kill a system.

Try to monitor the disk utilisation with iostat -xd 2 or something and compare 
the numbers at low and high dataset counts. If disk usage increases, it's likely 
you're low on RAM. Adding more RAM or L2ARC might help.

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Plugging in a hard drive after Solaris has booted up?

2010-05-10 Thread Joerg Schilling
Ian Collins i...@ianshome.com wrote:

 Run cfgadm -c configure on the unconfigured Ids; see the man page for 
 the gory details.

IF the  BIOS is OK ;-)

I have a problem with a DELL PC: if I disable the other SATA ports, Solaris
is unable to detect new drives (Linux does). If I enable the other SATA ports,
the DELL BIOS will stop and ask me whether I would like to continue, so this is 
not an option that would survive a remote system crash/reboot.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] osol monitoring question

2010-05-10 Thread Richard Elling
On May 10, 2010, at 12:16 AM, Roy Sigurd Karlsbakk wrote:

 - Michael Schuster michael.schus...@oracle.com wrote:
 
 On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote:
 Hi all
 
 It seems that if using zfs, the usual tools like vmstat, sar, top
 etc are quite worthless, since zfs i/o load is not reported as iowait
 etc. Are there any plans to rewrite the old performance monitoring
 tools or the zfs parts to allow for standard monitoring tools? If not,
 what other tools exist that can do the same?
 
 zpool iostat for one.

The traditional tools are quite useful. But you have to know how to use
them properly.  The tools I use most often are: iostat, fsstat, nfsstat, 
iosnoop, and nicstat.
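
Typical invocations (the intervals are arbitrary):

  iostat -xnz 5    # extended per-device statistics, skipping idle devices
  fsstat zfs 5     # per-filesystem-type operation counts, including ZFS
  nfsstat -s       # NFS server-side statistics
  nicstat 5        # per-NIC throughput (third-party nicstat tool)
  iosnoop          # per-I/O tracing (DTraceToolkit script)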

 
 I know that, and iostat, etc, but wouldn't it be rather consistent to 
 integrate with the tools that have been used the latest two or three decades? 
 wio shouldn't be reported as 0% when the disks are the bottleneck...

Absolutely not. Wait for I/O is a processor state and has no direct relation
to I/O bottlenecks.  As a result, it caused confusion for the better part
of the past 30 years. In Solaris 10, wio is always zero. Alan talks about
this and refers to an Infodoc describing how wio is useless.
http://blogs.sun.com/tpenta/entry/how_solaris_calculates_user_system
However, in the brave new world, I can't find a reference to the infodoc.
Perhaps someone with a SunSolve account can find it?

Suffice to say, this still trips people up and you'll find many references to
posts where people try to clarify this if you google a bit.
 -- richard

-- 
ZFS storage and performance consulting at http://www.RichardElling.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] osol monitoring question

2010-05-10 Thread Peter Tribble
On Mon, May 10, 2010 at 7:57 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net 
wrote:
 Hi all

 It seems that if using zfs, the usual tools like vmstat, sar, top etc are 
 quite worthless, since zfs i/o load is not reported as iowait etc. Are there 
 any plans to rewrite the old performance monitoring tools or the zfs parts to 
 allow for standard monitoring tools? If not, what other tools exist that can 
 do the same?

That's nothing to do with ZFS. Solaris 10 defines iowait to be exactly
zero. Which
it is, being essentially meaningless.

Things like vmstat and sar are a bit old anyway; I'm playing with replacements
for sar. Top is still pretty useful.

For zfs, zpool iostat has some utility, but I find fsstat to be pretty useful.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs performance issue

2010-05-10 Thread Abhishek Gupta

Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created a raidz2 
pool from a few slices on a single disk. I was expecting good read/write 
performance but I got speeds of 12-15MBps.

How can I enhance the read/write performance of my raid?
Thanks,
Abhi.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance issue

2010-05-10 Thread Erik Trimble

Abhishek Gupta wrote:

Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created 
raidz2 with a few slices on a single disk. I was expecting a good 
read/write performance but I got the speed of 12-15MBps.

How can I enhance the read/write performance of my raid?
Thanks,
Abhi.


You absolutely DON'T want to do what you've done.  Creating a ZFS pool 
(or, for that matter, any RAID device, whether hardware or software) out 
of slices/partitions of a single disk is a recipe for horrible performance.


In essence, you reduce your performance to 1/N (or worse) of the whole 
disk, where N is the number of slices you created.


So, create your zpool using disks or partitions from different disks.  
It's OK to have more than one partition on a disk - just use them in 
different pools for reasonable performance.
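
For example (device names are illustrative), a pool built from whole, physically 
separate disks rather than slices of one disk:

  # redundancy spread across five separate disks
  zpool create tank raidz2 c7t1d0 c7t2d0 c7t3d0 c7t4d0 c7t5d0

  # or, with only two disks available, a simple mirror
  zpool create tank mirror c7t1d0 c7t2d0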


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-10 Thread Brandon High
On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli geo...@gnaa.net wrote:
 I am looking at using 8K block size on the zfs volume.

8k is the default for zvols.

 I was looking at the comstar iscsi settings and there is also a blk size
 configuration, which defaults as 512 bytes. That would make me believe that
 all of the IO will be broken down into 512 bytes which seems very
 inefficient.

I haven't done any tuning on my comstar volumes, and they're using 8k
blocks. The setting is in the dataset's volblocksize parameter.

 It seems this value should match the file system allocation/cluster size in
 the VM, maybe 4K if you are using an ntfs file system.

You'll have more overhead using smaller volblocksize values, and get
worse compression (since compression is done on the block). If you
have dedup enabled, you'll create more entries in the DDT which can
have pretty disastrous consequences on write performance.

Ensuring that your VM is block-aligned to 4k (or the guest OS's block
size) boundaries will help performance and dedup as well.
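
Note that volblocksize can only be set when the zvol is created, so it has to be 
chosen up front; a minimal example (names and sizes are illustrative):

  zfs create -V 100G -o volblocksize=8k tank/vmvol
  zfs get volblocksize,compression,dedup tank/vmvol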

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-10 Thread Roy Sigurd Karlsbakk
- Brandon High bh...@freaks.com wrote:

 On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli geo...@gnaa.net wrote:
  I am looking at using 8K block size on the zfs volume.
 
 8k is the default for zvols.

So with a 1TB zvol with the default blocksize, dedup is done on 8k blocks? If so, 
some 32 gigs of memory (or L2ARC) will be required per terabyte for the DDT, 
which is quite a lot...
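
Roughly, the arithmetic behind such a figure (the bytes-per-entry value is an 
assumption; estimates for in-core DDT entry size vary, commonly quoted in the 
couple-hundred-byte range):

  1 TiB / 8 KiB per block            = ~134 million unique blocks (worst case)
  134 million x ~256 bytes per entry = ~32 GiB of dedup table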

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?

2010-05-10 Thread Brandon High
On Sun, May 9, 2010 at 11:16 AM, Jim Horng jho...@stretchinc.com wrote:
 zfs send tank/export/projects/project1...@today | zfs receive -d mpool

This won't get any snapshots before @today, which may lead to the
received size being smaller.

I've also noticed that different pool types (eg: raidz vs. mirror) can
lead slight differences in space usage.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirrored Servers

2010-05-10 Thread Maurice Volaski

It sounds like you are looking for AVS.


Consider a replication scenario where A is primary and B, secondary 
and A fails. Say you get A up again on Monday AM, but you are unable 
to summarily shut down B to bring A back online until Friday evening. 
During that whole time, you will not have a current mirror because 
AVS copies only in one direction from A to B.



If you can, then things are much easier and less complex.  I'd
personally use ZFS Snapshots to keep the two servers in sync every 60
seconds.


I've never tested this myself, but if you are depending on the server 
to perform NFS, it has been said here, 
http://opensolaris.org/jive/thread.jspa?messageID=174846#174846, 
that this will fail because the secondary's filesystem will have a 
different FSID, which the NFS client won't recognize.

--

Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?

2010-05-10 Thread Roy Sigurd Karlsbakk
- Jim Horng jho...@stretchinc.com wrote:
 zfs send tank/export/projects/project1...@today | zfs receive -d
 mpool

Perhaps zfs send -R is what you're looking for...

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirroring USB Drive with Laptop for Backup purposes

2010-05-10 Thread Miles Nordin
 bh == Brandon High bh...@freaks.com writes:

bh The drive should be on the same USB port because the device
bh path is saved in the zpool.cache. If you removed the
bh zpool.cache, it wouldn't matter where the drive was plugged
bh in.

I thought it was supposed to go by devid.

There was a bug a while ago that Solaris won't calculate devid for
devices that say over SCSI they are ``removeable'' because, in the
sense that a DynaMO or DVD-R is ``removeable'', the serial number
returned by various identity commands or mode pages isn't bound to any
set of stored bits, and the way devid's are used throughout Solaris
means they are like a namespace or an array-of for a set of bit-stores
so it's not appropriate for a DVD-R drive to have a devid.  A DVD disc
could have one, though---in fact a release of a pressed disc could
appropriately have a non-serialized devid.  However USB stick
designers used to working with Microsoft don't bother to think through
how the SCSI architecture should work in a sane world because they are
used to reading chatty-idiot Microsoft manuals, so they fill out the
page like a beaurocratic form with whatever feels appropriate and mark
USB sticks ``removeable'', which according to the standard and to a
sane implementer is a warning that the virtual SCSI disk attached to
the virtual SCSI host adapter inside the USB pod might be soldered to
removeable FLASH chips.  It's quite stupid because before the OS has
even determined what kind of USB device is plugged in, it already
knows the device is removeable in that sense, just like it knows
hot-swap SATA is removeable.  USB is no more removeable, even in
practical use, than SATA.  (eSATA!  *slap*) Even in the case of CF
readers, it's probably wrong most of the time to set the removeable
SCSI flag because the connection that's severable is between the
virtual SCSI adapter in the ``reader'' and the virtual SCSI disk in
the CF/SD/... card, while the removeable flag indicates severability
between SCSI disk and storage medium.  In the CF/SD/... reader case
the serial number in the IDENTIFY command or mode pages will come from
CF/SD/... and remain bound to the bits.  The only case that might call
for setting the bit is where the adapter is synthesizing a fake mode
page where the removeable bit appears, but even then the bit should be
clear so long as any serialized fields in other commands and mode
pages are still serialized somehow (whether synthesized or not).
Actual removeable in-the-scsi-standard's-sense HARD DISK drives mostly
don't exist, and real removeable things in the real world attach as
optical where an understanding of their removeability is embedded in
the driver: ANYTHING the cd driver attaches will be treated
removeable.

consequently the bit is useless to the way solaris is using it, and
does little more than break USB support in ways like this, but the
developers refuse to let go of their dreams about what the bit was
supposed to mean even though a flood of reality has guaranteed at this
point their dream will never come true.  I think there was some
magical simon-sez flag they added to /kernel/drv/whatever.conf so the
bug could be closed, so you might go hunting for that flag in which
they will surely want you to encode in a baroque case-sensitive
undocumented notation that ``The Microtraveler model 477217045 serial
80502813 attached to driver/hub/hub/port/function has a LYING
REMOVEABLE FLAG'', but maybe you can somehow set it to '*' and rejoin
reality.  Still this won't help you on livecd's.  It's probably wiser
to walk away from USB unless/until there's a serious will to adopt the
practical mindset needed to support it reasonably.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is it safe to disable the swap partition?

2010-05-10 Thread Richard Elling
On May 10, 2010, at 9:06 AM, Bob Friesenhahn wrote:

 On Mon, 10 May 2010, Thomas Tornblom wrote:
 
 Sorry, but this is incorrect.
 
 Solaris (2 if you will) does indeed swap processes in case normal paging is 
 deemed insufficient.
 
 See the chapters on Soft and Hard swapping in:
 
 http://books.google.com/books?id=r_cecYD4AKkC&pg=PA189&lpg=PA189&dq=solaris+internals+swapping&source=bl&ots=oBvgg3yAFZ&sig=lmXYtTLFWJr2JjueQVxsEylnls0&hl=sv&ei=JbXnS7nKF5L60wTtq9nTBg&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCoQ6AEwAw#v=onepage&q&f=false
 
 If this book is correct, then I must be wrong.  I certainly would not want to 
 use a system which is in this dire condition.

It is correct (and recommended reading :-).
I find this knowledge useful for troubleshooting.  If you stumble across a 
stumbling system and notice that the vmstat w column is not zero, then you
know that at some time in the past the system has experienced a severe
memory shortfall.
 -- richard

-- 
ZFS storage and performance consulting at http://www.RichardElling.com










___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is it safe to disable the swap partition?

2010-05-10 Thread Miles Nordin
 mg == Mike Gerdts mger...@gmail.com writes:

mg If Solaris is under memory pressure, [...]

mg The best thing to do with processes that can be swapped out
mg forever is to not run them.

Many programs allocate memory they never use.  Linux allows
overcommitting by default (but disableable), but Solaris doesn't and
can't, so on a Solaris system without swap those allocations turn into
physical RAM that can never be used.  At the time the never-to-be-used
pages are allocated, ARC must be dumped to make room for them.  With
swap, pages that are allocated but never written can be backed by
swap, and the ARC doesn't need to be dumped until the pages are
actually written.  

Note that, in this hypothetical story, swap is never written at all,
but it still has to be there.

If you run a java vm on your ``storage server'', then you might care
about this.

I think the no-swap dogma is very soothing and yet very obviously
wrong.  If you want to get into the overcommit game, fine.  If you
want to play a game where you will overcommit up to the size of the
ARC, well, ``meh'', but fine.  Until then, though, swap makes sense.
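
For reference (sizes and names are illustrative), adding a ZFS-volume-backed swap 
device on Solaris looks like:

  zfs create -V 4G rpool/swap2
  swap -a /dev/zvol/dsk/rpool/swap2
  swap -l                             # list configured swap devices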


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?

2010-05-10 Thread Jim Horng
I was expecting 
zfs send tank/export/projects/project1...@today
to send everything up to @today.  That is the only snapshot, and I am not 
using the -i option.
The thing that worries me is that tank/export/projects/project1_nb was the first 
file system that I tested with full dedup and compression, and the first 
~300GB of usage (before I merged the other file systems) showed a ~2.5x dedup 
ratio, so the data should easily be more than 600 GB.  My initial worry was that 
the migration pool wouldn't even have enough space to receive the file system 
when I started, but this turned out to be a very unexpected result.  My question 
is where the dedupped data went, if the new pool is showing a 1.0x dedup ratio 
and the old pool shows a 2.53x ratio, yet both take up about the same size, ~400GB.

Is the -R option required for what I am trying to do?  What I am trying to do is 
to un-dedup the file system.  
I would actually prefer it if none of the properties were replicated.  This is 
quite confusing, and I wouldn't be surprised if other people are taking 
incomplete backups with zfs send if that's the case.  

I will redo the send again with -R and see what happens.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hanging

2010-05-10 Thread Eduardo Bragatto

Hi again,

As for the NFS issue I mentioned before, I made sure the NFS server 
was working and was able to export before I attempted to import 
anything, then I started a new "zpool import backup" -- my hope was 
that the NFS share was causing the issue, since the only filesystem 
shared is the one causing the problem, but that doesn't seem to be the 
case.


I've done a lot of research and could not find a similar case to mine.  
The most similar one I've found was this from 2008:


http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15

I simply can not import the pool although ZFS reports it as OK.

In that old thread, the user was also having the zpool import hang  
issue, however he was able to run these two commands (his pool was  
named data1):


zdb -e -bb data1
zdb -e -ddd data1

While my system returns:

# zdb -e -bb backup
zdb: can't open backup: File exists
# zdb -e -ddd backup
zdb: can't open backup: File exists

Every documentation assumes you will be able to run zpool import  
before troubleshooting, however my problem is exactly on that command.  
I don't even know where to find more detailed documentation.


I believe there's very knowledgeable people in this list. Could  
someone be kind enough to take a look and at least point me in the  
right direction?


Thanks,
Eduardo Bragatto.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hanging

2010-05-10 Thread John Balestrini
Howdy Eduardo,

Recently I had a similar issue where the pool wouldn't import and attempting to 
import it would essentially lock the server up. Finally I used pfexec zpool 
import -F pool1 and simply let it do its thing. After almost 60 hours the import 
finished and all has been well since (except my backup procedures have 
improved!).

Good luck!

John




On May 10, 2010, at 12:35 PM, Eduardo Bragatto wrote:

 Hi again,
 
 As for the NFS issue I mentioned before, I made sure the NFS server was 
 working and was able to export before I attempted to import anything, then I 
 started a new zpool import backup: -- my hope was that the NFS share was 
 causing the issue, since the only filesystem shared is the one causing the 
 problem, but that doesn't seem to be the case.
 
 I've done a lot of research and could not find a similar case to mine. The 
 most similar one I've found was this from 2008:
 
 http://opensolaris.org/jive/thread.jspa?threadID=70205tstart=15
 
 I simply can not import the pool although ZFS reports it as OK.
 
 In that old thread, the user was also having the zpool import hang issue, 
 however he was able to run these two commands (his pool was named data1):
 
 zdb -e -bb data1
 zdb -e - data1
 
 While my system returns:
 
 # zdb -e -bb backup
 zdb: can't open backup: File exists
 # zdb -e -ddd backup
 zdb: can't open backup: File exists
 
 Every documentation assumes you will be able to run zpool import before 
 troubleshooting, however my problem is exactly on that command. I don't even 
 know where to find more detailed documentation.
 
 I believe there's very knowledgeable people in this list. Could someone be 
 kind enough to take a look and at least point me in the right direction?
 
 Thanks,
 Eduardo Bragatto.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Time Slider in Solaris10

2010-05-10 Thread Mary Ellen Fitzpatrick

Is Time Slider available in Solaris 10?  Or just in OpenSolaris?
I am running Solaris 10 5/09 s10x_u7wos_08 X86 and wanted to automate my 
snapshots. 
From reading blogs, it seems zfs-auto-snapshot is obsolete and was/is 
being replaced by time-slider.  But I cannot seem to find it for 
Solaris 10. 

I do have a script/cron job that will work, but wanted to test out Time 
Slider


--
Thanks
Mary Ellen




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Time Slider in Solaris10

2010-05-10 Thread John Balestrini
I believe that Time Slider is just a front end for zfs-auto-snapshot.

John


On May 10, 2010, at 1:17 PM, Mary Ellen Fitzpatrick wrote:

 Is Time Slider available in Solaris10?  Or just in Opensolaris?
 I am running Solaris 10 5/09 s10x_u7wos_08 X86 and wanted to automate my 
 snapshots. From reading blogs, seems zfs-auto-snapshot is obsolete and was/is 
 being replaced by time-slider.  But I can not seem to find it for Solaris10. 
 I do have a script/cron job that will work, but wanted to test out Time Slider
 
 -- 
 Thanks
 Mary Ellen
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hanging

2010-05-10 Thread Eduardo Bragatto

On May 10, 2010, at 4:46 PM, John Balestrini wrote:

Recently I had a similar issue where the pool wouldn't import and  
attempting to import it would essentially lock the server up.  
Finally I used pfexec zpool import -F pool1 and simply let it do  
it's thing. After almost 60 hours the imported finished and all has  
been well since (except my backup procedures have improved!).


Hey John,

thanks a lot for answering -- I already allowed the zpool import  
command to run from Friday to Monday and it did not complete -- I also  
made sure to start it using truss and literally nothing has happened  
during that time (the truss output file does not have anything new).


While the zpool import command runs, I don't see any CPU or Disk I/O  
usage. zpool iostat shows very little I/O too:


# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup      31.4T  19.1T     11      2  29.5K  11.8K
  raidz1  11.9T   741G  2  0  3.74K  3.35K
c3t102d0  -  -  0  0  23.8K  1.99K
c3t103d0  -  -  0  0  23.5K  1.99K
c3t104d0  -  -  0  0  23.0K  1.99K
c3t105d0  -  -  0  0  21.3K  1.99K
c3t106d0  -  -  0  0  21.5K  1.98K
c3t107d0  -  -  0  0  24.2K  1.98K
c3t108d0  -  -  0  0  23.1K  1.98K
  raidz1  12.2T   454G  3  0  6.89K  3.94K
c3t109d0  -  -  0  0  43.7K  2.09K
c3t110d0  -  -  0  0  42.9K  2.11K
c3t111d0  -  -  0  0  43.9K  2.11K
c3t112d0  -  -  0  0  43.8K  2.09K
c3t113d0  -  -  0  0  47.0K  2.08K
c3t114d0  -  -  0  0  42.9K  2.08K
c3t115d0  -  -  0  0  44.1K  2.08K
  raidz1  3.69T  8.93T  3  0  9.42K    610
c3t87d0   -  -  0  0  43.6K  1.50K
c3t88d0   -  -  0  0  43.9K  1.48K
c3t89d0   -  -  0  0  44.2K  1.49K
c3t90d0   -  -  0  0  43.4K  1.49K
c3t91d0   -  -  0  0  42.5K  1.48K
c3t92d0   -  -  0  0  44.5K  1.49K
c3t93d0   -  -  0  0  44.8K  1.49K
  raidz1  3.64T  8.99T  3  0  9.40K  3.94K
c3t94d0   -  -  0  0  31.9K  2.09K
c3t95d0   -  -  0  0  31.6K  2.09K
c3t96d0   -  -  0  0  30.8K  2.08K
c3t97d0   -  -  0  0  34.2K  2.08K
c3t98d0   -  -  0  0  34.4K  2.08K
c3t99d0   -  -  0  0  35.2K  2.09K
c3t100d0  -  -  0  0  34.9K  2.08K
----------  -----  -----  -----  -----  -----  -----

Also, the third raidz entry shows less write in bandwidth (610).  
This is actually the first time it's a non-zero value.


My last attempt to import it, was using this command:

zpool import -o failmode=panic -f -R /altmount backup

However it did not panic. As I mentioned in the first message, it  
mounts 189 filesystems and hangs on #190. While the command is  
hanging, I can use zfs mount to mount filesystems #191 and above  
(only one filesystem does not mount and causes the import procedure to  
hang).


Before trying the command above, I was using only zpool import  
backup, and the iostat output was showing ZERO for the third raidz  
from the list above (not sure if that means something, but it does  
look odd).


I'm really on a dead end here, any help is appreciated.

Thanks,
Eduardo Bragatto.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Time Slider in Solaris10

2010-05-10 Thread Mary Ellen Fitzpatrick

Oh.. thanks..
I did download the latest zfs-auto-snapshot:  
zfs-snapshot-0.11.2


Is there a more recent version? 


John Balestrini wrote:

I believe that Time Slider is just a front end for zfs-auto-snapshot.

John


On May 10, 2010, at 1:17 PM, Mary Ellen Fitzpatrick wrote:

  

Is Time Slider available in Solaris10?  Or just in Opensolaris?
I am running Solaris 10 5/09 s10x_u7wos_08 X86 and wanted to automate my snapshots. From reading blogs, seems zfs-auto-snapshot is obsolete and was/is being replaced by time-slider.  But I can not seem to find it for Solaris10. 
I do have a script/cron job that will work, but wanted to test out Time Slider


--
Thanks
Mary Ellen



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



  


--
Thanks
Mary Ellen


Mary Ellen FitzPatrick
Systems Analyst 
Bioinformatics

Boston University
24 Cummington St.
Boston, MA 02215
office 617-358-2771
cell 617-797-7856 
mfitz...@bu.edu


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-10 Thread Geoff Nordli


-Original Message-
From: Brandon High [mailto:bh...@freaks.com]
Sent: Monday, May 10, 2010 9:55 AM

On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli geo...@gnaa.net wrote:
 I am looking at using 8K block size on the zfs volume.

8k is the default for zvols.


You are right, I didn't look at that property, and instead I was focused on
the record size property.  

 I was looking at the comstar iscsi settings and there is also a blk
 size configuration, which defaults as 512 bytes. That would make me
 believe that all of the IO will be broken down into 512 bytes which
 seems very inefficient.

I haven't done any tuning on my comstar volumes, and they're using 8k
blocks.
The setting is in the dataset's volblocksize parameter.

When I look at the stmfadm list-lu -v output, it shows me a block size of
512.  I am running NexentaCore 3.0 (b134+).  I wonder if the default size
has changed with different versions.  


 It seems this value should match the file system allocation/cluster
 size in the VM, maybe 4K if you are using an ntfs file system.

You'll have more overhead using smaller volblocksize values, and get worse
compression (since compression is done on the block). If you have dedup
enabled, you'll create more entries in the DDT which can have pretty
disastrous
consequences on write performance.

Ensuring that your VM is block-aligned to 4k (or the guest OS's block
size) boundaries will help performance and dedup as well.

This is where I am probably the most confused and need to get things straightened
out in my mind.  I thought dedup and compression were done at the record level.  

As long as you are using a multiple of the file system block size, then
alignment shouldn't be a problem with iscsi based zvols.  When using a zvol
comstar stores the metadata in a zvol object; instead of the first part of
the volume. 

As Roy pointed out, you have to be careful with the block size, because the DDT
and L2ARC lists can consume lots of RAM.  

But it seems you have four things to look at:

File system block size -> iSCSI blk size -> zvol block size -> zvol record
size.  

What is the relationship between iscsi blk size and zvol block size?

What is the relationship between zvol block size and zvol record size?

Thanks,

Geoff 








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hanging

2010-05-10 Thread Cindy Swearingen

Hi Eduardo,

Please use the following steps to collect more information:

1. Use the following command to get the PID of the zpool import process,
 like this:

# ps -ef | grep zpool

2. Use the actual PID of zpool import found in step 1 in the following
command, like this:

echo "0t<PID of zpool import>::pid2proc|::walk thread|::findstack" | mdb -k

Then, send the output.

Thanks,

Cindy
On 05/10/10 14:22, Eduardo Bragatto wrote:

On May 10, 2010, at 4:46 PM, John Balestrini wrote:

Recently I had a similar issue where the pool wouldn't import and 
attempting to import it would essentially lock the server up. Finally 
I used pfexec zpool import -F pool1 and simply let it do it's thing. 
After almost 60 hours the imported finished and all has been well 
since (except my backup procedures have improved!).


Hey John,

thanks a lot for answering -- I already allowed the zpool import 
command to run from Friday to Monday and it did not complete -- I also 
made sure to start it using truss and literally nothing has happened 
during that time (the truss output file does not have anything new).


While the zpool import command runs, I don't see any CPU or Disk I/O 
usage. zpool iostat shows very little I/O too:


# zpool iostat -v
 capacity operationsbandwidth
pool   used  avail   read  write   read  write
  -  -  -  -  -  -
backup31.4T  19.1T 11  2  29.5K  11.8K
  raidz1  11.9T   741G  2  0  3.74K  3.35K
c3t102d0  -  -  0  0  23.8K  1.99K
c3t103d0  -  -  0  0  23.5K  1.99K
c3t104d0  -  -  0  0  23.0K  1.99K
c3t105d0  -  -  0  0  21.3K  1.99K
c3t106d0  -  -  0  0  21.5K  1.98K
c3t107d0  -  -  0  0  24.2K  1.98K
c3t108d0  -  -  0  0  23.1K  1.98K
  raidz1  12.2T   454G  3  0  6.89K  3.94K
c3t109d0  -  -  0  0  43.7K  2.09K
c3t110d0  -  -  0  0  42.9K  2.11K
c3t111d0  -  -  0  0  43.9K  2.11K
c3t112d0  -  -  0  0  43.8K  2.09K
c3t113d0  -  -  0  0  47.0K  2.08K
c3t114d0  -  -  0  0  42.9K  2.08K
c3t115d0  -  -  0  0  44.1K  2.08K
  raidz1  3.69T  8.93T  3  0  9.42K610
c3t87d0   -  -  0  0  43.6K  1.50K
c3t88d0   -  -  0  0  43.9K  1.48K
c3t89d0   -  -  0  0  44.2K  1.49K
c3t90d0   -  -  0  0  43.4K  1.49K
c3t91d0   -  -  0  0  42.5K  1.48K
c3t92d0   -  -  0  0  44.5K  1.49K
c3t93d0   -  -  0  0  44.8K  1.49K
  raidz1  3.64T  8.99T  3  0  9.40K  3.94K
c3t94d0   -  -  0  0  31.9K  2.09K
c3t95d0   -  -  0  0  31.6K  2.09K
c3t96d0   -  -  0  0  30.8K  2.08K
c3t97d0   -  -  0  0  34.2K  2.08K
c3t98d0   -  -  0  0  34.4K  2.08K
c3t99d0   -  -  0  0  35.2K  2.09K
c3t100d0  -  -  0  0  34.9K  2.08K
  -  -  -  -  -  -

Also, the third raidz entry shows less write in bandwidth (610). 
This is actually the first time it's a non-zero value.


My last attempt to import it, was using this command:

zpool import -o failmode=panic -f -R /altmount backup

However it did not panic. As I mentioned in the first message, it mounts 
189 filesystems and hangs on #190. While the command is hanging, I can 
use zfs mount to mount filesystems #191 and above (only one filesystem 
does not mount and causes the import procedure to hang).


Before trying the command above, I was using only zpool import backup, 
and the iostat output was showing ZERO for the third raidz from the 
list above (not sure if that means something, but it does look odd).


I'm really on a dead end here, any help is appreciated.

Thanks,
Eduardo Bragatto.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hanging

2010-05-10 Thread Eduardo Bragatto

On May 10, 2010, at 6:28 PM, Cindy Swearingen wrote:


Hi Eduardo,

Please use the following steps to collect more information:

1. Use the following command to get the PID of the zpool import  
process,

like this:

# ps -ef | grep zpool

2. Use the actual PID of zpool import found in step 1 in the  
following

command, like this:

echo "0t<PID of zpool import>::pid2proc|::walk thread|::findstack" | mdb -k


Then, send the output.


Hi Cindy,

first of all, thank you for taking your time to answer my question.  
Here's the output of the command you requested:


# echo "0t733::pid2proc|::walk thread|::findstack" | mdb -k
stack pointer for thread 94e4db40: fe8000d3e5b0
[ fe8000d3e5b0 _resume_from_idle+0xf8() ]
  fe8000d3e5e0 swtch+0x12a()
  fe8000d3e600 cv_wait+0x68()
  fe8000d3e640 txg_wait_open+0x73()
  fe8000d3e670 dmu_tx_wait+0xc5()
  fe8000d3e6a0 dmu_tx_assign+0x38()
  fe8000d3e700 dmu_free_long_range_impl+0xe6()
  fe8000d3e740 dmu_free_long_range+0x65()
  fe8000d3e790 zfs_trunc+0x77()
  fe8000d3e7e0 zfs_freesp+0x66()
  fe8000d3e830 zfs_space+0xa9()
  fe8000d3e850 zfs_shim_space+0x15()
  fe8000d3e890 fop_space+0x2e()
  fe8000d3e910 zfs_replay_truncate+0xa8()
  fe8000d3e9b0 zil_replay_log_record+0x1ec()
  fe8000d3eab0 zil_parse+0x2ff()
  fe8000d3eb30 zil_replay+0xde()
  fe8000d3eb50 zfsvfs_setup+0x93()
  fe8000d3ebc0 zfs_domount+0x2e4()
  fe8000d3ecc0 zfs_mount+0x15d()
  fe8000d3ecd0 fsop_mount+0xa()
  fe8000d3ee00 domount+0x4d7()
  fe8000d3ee80 mount+0x105()
  fe8000d3eec0 syscall_ap+0x97()
  fe8000d3ef10 _sys_sysenter_post_swapgs+0x14b()

The first message from this thread has three files attached with  
information from truss (tracing zpool import), zdb output and the  
entire list of threads taken from 'echo ::threadlist -v | mdb -k'.


Thanks,
Eduardo Bragatto
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-10 Thread Brandon High
On Mon, May 10, 2010 at 1:53 PM, Geoff Nordli geo...@gnaa.net wrote:
 You are right, I didn't look at that property, and instead I was focused on
 the record size property.

zvols don't have a recordsize - That's a property of filesystem
datasets, not volumes.

 When I look at the stmfadm llift-lu -v  it shows me the block size of
 512.  I am running NexentaCore 3.0 (b134+) .  I wonder if the default size
 has changed with different versions.

I see what you're referring to. The iscsi block size, which is what
the LUN reports to the initiator as its block size, vs. the block size
written to disk.

Remember that up until very recently, most drives used 512 byte
blocks. Most OSes expect a 512b block and make certain assumptions based
on that, which is probably why it's the default.
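
If memory serves (treat this as an assumption to verify against the stmfadm man 
page rather than a recommendation), the block size an LU reports can be chosen 
when it is created, along the lines of:

  stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/tank/vmvol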

Ensuring that your VM is block-aligned to 4k (or the guest OS's block
size) boundaries will help performance and dedup as well.

 This is where I am probably the most confused l need to get straightened in
 my mind.  I thought dedup and compression is done on the record level.

It's at the record level for filesystems, block level for zvol.

 As long as you are using a multiple of the file system block size, then
 alignment shouldn't be a problem with iscsi based zvols.  When using a zvol
 comstar stores the metadata in a zvol object; instead of the first part of
 the volume.

There can be an off by one error which will cause small writes to
span blocks. If the data is not block aligned, then a 4k write causes
two read/modify/writes (on zfs two blocks have to be read then written
and block pointers updated) whereas an aligned write will not require
the existing data to be read. This is assuming that the zvol block
size = VM fs block size = 4k. In the case where the zvol block size is
a multiple of the VM fs block size (eg 4k VM fs, 8k zvol), then
writing one fs block will always require a read for an aligned
filesystem, but could require two for an unaligned fs if the VM fs
block spans two zvol blocks.

There's been a lot of discussion about this lately with the
introduction of WD's 4k sector drives, since they have a 512b sector
emulation mode.

 What is the relationship between iscsi blk size and zvol block size?

There is none. iscsi block size is what the target LUN reports to
initiators. volblocksize is what size chunks are written to the pool.

 What is the relationship between zvol block size and zvol record size?

They are never both present on a dataset. volblocksize is only for
volumes, recordsize is only for filesystems. Both control the size of
the unit of data written to the pool. This unit of data is what the
checksum is calculated on, and what the compression and dedup are
performed on.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-10 Thread Brandon High
On Mon, May 10, 2010 at 3:53 PM, Geoff Nordli geo...@gnaa.net wrote:
 Doesn't this alignment have more to do with aligning writes to the
 stripe/segment size of a traditional storage array?  The articles I am

It is a lot like a stripe / segment size. If you want to think of it
in those terms, you've got a segment of 512b (the iscsi block size)
and a width of 16, giving you an 8k stripe size. Any write that is
less than 8k will require a RMW cycle, and any write in multiples of
8k will do full stripe writes. If the write doesn't start on an 8k
boundary, you risk having writes span multiple underlying zvol blocks.

There's an explanation of WD's Advanced Format at Anandtech that
describes the problem with 4k physical sectors, here
http://www.anandtech.com/show/2888. Instead of sector, think zvol
block though.

When using a zvol, you've essentially got $volblocksize sized physical
sectors, but the initiator sees the 512b block size that the LUN is
reporting. If you don't block align, you risk having a write straddle
two zfs blocks. There may be some benefit to using a 4k volblocksize,
but you'll use more time and space on block checksums and, etc in your
zpool. I think 8k is a reasonable trade off.

 reading suggests creating a small unused partition to take up the space up
 to 127bytes (assuming 128byte segment), then create the real partition from
 the 128th sector going forward.  I am not sure how this would happen with
 zfs.

If you're using the whole disk with zfs, you don't need to worry about
it. If you're using fdisk partitions or slices, you need be a little
more careful.

I made an attempt to 4k block align the SSD that I'm using for a slog
/ L2ARC, which in theory should line up better with the devices erase
boundary. While not really pertinent to this discussion it gives some
idea on how to do it.

You want the filesystem to start at a point where ( $offset *
$sector_size * $sectors_per_cylinder ) % 4096 = 0.

For most LBA drives, you've got 16065 sectors/cylinder and 512b
sectors, giving 8 as the smallest offset that will align.
( 8 * 512 * 16065 ) % 4096 = 0

First you have to look at fdisk (on an SMI labeled disk) and realize
that you're going to lose the first cylinder to the MBR. When you then
create slices in format, it'll report one cylinder less than fdisk
did, so remember to account for that in your offset.

For an iscsi LUN used by a VM, you should align its filesystem on a
zvol block boundary. Windows Vista and Server 2008 use 240 heads  63
sectors/track, so they are already 8k block aligned. Linux, Solaris,
and BSD also let you specify the geometry used by fdisk, but I wasn't
comfortable doing it with Solaris since you have to create a geometry
file first.

For my 30GB OCZ Vertex:

bh...@basestar:~$ pfexec fdisk -W - /dev/rdsk/c1t0d0p0
* /dev/rdsk/c1t0d0p0 default fdisk table
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*    3892 cylinders
[..]
* Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
  191   128  0      1      1       254    63     1023    16065      62508915


bh...@basestar:~$ pfexec prtvtoc  /dev/rdsk/c1t0d0p0
* /dev/rdsk/c1t0d0p0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*    3891 cylinders
*    3889 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*            0    112455    112454
*     62428590     48195  62476784
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00      112455   2056320   2168774
       1      4    01     2168775  60243750  62412524
       2      5    01           0  62508915  62508914
       8      1    01           0     16065     16064


-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2010-05-10 Thread Moshe Vainer
Did the fix for 6733267 make it into 134a (2010.05)? It isn't marked fixed, and I 
couldn't find it anywhere in the changelogs. Does that mean we'll have to wait 
for 2010.11 (or whatever v+2 is named)?

Thanks,
Moshe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Any experience with an OCZ Z-Drive R2 with ZFS

2010-05-10 Thread Richard PALO
After a rather fruitless, non-committal exchange with OCZ, I'd like to know if 
there is any experience in this community with the OCZ Z-Drive... 

In particular, is it possible (and worthwhile) to put the device in JBOD as 
opposed to RAID-0 mode...   an entry-level flashfire f20 'sort' of card...  FYI, 
the controller card is an LSI SAS1068e...

It would appear interesting, if feasible, to create a ZFS mirrored boot drive. 
The additional devices (mirrored or not) might conveniently serve as data or 
cache devices. 

Does anybody have one and has tested this, or would be willing to? 
(Even a first-generation card, for that matter.)

Thanks in advance
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss