Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Auke Folkerts
On Wed, Dec 02, 2009 at 03:57:47AM -0800, Brian McKerr wrote:
 I previously had a Linux NFS server that I had mounted 'ASYNC' and, as one
 would expect, NFS performance was pretty good, getting close to 900 Mb/s. Now
 that I have moved to OpenSolaris, NFS performance is not very good, I'm
 guessing mainly due to the 'SYNC' nature of NFS.  I've seen various threads
 and most point at two options:
 
 1. Disable the ZIL
 2. Add independent log device/s


We have experienced the same performance penalty using NFS over ZFS.  The
issue is indeed caused by the synchronous semantics of NFS, which ZFS honours.
More precisely, it is caused by the fact that ZFS promises correct behaviour
while e.g. a Linux NFS server (using async) does not.  The issue is described
in great detail at http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

If you want the same behaviour as you had with your Linux NFS server,
you can disable the ZIL.  Doing so should give the same guarantees as 
the linux NFS service. 

The big issue with disabling the ZIL is that it is system-wide. Although
it could be an acceptable tradeoff for one filesystem, it is not necessarily
a good system-wide setting. That is why I think the option to disable the
ZIL should be per-filesystem (which should be possible, because
a ZIL is actually kept per filesystem).

As for adding HDDs as ZIL devices, I'd advise against it. We have tried
this and performance decreased.  Using SSDs as the ZIL is probably
the way to go.  A final option is to accept the situation as it is, arguing
that you have traded performance for increased reliability.
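
For reference, the two options look roughly like this (a sketch only; the pool
and device names are examples, not from the original post):

  # option 2: add a separate (ideally SSD-backed) log device to the pool
  zpool add tank log c4t0d0

  # option 1: disabling the ZIL is currently a system-wide setting, made in
  # /etc/system and taking effect after a reboot:
  #   set zfs:zil_disable = 1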

Regards,
Auke
-- 
 Auke Folkerts 
 University of Amsterdam


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
I would like to concatenate N files into one big file, taking advantage of
ZFS copy-on-write semantics so that the file concatenation is done without
actually copying any (large amount of) file content:
  cat f1 f2 f3 f4 f5 > f15
Is this already possible when source and target are on the same ZFS filesystem?

I am looking into the ZFS source code to understand if there are sufficient
(private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland
application in C code. Does anybody have advice on this?

TIA
Per
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Darren J Moffat

Per Baatrup wrote:

I would like to concatenate N files into one big file, taking advantage of
ZFS copy-on-write semantics so that the file concatenation is done without
actually copying any (large amount of) file content:
  cat f1 f2 f3 f4 f5 > f15
Is this already possible when source and target are on the same ZFS filesystem?

I am looking into the ZFS source code to understand if there are sufficient (private)
interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application
in C code. Does anybody have advice on this?


The answer to this is likely deduplication, which ZFS now has.

The reason dedup should help here is that after the 'cat', f15 will be
made up of blocks that match the blocks of f1 f2 f3 f4 f5.


Copy-on-write isn't what helps you here; it is dedup.
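
A minimal sketch of that approach, assuming a build that already has the dedup
feature, with example pool/dataset and file names:

  zfs set dedup=on tank/data
  cat f1 f2 f3 f4 f5 > f15   # block-aligned duplicate data should dedup away
  zpool list tank            # the pool-level dedup ratio reflects the savings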

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Peter Tribble
On Thu, Dec 3, 2009 at 12:08 PM, Darren J Moffat
darr...@opensolaris.org wrote:
 Per Baatrup wrote:

 I would like to concatenate N files into one big file taking advantage
 of ZFS copy-on-write semantics so that the file concatenation is done
 without actually copying any (large amount of) file content.
  cat f1 f2 f3 f4 f5 > f15
 Is this already possible when source and target are on the same ZFS
 filesystem?

 I am looking into the ZFS source code to understand if there are sufficient
 (private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5"
 userland application in C code. Does anybody have advice on this?

 The answer to this is likely deduplication which ZFS now has.

 The reason dedup should help here is that after the 'cat' f15 will be made
 up of blocks that match the blocks of f1 f2 f3 f4 f5.

Is that likely to happen? dedup is at the block level, so the blocks in f2
will only match the same data in f15 if they're aligned, which is only going
to happen if f1 ends on a block boundary.
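
A quick way to check that, assuming the default 128K recordsize (the file name
is an example):

  sz=$(wc -c < f1)
  echo "$sz bytes, remainder $((sz % 131072))"  # non-zero: f2's blocks won't line up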

Besides, you still have to read all the data off the disk, manipulate it,
and write it all back.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, Darren J Moffat wrote:


The answer to this is likely deduplication which ZFS now has.

The reason dedup should help here is that after the 'cat' f15 will be made up 
of blocks that match the blocks of f1 f2 f3 f4 f5.


Copy-on-write isn't what helps you here it is dedup.


Isn't this only true if the file sizes are such that the concatenated 
blocks are perfectly aligned on the same zfs block boundaries they 
used before?  This seems unlikely to me.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread mbr

Hello,

Edward Ned Harvey wrote:

Yes, I have SSD for ZIL.  Just one SSD.  32G.  But if this is the problem,
then you'll have the same poor performance on the local machine that you
have over NFS.  So I'm curious to see if you have the same poor performance
locally.  The ZIL does not need to be reliable; if it fails, the ZIL will
begin writing to the main storage, and performance will suffer until the new
SSD is put into production.


I am also planning to install an SSD as ZILlog. Is it really true that there
are no problems if the ZILlog fails and there is no mirror of the ZILlog?

What about the data that were on the ZILlog SSD at the time of failure: is
a copy of the data still in the machine's memory, from where it can be used
to commit the transactions to the stable storage pool?

What if the machine reboots after the SSD has failed?
The ZFS Best Practices Guide recommends mirroring the log device:

 
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations

 Mirroring the log device is highly recommended.
 Protecting the log device by mirroring will allow you to access the storage
 pool even if a log device has failed. Failure of the log device may cause the
 storage pool to be inaccessible if you are running the Solaris Nevada release
 prior to build 96 and a release prior to the Solaris 10 10/09 release.
 For more information, see CR 6707530.

 http://bugs.opensolaris.org/view_bug.do?bug_id=6707530

No probs with that if I use Sol10U8?
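
For what it's worth, the recommended mirrored log looks like this (pool and
device names are examples only):

  zpool add tank log mirror c3t0d0 c3t1d0
  zpool status tank   # the log devices appear under their own "logs" section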

Regards,
Michael.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Erik Ableson
 On 3 Dec 2009, at 13:29, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:



On Thu, 3 Dec 2009, Darren J Moffat wrote:


The answer to this is likely deduplication which ZFS now has.

The reason dedup should help here is that after the 'cat' f15 will  
be made up of blocks that match the blocks of f1 f2 f3 f4 f5.


Copy-on-write isn't what helps you here it is dedup.


Isn't this only true if the file sizes are such that the  
concatenated blocks are perfectly aligned on the same zfs block  
boundaries they used before?  This seems unlikely to me.


It's also worth noting that if the block alignment works out for the
dedup, the actual write traffic will be trivial, consisting only of
pointer references, so the heavy lifting will be the read operations.


Much depends on the contents of the files: fixed-size binary blobs
that align nicely with 16/32/64k boundaries, versus variable-sized text
files.


Regards,

Erik Ableson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
dedup operates on the block level, leveraging the existing ZFS checksums. Read
"What to dedup: Files, blocks, or bytes?" here:
http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it will
generate duplicate data, so data reads and writes could be avoided altogether.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Michael Schuster

Per Baatrup wrote:

dedup operates on the block level, leveraging the existing ZFS
checksums. Read "What to dedup: Files, blocks, or bytes?" here:
http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it
will generate duplicate data, so data reads and writes could be avoided
altogether.


you'd probably be better off avoiding the name zcat - it's been in use
almost forever; from the man page:


  zcat
     The zcat utility writes to standard output the uncompressed
     form of files that have been compressed using compress. It
     is the equivalent of uncompress -c. Input files are not
     affected.

:-)

cheers
Michael
--
Michael Schuster        http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
zcat was my acronym for a special ZFS-aware version of cat. The name was
obviously a big mistake, as I did not know it was an existing command and
simply forgot to check.

Should I rename it to zfscat or something similar?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Roland Rambau

gang,

actually a simpler version of that idea would be a "zcp":

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data

  -- Roland

Per Baatrup wrote:

dedup operates on the block level, leveraging the existing ZFS checksums. Read
"What to dedup: Files, blocks, or bytes?" here: http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it will
generate duplicate data, so data reads and writes could be avoided altogether.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread michael schuster

Roland Rambau wrote:

gang,

actually a simpler version of that idea would be a "zcp":

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data


I think they call it 'ln' ;-) and that even works on ufs.

Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Darren J Moffat

Bob Friesenhahn wrote:

On Thu, 3 Dec 2009, Darren J Moffat wrote:


The answer to this is likely deduplication which ZFS now has.

The reason dedup should help here is that after the 'cat' f15 will be 
made up of blocks that match the blocks of f1 f2 f3 f4 f5.


Copy-on-write isn't what helps you here it is dedup.


Isn't this only true if the file sizes are such that the concatenated 
blocks are perfectly aligned on the same zfs block boundaries they used 
before?  This seems unlikely to me.


Yes that would be the case.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
Actually, 'ln -s source target' would not be the same as 'zcp source target':
writing to the source file after the operation would change the target file as
well, whereas with zcp this would only change the source file, due to the
copy-on-write semantics of ZFS.
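
The difference is easy to demonstrate with plain POSIX tools (file names are
examples; this works on any filesystem):

  echo one  > src
  ln src target    # hard link: one file, two names
  echo two >> src
  cat target       # shows both lines - the "copy" changed too, which is
                   # exactly what a block-sharing zcp would avoid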
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, mbr wrote:


What about the data that were on the ZILlog SSD at the time of failure, is
a copy of the data still in the machines memory from where it can be used
to put the transaction to the stable storage pool?


The intent log SSD is used as 'write only' unless the system reboots, 
in which case it is used to support recovery.  The system memory is 
used as the write path in the normal case.  Once the data is written 
to the intent log, then the data is declared to be written as far as 
higher level applications are concerned.


If the intent log SSD fails and the system spontaneously reboots, then 
data may be lost.
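
An easy way to watch this in practice is the per-vdev view of zpool iostat,
where a separate log device shows up as its own row during synchronous write
load (the pool name is an example):

  zpool iostat -v tank 5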


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread michael schuster

Per Baatrup wrote:

Actually, 'ln -s source target' would not be the same as 'zcp source target':
writing to the source file after the operation would change the target file
as well, whereas with zcp this would only change the source file, due to the
copy-on-write semantics of ZFS.


I actually was thinking of creating a hard link (without the -s option), 
but your point is valid for hard and soft links.


cheers
Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Seth

michael schuster wrote:

Roland Rambau wrote:

gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage for
the dedup without a need to check/read/write anz actual data


I think they call it 'ln' ;-) and that even works on ufs.

Michael

+1

More and more it sounds like an optimization that will either

A. not add much over dedup

or

B. have value only in specific situations - and completely misbehave in 
other situations (even the same situations after passage of time)


Why not just make a special-purpose application (completely user-land)
for it? I know 'ln' is remotely kin to this idea, but 'ln' is POSIX and
people know what to expect.
What you'd practically need to do is whip up a vfs layer that exposes 
the underlying blocks of a filesystem and possibly name them by their 
SHA256 or MD5 hash. Then you'd need (another?) vfs abstraction that 
allows 'virtual' files to be assembled from these blocks in multiple 
independent chains.


I know there is already a fuse implementation of the first vfs driver 
(the name evades me, but I think it was something like chunkfs[1]) and 
one could at least whip up a reasonable read-only Proof-of-Concept of 
the second part.


The reason _I_ wouldn't do that is because I'm already happy with e.g.:

   mkfifo /var/run/my_part_collector
   (while true; do cat /local/data/my_part_* > /var/run/my_part_collector; done) &

   wc -l /var/run/my_part_collector

The equivalent of this could be (better) expressed in C, perl or any
language of your choice. I believe this is all POSIX.
  

[1] The reason this exists is obviously for backup and synchronization
implementations: it makes it possible to back up files using rsync
when the encryption key is not available to the backup process (with an
ECB-mode crypto algorithm); it should make it 'simple' to synchronize one's
large monolithic files with e.g. Amazon S3 cloud storage, etc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, Erik Ableson wrote:


Much depends on the contents of the files. Fixed size binary blobs that align 
nicely with 16/32/64k boundaries, or variable sized text files.


Note that the default zfs block size is 128K, so that will also be the
default dedup block size.


Most files are less than 128K and occupy a short tail block so 
concatenating them will not usually enjoy the benefits of 
deduplication.
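
A quick way to check both points on a given dataset (dataset and file names
are examples):

  zfs get recordsize tank/data
  ls -ls /tank/data/smallfile   # allocated 512-byte blocks vs. logical length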


It is not wise to riddle zfs with many special-purpose features since 
zfs would then be encumbered by these many features, which tend to 
defeat future improvements.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Darren J Moffat

Bob Friesenhahn wrote:

On Thu, 3 Dec 2009, Erik Ableson wrote:


Much depends on the contents of the files. Fixed size binary blobs 
that align nicely with 16/32/64k boundaries, or variable sized text 
files.


Note that the default zfs block size is 128K and so that will therefore 
be the default dedup block size.


Most files are less than 128K and occupy a short tail block so 
concatenating them will not usually enjoy the benefits of deduplication.


"Most"?  I think that is a bit of a sweeping statement.  I know of some
environments where most files are multiple gigabytes in size, and
others where 1K is the upper bound of the file sizes.


So I don't think you can say at all that most files are < 128K.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Jason King
On Thu, Dec 3, 2009 at 9:58 AM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Thu, 3 Dec 2009, Erik Ableson wrote:

 Much depends on the contents of the files. Fixed size binary blobs that
 align nicely with 16/32/64k boundaries, or variable sized text files.

 Note that the default zfs block size is 128K and so that will therefore be
 the default dedup block size.

 Most files are less than 128K and occupy a short tail block so concatenating
 them will not usually enjoy the benefits of deduplication.

 It is not wise to riddle zfs with many special-purpose features since zfs
 would then be encumbered by these many features, which tend to defeat future
 improvements.

Well, it could be done in a way such that it is fs-agnostic
(perhaps extending /bin/cat with a new flag such as -o outputfile, or
detecting if stdout is a file vs a tty, though corner cases might get
tricky).  If a particular fs supported such a feature, it could take
advantage of it, but if it didn't, it could fall back to doing a
read+append.  Sort of like how mv figures out whether the source and target
are on the same or different filesystems and acts accordingly.

There are a few use cases I've encountered where having this would
have been _very_ useful (usually when trying to get large crashdumps
to Sun quickly).  In general, it would allow one to manipulate very
large files by breaking them up into smaller subsets while still
having the end result be a single file.
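
A fallback-only sketch of that idea, with hypothetical names; a filesystem-aware
version could replace the loop with a block-sharing operation where the
filesystem supports it:

  #!/bin/sh
  # usage: fscat outputfile input1 [input2 ...]
  out=$1; shift
  for f in "$@"; do
      cat "$f" >> "$out" || exit 1
  done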
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread mbr

Hello,

Bob Friesenhahn wrote:

On Thu, 3 Dec 2009, mbr wrote:


What about the data that were on the ZILlog SSD at the time of 
failure, is

a copy of the data still in the machines memory from where it can be used
to put the transaction to the stable storage pool?


The intent log SSD is used as 'write only' unless the system reboots, in 
which case it is used to support recovery.  The system memory is used as 
the write path in the normal case.  Once the data is written to the 
intent log, then the data is declared to be written as far as higher 
level applications are concerned.


thank you Bob for the clarification.
So I don't need a mirrored ZILlog for security reasons, all the information
is still in memory and will be used from there by default if only the ZILlog
SSD fails.

If the intent log SSD fails and the system spontaneously reboots, then 
data may be lost.


I can live with the data loss as long as the machine comes up with the faulty
ZILlog SSD but otherwise without probs and with a clean zpool.

Has the following error no consequences?

 Bug ID 6538021
 Synopsis   Need a way to force pool startup when zil cannot be replayed
 State  3-Accepted (Yes, that is a problem)
 Link   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021

Michael.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, Jason King wrote:


Well it could be done in a way such that it could be fs-agnostic
(perhaps extending /bin/cat with a new flag such as -o outputfile, or
detecting if stdout is a file vs tty, though corner cases might get
tricky).   If a particular fs supported such a feature, it could take
advantage of it, but if it didn't, it could fall back to doing a
read+append.  Sort of like how mv figures out if the source  target
are the same or different filesystems and acts accordingly.


The most common way that I concatenate files into a larger file is by 
using a utility such as 'tar', which outputs a different format.  I 
rarely use 'cat' to concatenate files.


If it is desired to concatenate files in a way which works best for
deduplication, then a tar-like format can be invented which takes care
to always start each new file's output on a filesystem block boundary.  With
zfs deduplication this should be faster and take less space than
compressing the entire result, as long as the output is stored in the
same pool.  If the output is written to a destination filesystem which
uses a different block size, then the ideal block size will be that of
the destination filesystem, so that large archive files can still be
usefully deduplicated.
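
A sketch of such a block-aligned concatenation, assuming recordsize=128K (file
and archive names are examples):

  bs=131072
  : > aligned.archive
  for f in f1 f2 f3 f4 f5; do
      cat "$f" >> aligned.archive
      sz=$(wc -c < "$f")
      pad=$(( (bs - sz % bs) % bs ))
      # zero-fill to the next 128K boundary so the next file stays aligned
      [ "$pad" -gt 0 ] && dd if=/dev/zero bs=1 count="$pad" >> aligned.archive 2>/dev/null
  done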


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, mbr wrote:


Has the following error no consequences?

Bug ID 6538021
Synopsis   Need a way to force pool startup when zil cannot be replayed
State  3-Accepted (Yes, that is a problem)
Link 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021


I don't know the status of this, but it does make sense to require the
user to explicitly accept corrupting/losing data in the storage pool.  It could
be that the log device is just temporarily missing and can be restored, so
zfs should not do this by default.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Neil Perrin



On 12/03/09 09:21, mbr wrote:

Hello,

Bob Friesenhahn wrote:

On Thu, 3 Dec 2009, mbr wrote:


What about the data that were on the ZILlog SSD at the time of 
failure, is
a copy of the data still in the machines memory from where it can be 
used

to put the transaction to the stable storage pool?


The intent log SSD is used as 'write only' unless the system reboots, 
in which case it is used to support recovery.  The system memory is 
used as the write path in the normal case.  Once the data is written 
to the intent log, then the data is declared to be written as far as 
higher level applications are concerned.


thank you Bob for the clarification.
So I don't need a mirrored ZILlog for security reasons, all the information
is still in memory and will be used from there by default if only the 
ZILlog SSD fails.


Mirrored log devices are advised to improve reliability. As previously mentioned,
if during writing a log device fails or is temporarily full then we use the
main pool devices to chain the log blocks. If we get read errors when trying to
replay the intent log (after a crash/power fail) then the admin is given the
option to ignore the log and continue, or somehow fix the device (e.g. re-attach)
and then retry. Multiple log devices would provide extra reliability here.
We do not look in memory for the log records if we can't get the records
from the log blocks.



If the intent log SSD fails and the system spontaneously reboots, then 
data may be lost.


I can live with the data loss as long as the machine comes up with the 
faulty ZILlog SSD but otherwise without probs and with a clean zpool.


The log records are not required for consistency of the pool (it's not a 
journal).



Has the following error no consequences?

 Bug ID 6538021
 Synopsis   Need a way to force pool startup when zil cannot be replayed
 State  3-Accepted (Yes, that is a problem)
 Link   
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021


Er that bug should probably be closed as a duplicate.
We now have this functionality.



Michael.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Roland Rambau

Michael,

michael schuster wrote:

Roland Rambau wrote:

gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage for
the dedup without a need to check/read/write anz actual data


I think they call it 'ln' ;-) and that even works on ufs.


quite similar but with a critical difference:

with hard links any modifications through either link are
seen by both links, since it stays a single file (note that
editors like vi do an implicit cp, they do NOT update the
original file )

That zcp ( actually it should be just a feature of 'cp' )
would be blockwise copy-on-write. It would have exactly
the same semantics as cp but just avoid any data movement,
since we can easily predict what the effect of a cp followed
by a dedup should be.

  -- Roland




--

**
Roland Rambau Platform Technology Team
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com
**
Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;  Geschäftsführer:
Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates:   Martin Häring
*** UNIX * /bin/sh  FORTRAN **
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL corrupt, not recoverable even with logfix

2009-12-03 Thread Anon Y Mous
Was the zpool originally created by a FreeBSD operating system or by an
OpenSolaris operating system? And what version of FreeBSD, SXCE, or OpenSolaris
Indiana was it originally created by? The reason I'm asking this is because
there are different versions of ZFS in different versions of OpenSolaris, so if
you take a newer-version zpool and try to mount it in an older-version
OpenSolaris, it won't mount.

The last time I tried it a long time ago, ZFS in FreeBSD was pretty unstable 
and still under heavy development, which was the sole reason I migrated my 
storage server with my important data on it to OpenSolaris, and it has been 
rock solid stable since.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import - device names not always updated?

2009-12-03 Thread Cindy Swearingen

Hi Ragnar,

A bug might exist but you are building a pool based on the ZFS
volumes that are created in another pool. This configuration
is not supported and possible deadlocks can occur.

If you can retry this example without building a pool on another
pool, such as using files to create a pool, and can reproduce this,
then please let me know.

Thanks,

Cindy

On 12/01/09 17:57, Ragnar Sundblad wrote:

It seems that device names aren't always updated when importing
pools if devices have moved. I am not sure if this is only a
cosmetic issue or if it could actually be a real problem -
could it lead to the device not being found at a later import?

/ragge

(This is on snv_127.)

I ran the following script:

#!/bin/bash

set -e
set -x

zfs create -V 1G rpool/vol1
zfs create -V 1G rpool/vol2
zpool create pool mirror /dev/zvol/dsk/rpool/vol1 /dev/zvol/dsk/rpool/vol2
zpool status pool
zpool export pool
zfs create rpool/subvol1
zfs create rpool/subvol2
zfs rename rpool/vol1 rpool/subvol1/vol1
zfs rename rpool/vol2 rpool/subvol2/vol2
zpool import -d /dev/zvol/dsk/rpool/subvol1
sleep 1
zpool import -d /dev/zvol/dsk/rpool/subvol2
sleep 1
zpool import -d /dev/zvol/dsk/rpool/subvol1 pool
zpool status pool


And got the output below. I have annotated it with ### remarks.


# bash zfs-test.bash
+ zfs create -V 1G rpool/vol1
+ zfs create -V 1G rpool/vol2
+ zpool create pool mirror /dev/zvol/dsk/rpool/vol1 /dev/zvol/dsk/rpool/vol2
+ zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
pool  ONLINE   0 0 0
  mirror-0ONLINE   0 0 0
/dev/zvol/dsk/rpool/vol1  ONLINE   0 0 0
/dev/zvol/dsk/rpool/vol2  ONLINE   0 0 0

errors: No known data errors
+ zpool export pool
+ zfs create rpool/subvol1
+ zfs create rpool/subvol2
+ zfs rename rpool/vol1 rpool/subvol1/vol1
+ zfs rename rpool/vol2 rpool/subvol2/vol2
+ zpool import -d /dev/zvol/dsk/rpool/subvol1
  pool: pool
id: 13941781561414544058
 state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-2Q
config:

pool  DEGRADED
  mirror-0DEGRADED
/dev/zvol/dsk/rpool/subvol1/vol1  ONLINE
/dev/zvol/dsk/rpool/vol2  UNAVAIL  cannot open
### Note that it can't find vol2 - which is expected.
+ sleep 1
### The sleep here seems to be necessary for vol1 to magically be
### found in the next zpool import.
+ zpool import -d /dev/zvol/dsk/rpool/subvol2
  pool: pool
id: 13941781561414544058
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

pool  ONLINE
  mirror-0ONLINE
/dev/zvol/dsk/rpool/vol1  ONLINE
/dev/zvol/dsk/rpool/subvol2/vol2  ONLINE
### Note that it says vol1 is ONLINE, under its old path, though it actually
has moved
+ sleep 1
+ zpool import -d /dev/zvol/dsk/rpool/subvol1 pool
+ zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
pool  ONLINE   0 0 0
  mirror-0ONLINE   0 0 0
/dev/zvol/dsk/rpool/subvol1/vol1  ONLINE   0 0 0
/dev/zvol/dsk/rpool/vol2  ONLINE   0 0 0

errors: No known data errors
### Note that vol2 has its old path shown!



### Interestingly, if you then
+ zpool export pool
+ zpool import -d /dev/zvol/dsk/rpool/subvol2 pool
### vol2's path gets updated too:
+ zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
pool  ONLINE   0 0 0
  mirror-0ONLINE   0 0 0
/dev/zvol/dsk/rpool/subvol1/vol1  ONLINE   0 0 0
/dev/zvol/dsk/rpool/subvol2/vol2  ONLINE   0 0 0

errors: No known data errors



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
Roland,

Clearly an extension of cp would be very nice when managing large files.
Today we are relying heavily on snapshots for this, but this requires disipline 
on storing files in separate zfs'es avioding to snapshot too many files that 
changes frequently.

The reason I was speaking about cat in stead of cp is that in addition to 
copying a single file I would like also to concatenate several files into a 
single file. Can this be accomplished with your (z)cp?

--Per
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] L2ARC in clusters

2009-12-03 Thread Robert Milkowski

Hi,

When deploying ZFS in a cluster environment it would be nice to be able
to have some SSDs as local drives (not on SAN), and when the pool switches
over to the other node, zfs would pick up that node's local disk drives as
L2ARC.


To better clarify what I mean, let's assume there is a 2-node cluster with
one 2540 disk array.
Now let's put 4x SSDs in each node (as internal/local drives). Now let's
assume one zfs pool is created on top of a lun exported from the 2540.
The 4x local SSDs could be added as L2ARC, but because they are not
visible on the 2nd node, when the cluster does a failover it should be able to
pick up the SSDs which are local to the other node.


L2ARC doesn't contain any data which is critical to the pool, so it doesn't
have to be shared between nodes. SLOG would be a whole different story
and generally it wouldn't be possible. But L2ARC should be.



--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in clusters

2009-12-03 Thread Robert Milkowski

Robert Milkowski wrote:

Hi,

When deploying ZFS in cluster environment it would be nice to be able 
to have some SSDs as local drives (not on SAN) and when pool switches 
over to the other node zfs would pick up the node's local disk drives 
as L2ARC.


To better clarify what I mean lets assume there is a 2-node cluster 
with 1sx 2540 disk array.
Now lets put 4x SSDs in each node (as internal/local drives). Now lets 
assume one zfs pool would be created on top of a lun exported from 
2540. Now 4x local SSDs could be added as L2ARC but because they are 
not visible on a 2nd node when cluster does failover it should be able 
to pick up the ssd's which are local to the other node.


L2ARC doesn't contain any data which is critical to pool so it doesn't 
have to be shared between node. SLOG would be a whole different story 
and generally it wouldn't be possible. But L2ARC should be.





Perhaps a scenario like below should be allowed:

node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 node-1-ssd4
node-1# zpool export mysql
node-2# zpool import mysql
node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 node-2-ssd4


This is assuming that the pool can be imported when some of its cache devices
are not accessible.
That way the pool would always have some L2ARC SSDs not accessible, but
would provide L2ARC cache on each node from local SSDs.



btw:


mi...@r600:/rpool/tmp# mkfile 200m f1
mi...@r600:/rpool/tmp# mkfile 100m s1
mi...@r600:/rpool/tmp# zpool create test /rpool/tmp/f1
mi...@r600:/rpool/tmp# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
test ONLINE   0 0 0
  /rpool/tmp/f1  ONLINE   0 0 0

errors: No known data errors
mi...@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1
cannot add to 'test': cache device must be a disk or disk slice
mi...@r600:/rpool/tmp#


is there a reason why a cache device can't be set-up on a file like for 
other vdevs?


--

Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in clusters

2009-12-03 Thread Robert Milkowski

Robert Milkowski wrote:

Robert Milkowski wrote:

Hi,

When deploying ZFS in cluster environment it would be nice to be able 
to have some SSDs as local drives (not on SAN) and when pool switches 
over to the other node zfs would pick up the node's local disk drives 
as L2ARC.


To better clarify what I mean lets assume there is a 2-node cluster 
with 1sx 2540 disk array.
Now lets put 4x SSDs in each node (as internal/local drives). Now 
lets assume one zfs pool would be created on top of a lun exported 
from 2540. Now 4x local SSDs could be added as L2ARC but because they 
are not visible on a 2nd node when cluster does failover it should be 
able to pick up the ssd's which are local to the other node.


L2ARC doesn't contain any data which is critical to pool so it 
doesn't have to be shared between node. SLOG would be a whole 
different story and generally it wouldn't be possible. But L2ARC 
should be.





Perhaps a scenario like below should be allowed:

node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 
node-1-ssd4

node-1# zpool export mysql
node-2# zpool import mysql
node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 
node-2-ssd4



This is assuming that pool can be imported when some of its slog 
devices are not accessible.
That way the pool always would have some L2ARC/SSDs not accessible but 
would provide L2ARC cache on each node with local SSDs.



btw:


mi...@r600:/rpool/tmp# mkfile 200m f1
mi...@r600:/rpool/tmp# mkfile 100m s1
mi...@r600:/rpool/tmp# zpool create test /rpool/tmp/f1
mi...@r600:/rpool/tmp# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
test ONLINE   0 0 0
  /rpool/tmp/f1  ONLINE   0 0 0

errors: No known data errors
mi...@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1
cannot add to 'test': cache device must be a disk or disk slice
mi...@r600:/rpool/tmp#


is there a reason why a cache device can't be set-up on a file like 
for other vdevs?




mi...@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd1
mi...@r600:/rpool/tmp# zpool add test cache /dev/zvol/rdsk/rpool/tmp/ssd1
cannot use '/dev/zvol/rdsk/rpool/tmp/ssd1': must be a block device or 
regular file

mi...@r600:/rpool/tmp# zpool add test cache /dev/zvol/dsk/rpool/tmp/ssd1
mi...@r600:/rpool/tmp#


So when I try to add a cache device on top of a file I get an error that
a cache device must be a disk or a disk slice, but when I try to add a
cache device on an rdsk I get an error that it must be a block device or a
regular file, which suggests a regular file should work... (dsk works fine).


--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread A Darren Dunham
On Thu, Dec 03, 2009 at 09:36:23AM -0800, Per Baatrup wrote:

 The reason I was speaking about cat in stead of cp is that in
 addition to copying a single file I would like also to concatenate
 several files into a single file. Can this be accomplished with your
 (z)cp?

Unless you have special data formats, I think it's unlikely that the
last ZFS block in the file will be exactly full.  But to append without
copying, you'd need some way of ignoring a portion of the data in a
non-final ZFS block and stitching together the bytestream.  I don't
think that's possible with the ZFS layout.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in clusters

2009-12-03 Thread Robert Milkowski

Robert Milkowski wrote:

Robert Milkowski wrote:

Hi,

When deploying ZFS in cluster environment it would be nice to be able 
to have some SSDs as local drives (not on SAN) and when pool switches 
over to the other node zfs would pick up the node's local disk drives 
as L2ARC.


To better clarify what I mean lets assume there is a 2-node cluster 
with 1sx 2540 disk array.
Now lets put 4x SSDs in each node (as internal/local drives). Now 
lets assume one zfs pool would be created on top of a lun exported 
from 2540. Now 4x local SSDs could be added as L2ARC but because they 
are not visible on a 2nd node when cluster does failover it should be 
able to pick up the ssd's which are local to the other node.


L2ARC doesn't contain any data which is critical to pool so it 
doesn't have to be shared between node. SLOG would be a whole 
different story and generally it wouldn't be possible. But L2ARC 
should be.





Perhaps a scenario like below should be allowed:

node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 
node-1-ssd4

node-1# zpool export mysql
node-2# zpool import mysql
node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 
node-2-ssd4



This is assuming that pool can be imported when some of its slog 
devices are not accessible.
That way the pool always would have some L2ARC/SSDs not accessible but 
would provide L2ARC cache on each node with local SSDs.


Actually it looks like it already works like that!
A pool imports with its cache device unavailable just fine.
Then I added another cache device. And I can still import it with the 
first one available but not the 2nd one.


zpool status complains of course but other than that it seems to be 
working fine.


Any thought?


--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] L2ARC re-uses new device if it is in the same place

2009-12-03 Thread Robert Milkowski

Hi,


mi...@r600:/rpool/tmp# zpool status test
 pool: test
state: ONLINE
scrub: none requested
config:

   NAME STATE READ WRITE CKSUM
   test ONLINE   0 0 0
 /rpool/tmp/f1  ONLINE   0 0 0

errors: No known data errors

lets add a cache device:

mi...@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd2
mi...@r600:/rpool/tmp# zpool add test cache /dev/zvol/dsk/rpool/tmp/ssd2
mi...@r600:/rpool/tmp# zpool status test
 pool: test
state: ONLINE
scrub: none requested
config:

   NAMESTATE READ WRITE CKSUM
   testONLINE   0 0 0
 /rpool/tmp/f1 ONLINE   0 0 0
   cache
 /dev/zvol/dsk/rpool/tmp/ssd2  ONLINE   0 0 0

errors: No known data errors
mi...@r600:/rpool/tmp#

now lets export the pool, re-create the zvol and then import the pool again:


mi...@r600:/rpool/tmp# zpool export test
mi...@r600:/rpool/tmp# zfs destroy rpool/tmp/ssd2
mi...@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd2
mi...@r600:/rpool/tmp# zpool import -d /rpool/tmp/ test


mi...@r600:/rpool/tmp# zpool status test
 pool: test
state: ONLINE
scrub: none requested
config:

   NAMESTATE READ WRITE CKSUM
   testONLINE   0 0 0
 /rpool/tmp/f1 ONLINE   0 0 0
   cache
 /dev/zvol/dsk/rpool/tmp/ssd2  ONLINE   0 0 0

errors: No known data errors
mi...@r600:/rpool/tmp#


No complaint here...
I'm not entirely sure that it should behave that way - in some
circumstances it could be risky.
For example, what if a zvol/ssd/disk which is used on one server as a cache
device has the same path on another server, and then a pool is imported
there? Would l2arc just blindly start using it as a cache device,
overwriting some other data?


Shouldn't l2arc devices have a label/signature, or at least use the uuid of a
disk and be checked during import to confirm it is the same device? Or maybe it
does, and there is some other issue here with re-creating the zvol...
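
One way to check what, if anything, a would-be cache device already carries is
to dump its vdev label before importing (device path taken from the example
above):

  zdb -l /dev/zvol/dsk/rpool/tmp/ssd2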


btw: x86, snv_127

--
Robert Milkowski
http://milek.blogspot.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Nicolas Williams
On Thu, Dec 03, 2009 at 03:57:28AM -0800, Per Baatrup wrote:
 I would like to concatenate N files into one big file taking
 advantage of ZFS copy-on-write semantics so that the file
 concatenation is done without actually copying any (large amount of)
 file content.
   cat f1 f2 f3 f4 f5 > f15
 Is this already possible when source and target are on the same ZFS
 filesystem?
 
 Am looking into the ZFS source code to understand if there are
 sufficient (private) interfaces to make a simple zcat -o f15   f1 f2
 f3 f4 f5 userland application in C code. Does anybody have advice on
 this?

There have been plenty of answers already.

Quite aside from dedup, the fact that all blocks in a file must have the
same uncompressed size means that if any of f2..f5 have different block
sizes from f1, or any of f1..f5's last blocks are partial then ZFS could
not perform this concatenation as efficiently as you wish.

In other words: dedup _is_ what you're looking for...

...but also ZFS most likely could not do any better with any other, more
specific non-dedup solution.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] possible mega_sas issue sol10u8 (Re: Workaround for mpt timeouts in snv_127)

2009-12-03 Thread Tru Huynh
follow up, another crash today.

On Mon, Nov 30, 2009 at 11:35:07AM +0100, Tru Huynh wrote:
 1) OS
 SunOS xargos.bis.pasteur.fr 5.10 Generic_141445-09 i86pc i386 i86pc
 
 it's only sharing though NFS v3 to linux clients running
 20x CentOS-5 x86_64 2.6.18-164.6.1.el5 x86_64/i386
 78x CentOS-3 x86_64/ia32e/i386
 
 2) usual logs:
  /var/adm/messages
 - nothing
still empty

 
 3) fmdump -ev
 /var/fm/fmd/errlog is empty
same

 7) not tried yet
 reboot -d to force a dump
failed (not returned from sync)
reboot -dfn
failed at 98% of the dump (I could not catch the reason, screen blanked too 
fast)
 
 9) from the #irc channel, I will keep a screen running with:
[...@xargos ~]$ ps -ef
 UID   PID  PPID   CSTIME TTY TIME CMD
root 0 0   0   Nov 29 ?   3:16 sched
root 1 0   0   Nov 29 ?   0:00 /sbin/init
root 2 0   0   Nov 29 ?   0:00 pageout
root 3 0   0   Nov 29 ?  20:04 fsflush
root   154 1   0   Nov 29 ?   0:00 /usr/lib/picl/picld
root 7 1   0   Nov 29 ?   0:04 /lib/svc/bin/svc.startd
root 9 1   0   Nov 29 ?   0:08 /lib/svc/bin/svc.configd
  daemon   152 1   0   Nov 29 ?   0:03 /usr/lib/crypto/kcfd
 tru  2258  2226   0   Nov 30 pts/7   0:00 /usr/bin/bash
root   409   408   0   Nov 29 ?   0:00 /usr/sadm/lib/smc/bin/smcboot
root   142 1   0   Nov 29 ?   0:01 /usr/lib/sysevent/syseventd
root   429 1   0   Nov 29 ?   0:00 sh 
/opt/MegaRaidStorageManager/Framework/startup.sh
root57 1   0   Nov 29 ?   0:00 /sbin/dhcpagent
root64 1   0   Nov 29 ?   0:00 devfsadmd
root   208 1   0   Nov 29 ?   0:00 /lib/svc/method/iscsid
  daemon   306 1   0   Nov 29 ?   0:00 /usr/sbin/rpcbind
root   146 1   0   Nov 29 ?   0:12 /usr/sbin/nscd
root  2228  2226   0   Nov 30 pts/2   0:08 zpool iostat -v 60
root   332 7   0   Nov 29 ?   0:00 /usr/lib/saf/sac -t 300
root   145 1   0   Nov 29 ?   0:00 /usr/lib/power/powerd
root   226 1   0   Nov 29 ?   0:10 /usr/lib/inet/xntpd
root   394   332   0   Nov 29 ?   0:00 /usr/lib/saf/ttymon
root   262 1   0   Nov 29 ?   0:00 /usr/sbin/cron
root   366 1   0   Nov 29 ?   0:00 /usr/lib/utmpd
noaccess   673 1   0   Nov 29 ?   3:04 /usr/java/bin/java -server 
-Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
root   349 7   0   Nov 29 console 0:00 /usr/lib/saf/ttymon -g -d 
/dev/console -l console -m ldterm,ttcompat -h -p xarg
  daemon   315 1   0   Nov 29 ?   0:00 /usr/lib/nfs/statd
  daemon   317 1   0   Nov 29 ?   0:01 /usr/lib/nfs/nfsmapid
root   552 1   0   Nov 29 ?   0:01 /usr/sfw/sbin/snmpd
  daemon   324 1   0   Nov 29 ?   0:00 /usr/lib/nfs/lockd
root   431 1   0   Nov 29 ?   0:05 /usr/sbin/syslogd
 tru   695   689   0   Nov 29 pts/1   0:00 -bash
root   367 1   0   Nov 29 ?   0:00 /usr/lib/autofs/automountd
root   365 1   0   Nov 29 ?   0:02 /usr/lib/inet/inetd start
root   369   367   0   Nov 29 ?   0:01 /usr/lib/autofs/automountd
root   430   429   0   Nov 29 ?   3:26 ../jre/bin/java -classpath 
../jre/lib/rt.jar:../jre/lib/jsse.jar:../jre/lib/jce
root   408 1   0   Nov 29 ?   0:00 /usr/sadm/lib/smc/bin/smcboot
root   410   408   0   Nov 29 ?   0:00 /usr/sadm/lib/smc/bin/smcboot
root  2234  2226   0   Nov 30 pts/5   0:15 intrstat 60
 tru  2236  2226   0   Nov 30 pts/6   0:06 vmstat 60
 tru   689   688   0   Nov 29 ?   0:01 /usr/lib/ssh/sshd
root   594 1   0   Nov 29 ?   4:25 /usr/sbin/lsi_mrdsnmpagent 
-c /etc/sma/snmp/snmpd.conf
 tru  2232  2226   0   Nov 30 pts/4   0:13 prstat 60
 tru  2225   695   0   Nov 30 pts/1   0:01 screen
root   443 1   0   Nov 29 ?   0:00 /usr/lib/ssh/sshd
root   688   443   0   Nov 29 ?   0:00 /usr/lib/ssh/sshd
root   541 1   0   Nov 29 ?   0:03 /usr/lib/sendmail -bd -q15m 
-C /etc/mail/local.cf
   smmsp   537 1   0   Nov 29 ?   0:00 /usr/lib/sendmail -Ac -q15m
root  2565 1   0   Nov 30 ?   0:06 /usr/local/bin/mrmonitord
 tru  2226  2225   0   Nov 30 ?   0:05 screen
 tru  3988  3982   0 15:33:51 pts/11  0:00 prstat
root   498 1   0   Nov 29 ?   0:00 /usr/sbin/vold -f 
/etc/vold.conf
root   509 1   0   Nov 29 ?   0:04 /usr/lib/fm/fmd/fmd
 tru  2230  2226   0   Nov 30 pts/3   0:06 iostat -xn 60
 tru  3967  3966   0 15:33:36 ?   0:00 /usr/lib/ssh/sshd
root   522 1   0   Nov 29 ?   0:00 /usr/lib/nfs/mountd
  daemon   524 1   0   Nov 29 ?   9:45 /usr/lib/nfs/nfsd
root 

Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Roland Rambau

Per,

Per Baatrup wrote:

Roland,

Clearly an extension of cp would be very nice when managing large files.
Today we are relying heavily on snapshots for this, but this requires disipline 
on storing files in separate zfs'es avioding to snapshot too many files that 
changes frequently.

The reason I was speaking about cat in stead of cp is that in addition to copying a 
single file I would like also to concatenate several files into a single file. Can this be accomplished with 
your (z)cp?


No - zcp is a simpler case than what you proposed, and that's why
I pointed it out as a discussion case.  (And it is clearly NOT
the same as 'ln'.)

Btw, I would be surprised to hear that this can be implemented
with current APIs; you would need a call like (my fantasy here)
write_existing_block(), where the data argument is not a pointer
to a buffer in memory but instead a reference to an already existing
data block in the pool. Based on such a call (and a corresponding one
for read that returns those references into the pool) IMHO an implementation
of the commands would be straightforward (the actual work would be
in the implementation of those calls).

This can certainly be done - I just doubt it already exists.

  -- Roland


--

**
Roland Rambau Platform Technology Team
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com
**
Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;  Geschäftsführer:
Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates:   Martin Häring
*** UNIX * /bin/sh  FORTRAN **
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] EON ZFS Storage 0.59.5 based on snv 125 released!

2009-12-03 Thread Andre Lue
Embedded Operating system/Networking (EON), RAM based live ZFS NAS appliance is 
released on Genunix! Many thanks to Al Hopper and Genunix.org for download 
hosting and serving the opensolaris community.

EON ZFS storage is available in a 32/64-bit CIFS and Samba versions:
EON 64-bit x86 CIFS ISO image version 0.59.5 based on snv_125

* eon-0.595-125-64-cifs.iso
* MD5: a21c0b6111803f95c29e421af96ee016
* Size: ~90Mb
* Released: Thursday 3-December-2009

EON 64-bit x86 Samba ISO image version 0.59.5 based on snv_125

* eon-0.595-125-64-smb.iso
* MD5: 4678298f0152439867d218987c3ec20e
* Size: ~103Mb
* Released: Thursday 3-December-2009

EON 32-bit x86 CIFS ISO image version 0.59.5 based on snv_125

* eon-0.595-125-32-cifs.iso
* MD5: 4b76893c3363d46fad34bf7d0c23548c
* Size: ~57Mb
* Released: Thursday 3-December-2009

EON 32-bit x86 Samba ISO image version 0.59.5 based on snv_125

* eon-0.595-125-32-smb.iso
* MD5: f478a8ea9228f16dc1bd93adae03d200
* Size: ~70Mb
* Released: Thursday 3-December-2009

EON 64-bit x86 CIFS ISO image version 0.59.5 based on snv_125 (NO HTTP)

* eon-0.595-125-64-cifs-min.iso
* MD5: c7b9ec5c487302c1aa97363eb440fe00
* Size: ~85Mb
* Released: Thursday 3-December-2009

EON 64-bit x86 Samba ISO image version 0.59.5 based on snv_125 (NO HTTP)

* eon-0.595-125-64-smb-min.iso
* MD5: a33f34506f05070ffc554de7beaafd4d
* Size: ~98Mb
* Released: Thursday 3-December-2009

New/Changes/Fixes:
- removed iscsitgd and replaced it with COMSTAR (iscsit, stmf)
- added SUNWhd to image vs being in the binary kit.
- added rsync to image vs being in the binary kit.
- added nge, yge and yukonx drivers.
- added (/etc/inet/hosts, /etc/default/init) to /mnt/eon0/.backup (TIMEZONE and 
hostname change fix)
- fixed typo entry /mnt/eon0/.exec zpool -a to zpool import -a
- eon rebooting at grub(since snv_122) in ESXi, Fusion and various versions of 
VMware workstation. This is related to bug 6820576. Workaround, at grub press e 
and add on the end of the kernel line -B disable-pcieb=true
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabytes on a budget - blog

2009-12-03 Thread Trevor Pretty





Just thought I would let everybody know I saw one at a local ISP
yesterday. They hadn't started testing; the metal had only arrived the
day before and they were waiting for the drives to arrive. They had
also changed the design to give it more network capacity. I will try to find
out more as the customer progresses.


Interesting blog:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

--
Trevor Pretty | Technical Account Manager
T: +64 9 639 0652 | M: +64 21 666 161

Eagle Technology Group Ltd.
Gate D, Alexandra Park, Greenlane West, Epsom
Private Bag 93211, Parnell, Auckland
www.eagle.co.nz

This email is confidential and may be legally privileged. If received in
error please destroy and immediately notify us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
if any of f2..f5 have different block sizes from f1
This restriction does not sound so bad to me if this only refers to changes to 
the blocksize of a particular ZFS filesystem or copying between different ZFSes 
in the same pool. This can properly be managed with a -f switch on the 
userlan app to force the copy when it would fail.

any of f1..f5's last blocks are partial
Does this mean that f1, f2, f3, f4 need to be exact multiples of the ZFS 
blocksize? This is a severe restriction that will fail except in very special 
cases.
Is this related to the on-disk format, or is it a restriction in the 
implementation? (Do you know where to look in the source code?)

...but also ZFS most likely could not do any better with any other, more
specific non-dedup solution
Probably a lot of I/O traffic and digest calculation + lookups could be saved, as 
we already know it will be a duplicate.
(In our case the files are gigabytes in size.)

--Per


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
Btw. I would be surprised to hear that this can be implemented
with current APIs;
I agree. However, it looks like an opportunity to dive into the ZFS source code.


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Nicolas Williams
On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote:
 if any of f2..f5 have different block sizes from f1
 
 This restriction does not sound so bad to me if this only refers to
 changes to the blocksize of a particular ZFS filesystem or copying
 between different ZFSes in the same pool. This can probably be managed
 with a -f switch on the userland app to force the copy when it would
 fail.

Why expose such details?

If you have dedup on and if the file blocks and sizes align then

cat f1 f2 f3 f4 f5 > f6

will do the right thing and consume only space for new metadata.

If the file blocks and sizes do not align then

cat f1 f2 f3 f4 f5 > f6

will still work correctly.

Or do you mean that you want a way to do that cat ONLY if it would
consume no new space for data?  (That might actually be a good
justification for a ZFS cat command, though I think, too, that one could
script it.)
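
As a rough illustration of the "script it" idea, here is a minimal sh sketch
that refuses to do the cat unless every file except the last is an exact
multiple of the recordsize, so that dedup can reuse the existing blocks.  The
script name, the fixed 128K value and the whole helper are just assumptions
for the example, not an existing tool:

#!/bin/sh
# zcat_check.sh out f1 f2 ... fN   (hypothetical helper, sketch only)
RECSIZE=131072        # assume recordsize=128K; check 'zfs get recordsize <fs>'
out=$1; shift
i=1
for f in "$@"; do
    if [ $i -lt $# ]; then                      # every file but the last
        size=`ls -ln "$f" | awk '{print $5}'`   # file size in bytes
        if [ `expr $size % $RECSIZE` -ne 0 ]; then
            echo "$f is not a multiple of $RECSIZE bytes; cat would use new space" >&2
            exit 1
        fi
    fi
    i=`expr $i + 1`
done
cat "$@" > "$out"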

 any of f1..f5's last blocks are partial
 
 Does this mean that f1, f2, f3, f4 need to be exact multiples of the ZFS
 blocksize? This is a severe restriction that will fail except in very
 special cases.

Say f1 is 1MB, f2 is 128KB, f3 is 510 bytes, f4 is 514 bytes, and f5 is
10MB, and the recordsize for their containing datasets is 128KB, then
the new file will consume 10MB + 128KB more than f1..f5 did, but 1MB +
128KB will be de-duplicated.
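
(Spelling that accounting out, under the stated 128KB recordsize and as I read
the example: f1's eight 128KB records and f2's single 128KB record land at
aligned offsets in the new file and dedup against the originals, which is the
1MB + 128KB saved; f3's 510 bytes, f4's 514 bytes and all of f5 then sit at
unaligned offsets, so their records hash to values ZFS has never stored, which
is roughly the 10MB + 128KB of new allocation.)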

This is not really a severe restriction.  To make ZFS do better than
that would require much extra metadata and complexity in the filesystem
that users who don't need to do space-efficient file concatenation (most
users, that is) won't want to pay for.

 Is this related to the on-disk format or is it a restriction in the
 implementation? (Do you know where to look in the source code?)

Both.

 ...but also ZFS most likely could not do any better with any other, more
 specific non-dedup solution
 
 Probably a lot of I/O traffic and digest calculation + lookups could be
 saved as we already know it will be a duplicate.  (In our case the
 files are gigabytes in size.)

ZFS hashes, and records hashes of, whole blocks, not sub-blocks.  Look at my
above example.  To efficiently dedup the concatenation of the 10MB of f5
would require being able to have something like sub-block pointers.
Alternatively, if you want a concatenation-specific feature, ZFS would
have to have a metadata notion of concatenation, but then the Unix way
of concatenating files couldn't be used for this since the necessary
context is lost in the I/O redirection.

Nico


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Daniel Carosone
  Isn't this only true if the file sizes are such that the concatenated 
  blocks are perfectly aligned on the same zfs block boundaries they used 
  before?  This seems unlikely to me.
 
 Yes that would be the case.

While eagerly awaiting b128 to appear in IPS, I have been giving this issue 
(block size and alignment vs dedup) some thought recently.  I have a different, 
but sufficiently similar, scenario where the effectiveness of dedup will depend 
heavily on this factor.

For this case, though, the alignment question for short tails is relatively 
easily dealt with.  The key is that the record size of the file is up to 128k 
and may be shorter depending on various circumstances, such as the write 
pattern used.

To simplify, let us assume that the original files were all written quickly and 
sequentially, that is, that they have n 128k blocks plus a shorter tail.  When 
concatenating them, it should be sufficient to write out the target file in 
128k chunks from the source, then the first tail, then issue an fsync before 
moving on to the chunks from the second file.  

If the source files were not written in this pattern (e.g. log files, 
accumulating small varying-size writes), the best thing to do is to rewrite 
those in place as well, with the same pattern as is being written to the joined 
file. This can also improve compression efficiency, by allowing 
larger block sizes than the original.
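
As a rough sh sketch of the copy loop described above (file names are invented,
and 'sync' plus a sleep are only crude stand-ins for a real fsync of the target
and a wait for the txg - whether the records really end up aligned this way is
exactly the open question below):

#!/bin/sh
# join.sh out f1 f2 ... fN   -- hypothetical, sketch only
out=$1; shift
: > "$out"
for f in "$@"; do
    # dd reads the source in full 128k chunks, so only the final write
    # for each file is a short "tail" write
    dd if="$f" bs=128k >> "$out" 2>/dev/null
    sync            # crude stand-in for fsync(2) on the target
    sleep 30        # optionally wait out the txg interval (see the last question below)
done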

Issues/questions:
 * This is an optimistic method of alignment; is there any mechanism to get 
stronger results - i.e., to know the size of each record of the original, or to 
produce a specific record size/alignment on output?
 * There's already the very useful seek interface for finding holes and data; 
perhaps something similar is useful here. Or a direct-I/O-related option to 
read, that can return short reads only up to the end of the current record?
 * Perhaps a pause of some kind (to wait for the txg to close) is also 
necessary, to ensure the tail doesn't get combined with new data and reblocked?


[zfs-discuss] Quota information from nfs mounting linux client

2009-12-03 Thread Len Zaifman
We are using ZFS (Solaris 10u9) to serve disk to a couple of hundred Linux 
clients via NFS. We would like users on the Linux clients to be able to monitor 
their disk space on the ZFS file system. They do not have shell accounts on 
the fileserver. Is the quota information on the fileserver (user and group) 
available to be read by a user program, without privileged access, on a remote 
host (the Linux client)? Where would documentation be?

Thanks.

Sent from my BlackBerry device

This e-mail may contain confidential, personal and/or health 
information(information which may be subject to legal restrictions on use, 
retention and/or disclosure) for the sole use of the intended recipient. Any 
review or distribution by anyone other than the person for whom it was 
originally intended is strictly prohibited. If you have received this e-mail in 
error, please contact the sender and delete all copies.


Re: [zfs-discuss] zpool import - device names not always updated?

2009-12-03 Thread Ragnar Sundblad

Thank you Cindy for your reply!

On 3 dec 2009, at 18.35, Cindy Swearingen wrote:

 A bug might exist but you are building a pool based on the ZFS
 volumes that are created in another pool. This configuration
 is not supported and possible deadlocks can occur.

I had absolutely no idea that ZFS volumes weren't supported
as ZFS containers. Where can I find information about what
is and what isn't supported for ZFS volumes?

 If you can retry this example without building a pool on another
 pool, like using files to create a pool and can reproduce this,
 then please let me know.

I retried it with files instead, and it then worked exactly
as expected. (Also, it no longer magically remembered the
locations of earlier-found volumes in other directories for
import, with or without the sleeps.)

I don't know if it is of interest to anyone, but I'll
include the reworked file-based test below.

/ragge


#!/bin/bash

set -e
set -x
mkdir /d
mkfile 1g /d/f1
mkfile 1g /d/f2
zpool create pool mirror /d/f1 /d/f2
zpool status pool
zpool export pool
mkdir /d/subdir1
mkdir /d/subdir2
mv /d/f1 /d/subdir1/
mv /d/f2 /d/subdir2/
zpool import -d /d/subdir1
zpool import -d /d/subdir2
zpool import -d /d/subdir1 -d /d/subdir2 pool
zpool status pool
# cleanup - remove the # DELETEME_ part
# DELETEME_zpool destroy pool
# DELETEME_rm -rf /d




Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread A Darren Dunham
On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote:
 any of f1..f5's last blocks are partial
 Does this mean that f1, f2, f3, f4 need to be exact multiples of the ZFS
 blocksize? This is a severe restriction that will fail except in very
 special cases.  Is this related to the on-disk format or is it a
 restriction in the implementation? (Do you know where to look in the
 source code?)

I'm sure it's related to the FS structure.  How do you find a particular
point in a file quickly?  You don't read up to that point; you want to
go to it directly.  To do so, you have to know how the file is indexed.
If every block contains the same amount of data, this is a simple math
equation.  If some blocks have more or less data, then you have to keep
track of them and their size.  I doubt ZFS has any space or ability to
include non-full blocks in the middle of a file.
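
(To illustrate the "simple math equation" with a toy sh example, assuming a
128K recordsize and made-up numbers:

offset=1234567
recsize=131072
expr $offset / $recsize     # -> index of the record holding that byte offset
expr $offset % $recsize     # -> byte offset within that record

With variable-sized blocks in the middle of a file, that constant-time division
would have to become a lookup in some per-file index instead.)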

-- 
Darren


Re: [zfs-discuss] L2ARC in clusters

2009-12-03 Thread Erik Trimble

Robert Milkowski wrote:

Robert Milkowski wrote:

Robert Milkowski wrote:

Hi,

When deploying ZFS in cluster environment it would be nice to be 
able to have some SSDs as local drives (not on SAN) and when pool 
switches over to the other node zfs would pick up the node's local 
disk drives as L2ARC.


To better clarify what I mean, let's assume there is a 2-node cluster 
with one 2540 disk array.
Now let's put 4x SSDs in each node (as internal/local drives). Now 
let's assume one ZFS pool would be created on top of a LUN exported 
from the 2540. Now the 4x local SSDs could be added as L2ARC, but because 
they are not visible on the 2nd node, when the cluster does a failover it 
should be able to pick up the SSDs which are local to the other node.


L2ARC doesn't contain any data which is critical to the pool, so it 
doesn't have to be shared between nodes. A SLOG would be a whole 
different story, and there it generally wouldn't be possible. But for 
L2ARC it should be.





Perhaps a scenario like below should be allowed:

node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node-1-ssd3 
node-1-ssd4

node-1# zpool export mysql
node-2# zpool import mysql
node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node-2-ssd3 
node-2-ssd4



This is assuming that the pool can be imported when some of its cache 
devices are not accessible.
That way the pool would always have some L2ARC SSDs not accessible, 
but would provide L2ARC cache on each node with local SSDs.


Actually it looks like it already works like that!
A pool imports with its cache device unavailable just fine.
Then I added another cache device. And I can still import it with the 
first one available but the 2nd one unavailable.


zpool status complains of course but other than that it seems to be 
working fine.


Any thought?





Ooo.  That's a scenario I hadn't thought about.

Right now, I'm doing something similar on the cheap:   I have an iSCSI 
LUN (big ass SATA Raidz2) mounted on host A, and am using a spare 15k 
SAS drive locally as the L2ARC.  When I export it and import it to 
another host, with an identical disk in the same location (e.g. c1t1d0), 
I've done a 'zpool remove/add', since they write different ZFS 
signatures on the cache drive.  Works like a champ.


Given that I want to use the same device location (e.g.  c1t1d0) on both 
hosts, is there a way I can somehow add both as cache devices, and have 
ZFS tell them apart by the ID signature?


That is, on Host A, I do this:

# zpool create tank <iSCSI LUN> cache c1t1d0
# zpool export tank

Then, on Host B, I'm currently doing:

# zpool import tank
# zpool remove tank c1t1d0
# zpool add tank cache c1t1d0


I'd obviously like to figure some way that I don't need to do the 'zpool 
add/remove'


Robert's idea looks great, but I'm assuming that all the SSD devices 
have different drive locations.  What I need is some way of telling ZFS 
to use device X as a cache device, based on its ZFS signature, rather 
than its physical device location, as that location might (in the past) 
be used by another vdev.


Theoretically, I'd like to do something like this:

hostA# zpool create tank <iSCSI LUN>
hostA# zpool add tank cache c1t1d0
hostA# zpool export tank

hostB# zpool import tank
hostB# zpool add tank cache c1t1d0


And from then on, I just import/export between the two hosts, and it 
auto-picks the correct c1t1d0 drive.




--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] possible mega_sas issue sol10u8 (Re: Workaround for mpt timeouts in snv_127)

2009-12-03 Thread James C. McPherson

Tru Huynh wrote:

follow up, another crash today.

On Mon, Nov 30, 2009 at 11:35:07AM +0100, Tru Huynh wrote:

1) OS
SunOS xargos.bis.pasteur.fr 5.10 Generic_141445-09 i86pc i386 i86pc


You should be logging a support call for this issue.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] Supermicro AOC-USAS-L8i

2009-12-03 Thread steven
It will work in a standard 8x or 16x slot. The bracket is backward. Not one for 
subtlety, I took the bracket off, grabbed some pliers, and reversed all the 
bends. Not exactly ideal... but I was then able to get it in the case and get 
some screw tension on it to hold it snugly to the case.

I had some problems with getting the card to initialize at first. One MB would 
simply not allow me to run the card in the x16 slot, even with onboard video, 
even with a generic pci video card.

Another motherboard I had, an Asus (don't recall the model), would allow it to 
work. I am using an old GeForce2 PCI card for video.


Re: [zfs-discuss] Supermicro AOC-USAS-L8i

2009-12-03 Thread Tim Cook
On Thu, Dec 3, 2009 at 8:02 PM, steven steverun...@gmail.com wrote:

 It will work in a standard 8x or 16x slot. The bracket is backward. Not one
 for subtlety, I took the bracket off, grabbed some pliers, and reversed all
 the bends. Not exactly ideal... but I was then able to get it in the case
 and get some screw tension on it to hold it snugly to the case.

 I had some problems with getting the card to initialize at first. One MB
 would simply not allow me to run the card in the x16 slot, even with onboard
 video, even with a generic pci video card.

 Another motherboard I had, an asus-- don't recall the model, would allow it
 to work. I am using an old Geforce2 PCI card for video.


I recently picked up a pair of Intel SASUC8I.  I was able to flash them with
the LSI IT firmware for the 3081, and they appear to work just fine.  I
haven't done extensive testing, but booting off a livecd, it sees the disks
just fine, and loads a driver for them.

-- 
--Tim


[zfs-discuss] b128a available w/deduplication

2009-12-03 Thread Richard Elling

FYI,
OpenSolaris b128a is available for download or image-update from the
dev repository.  Enjoy.
 -- richard



Re: [zfs-discuss] b128a available w/deduplication

2009-12-03 Thread Dennis Clarke

 FYI,
 OpenSolaris b128a is available for download or image-update from the
 dev repository.  Enjoy.

I thought that dedupe has been out for weeks now ?

Dennis




Re: [zfs-discuss] b128a available w/deduplication

2009-12-03 Thread James C. McPherson

Dennis Clarke wrote:

FYI,
OpenSolaris b128a is available for download or image-update from the
dev repository.  Enjoy.


I thought that dedupe has been out for weeks now ?


The source has, yes. But what Richard was referring to was the
respun build now available via IPS.


cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] b128a available w/deduplication

2009-12-03 Thread Eric D. Mudama

On Fri, Dec  4 at  1:12, Dennis Clarke wrote:



FYI,
OpenSolaris b128a is available for download or image-update from the
dev repository.  Enjoy.


I thought that dedupe has been out for weeks now ?


Dedupe has been out, but there were some accounting issues scheduled
to be fixed in 128.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org



Re: [zfs-discuss] ZIL corrupt, not recoverable even with logfix

2009-12-03 Thread James Risner
It was created on AMD64 FreeBSD with 8.0RC2 (which was version 13 of ZFS iirc.)

At some point I knocked it out (exported it) somehow; I don't remember doing so 
intentionally.  So I can't do commands like zpool replace since there are no 
pools.

It says it was last used by the FreeBSD box, but the FreeBSD box does not show it 
with the zpool status command.

I'm going down tomorrow to work on it again, and I'm going to try 8.0 Release 
AMD64 FreeBSD (I've already tried i386 AMD64 FreeBSD 8.0 Release) and 
Opensolaris dev-127.

I was just hoping there was some way I'm missing to mount it read only (I have 
tried zpool import -f -o readonly=yes but that doesn't work either.)


Re: [zfs-discuss] mpt errors on snv 127

2009-12-03 Thread Chad Cantwell
I eventually performed a few more tests, adjusting some zfs tuning options 
which had no effect, and trying the
itmpt driver which someone had said would work, and regardless my system would 
always freeze quite rapidly in
snv 127 and 128a.  Just to double check my hardware, I went back to the 
opensolaris 2009.06 release version, and
everything is working fine.  The system has been running a few hours and copied 
a lot of data and not had any
trouble, mpt syslog events, or iostat errors.

One thing I found interesting, and I don't know if it's significant or not, is 
that under the recent builds and
under 2009.06, I had run echo '::interrupts' | mdb -k to check the interrupts 
used.  (I don't have the printout
handy for snv 127+, though).

I have a dual port gigabit Intel 1000 P PCI-e card, which shows up as e1000g0 
and e1000g1.  In snv 127+, each of
my e1000g devices shares an IRQ with my mpt devices (mpt0, mpt1) on the IRQ 
listing, whereas in opensolaris
2009.06, all 4 devices are on different IRQs.  I don't know if this is 
significant, but most of my testing when
I encountered errors was data transfer via the network, so it could have 
potentially been interfering with the
mpt drivers when it was on the same IRQ.  The errors did seem to be less 
frequent when the server I was copying
from was linked at 100 instead of 1000 (one of my tests), but that is as likely 
to be a result of the slower zpool
throughput as it is to be related to the network traffic.

I'll probably stay with 2009.06 for now since it works fine for me, but I can 
try a newer build again once some
more progress is made in this area and people want to see if it's fixed (this 
machine is mainly to back up another
array, so it's not too big a deal to test later when the mpt drivers are looking 
better, and to wipe again in the event
of problems).

Chad

On Tue, Dec 01, 2009 at 03:06:31PM -0800, Chad Cantwell wrote:
 To update everyone, I did a complete zfs scrub, and it it generated no errors 
 in iostat, and I have 4.8T of
 data on the filesystem so it was a fairly lengthy test.  The machine also has 
 exhibited no evidence of
 instability.  If I were to start copying a lot of data to the filesystem 
 again though, I'm sure it would
 generate errors and crash again.
 
 Chad
 
 
 On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote:
  Well, ok, the msi=0 thing didn't help after all.  A few minutes after my 
  last message a few errors showed
  up in iostat, and then in a few minutes more the machine was locked up 
  hard...  Maybe I will try just
  doing a scrub instead of my rsync process and see how that does.
  
  Chad
  
  
  On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote:
   I don't think the hardware has any problems, it only started having 
   errors when I upgraded OpenSolaris.
   It's still working fine again now after a reboot.  Actually, I reread one 
   of your earlier messages,
   and I didn't realize at first when you said non-Sun JBOD that this 
   didn't apply to me (in regards to
   the msi=0 fix) because I didn't realize JBOD was shorthand for an 
   external expander device.  Since
   I'm just using baremetal, and passive backplanes, I think the msi=0 fix 
   should apply to me based on
   what you wrote earlier, anyway I've put 
 set mpt:mpt_enable_msi = 0
   now in /etc/system and rebooted as it was suggested earlier.  I've 
   resumed my rsync, and so far there
   have been no errors, but it's only been 20 minutes or so.  I should have 
   a good idea by tomorrow if this
   definitely fixed the problem (since even when the machine was not 
   crashing it was tallying up iostat errors
   fairly rapidly)
   
   Thanks again for your help.  Sorry for wasting your time if the 
   previously posted workaround fixes things.
   I'll let you know tomorrow either way.
   
   Chad
   
   On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote:
Chad Cantwell wrote:
After another crash I checked the syslog and there were some different 
errors than the ones
I saw previously during operation:
...

Nov 30 20:59:13 the-vault   LSI PCI device (1000,) not 
supported.
...
Nov 30 20:59:13 the-vault   mpt_config_space_init failed
...
Nov 30 20:59:15 the-vault   mpt_restart_ioc failed


Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major
Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009
Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: 
System-Serial-Number, HOSTNAME: the-vault
Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16
Nov 30 21:33:02 the-vault EVENT-ID: 
7886cc0d-4760-60b2-e06a-8158c3334f63
Nov 30 21:33:02 the-vault DESC: The transmitting device sent an 
invalid request.
Nov 30 21:33:02 the-vault   Refer to http://sun.com/msg/PCIEX-8000-8R 
for more information.
Nov 30 

Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Michael Schuster

Nicolas Williams wrote:

On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote:

if any of f2..f5 have different block sizes from f1

This restriction does not sound so bad to me if this only refers to
changes to the blocksize of a particular ZFS filesystem or copying
between different ZFSes in the same pool. This can probably be managed
with a -f switch on the userland app to force the copy when it would
fail.


Why expose such details?

If you have dedup on and if the file blocks and sizes align then

cat f1 f2 f3 f4 f5 > f6

will do the right thing and consume only space for new metadata.


I think Per's concern was not only with the space consumed but also the effort 
involved in the process (think large files); if I read his emails 
correctly, he'd like what amounts to a manipulation of meta-data only, to have 
the data blocks of what were originally 5 files end up in one file; the 
traditional concat operation will cause all the data to be read and written 
back, at which point dedup will kick in, so most of the processing has 
already been spent by then. (Per, please correct/comment)


Michael
--
Michael Schusterhttp://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'


Re: [zfs-discuss] b128a available w/deduplication

2009-12-03 Thread Dennis Clarke

 Dennis Clarke wrote:
 FYI,
 OpenSolaris b128a is available for download or image-update from the
 dev repository.  Enjoy.

 I thought that dedupe has been out for weeks now ?

 The source has, yes. But what Richard was referring to was the
 respun build now available via IPS.

Oh, sorry. Thought I had missed something. I hadn't :-)

I'm now on version 22 for ZFS and am not even entirely sure what that is:


# uname -a
SunOS europa 5.11 snv_129 sun4u sparc SUNW,UltraAX-i2

# zpool upgrade -v
This system is currently running ZFS pool version 22.

The following versions are supported:

VER  DESCRIPTION
---  
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties

For more information on a particular version, including supported
releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.

HOWEVER, that URL no longer works for N > 19 and in fact, the entire URL
has changed to:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/22



-- 
Dennis Clarke
dcla...@opensolaris.ca  - Email related to the open source Solaris
dcla...@blastwave.org   - Email related to open source for Solaris




[zfs-discuss] SSDs with a SCSI SCA interface?

2009-12-03 Thread Erik Trimble

Hey folks.

I've looked around quite a bit, and I can't find something like this:

I have a bunch of older systems which use Ultra320 SCA hot-swap 
connectors for their internal drives. (e.g. v20z and similar)


I'd love to be able to use modern flash SSDs with these systems, but I 
have yet to find someone who makes anything that would fit the bill.


I need either:

(a) an SSD with an Ultra160/320 parallel interface (I can always find an 
interface adapter, so I'm not particular about whether it's a 68-pin or SCA)


(b) a SAS- or SATA-to-UltraSCSI adapter (preferably with an SCA interface)



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
