Well, for the sake of completeness (and perhaps to enable users of snv_151a)
there should also be links to alternative methods:
1) Using a zpool binary rebuilt from patched sources, or an already precompiled one, i.e.
I've hit the problem myself recently, and mounting the filesystem cleared
something in the brains of ZFS and allowed me to snapshot.
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg00812.html
PS: I'll use Google before asking some questions, à la (C) Bart Simpson
That's how I found
It's good he didn't mail you, now we all know some under-the-hood details via
Googling ;)
Thanks to both of you for this :)
I've installed SXDE (snv_89) and found that the web console only listens on
https://localhost:6789/ now, and the module for ZFS admin doesn't work.
When I open the link, the left frame lists a stacktrace (below) and the right
frame is plain empty. Any suggestions?
I tried substituting
We have a test machine installed with a ZFS root (snv_77/x86 and
rootpol/rootfs with grub support).
Recently tried to update it to snv_89 which (in Flag Days list) claimed more
support for ZFS boot roots, but the installer disk didn't find any previously
installed operating system to upgrade.
You mean this:
https://www.opensolaris.org/jive/thread.jspa?threadID=46626&tstart=120
Elegant script, I like it, thanks :)
Trying now...
Some patching follows:
-for fs in `zfs list -H | grep ^$ROOTPOOL/$ROOTFS | awk '{ print $1 };'`
+for fs in `zfs list -H | grep ^$ROOTPOOL/$ROOTFS | grep -w
Alas, didn't work so far.
Can the problem be that the zfs-root disk is not the first on the controller
(system boots from the grub on the older ufs-root slice), and/or that zfs is
mirrored? And that I have snapshots and a data pool too?
These are the boot disks (SVM mirror with ufs and grub):
No, I did not set that property; not now, not in previous releases.
Nice to see secure by default coming to the admin tools as well.
Waiting for SSH to become 127.0.0.1:22 sometime... just kidding ;)
Thanks for the tip!
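For the archives - if the tip in question is the webconsole listen property, flipping it back looks roughly like this (assuming the property name is unchanged in snv_89):
  svccfg -s svc:/system/webconsole setprop options/tcp_listen=true
  svcadm refresh svc:/system/webconsole
  smcwebserver restart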
Any ideas about the stacktrace? - it's still there instead of the web-GUI
I checked - this system has a UFS root. When installed as snv_84 and then LU'd
to snv_89, and when I fiddled with these packages from various other releases,
it had the stacktrace instead of the ZFS admin GUI (or the well-known
smcwebserver restart effect for the older packages).
This system
Likewise. Just plain doesn't work.
Not required though, since the command-line is okay and way powerful ;)
And there are some more interesting challenges to work on, so I didn't push
this problem any more yet.
Interesting, we'll try that.
Our server with the problem has been boxed now, so I'll check the solution when
it gets on site.
Thanks ahead, anyway ;)
Just my 2c: Is it possible to do an offline dedup, kind of like snapshotting?
What I mean in practice is: we make many Solaris full-root zones. They share a
lot of data as complete files. This makes it relatively easy to save space - make one
zone as a template, snapshot/clone its dataset, make new
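For context, the template trick mentioned here looks roughly like this (dataset names are hypothetical):
  zfs snapshot rpool/zones/template@golden
  zfs clone rpool/zones/template@golden rpool/zones/newzone01
  # the clone initially shares all its blocks with the template,
  # so each new full-root zone starts out nearly free in terms of space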
OK, thank you Nils and Wade for the concise replies.
After much reading I agree that the ZFS-development queued features do deserve
a higher ranking on the priority list (pool-shrinking/disk-removal and
user/group quotas would be my favourites), so probably the deduplication tool
I'd need would,
Is it possible to create a (degraded) zpool with placeholders specified instead
of actual disks (parity or mirrors)? This is possible in Linux mdadm (the missing
keyword), so I kinda hoped this could be done in Solaris, but didn't manage to.
Usecase scenario:
I have a single server (or home
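One workaround that gets suggested for this is to fake the missing member with a sparse file and offline it right away - a rough sketch, with hypothetical names and sizes:
  mkfile -n 500g /var/tmp/fakedisk        # sparse placeholder, sized like the real disks
  zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 /var/tmp/fakedisk
  zpool offline tank /var/tmp/fakedisk    # pool now runs DEGRADED but usable
  rm /var/tmp/fakedisk
  # later, when the real disk arrives:
  # zpool replace tank /var/tmp/fakedisk c0t4d0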
For the sake of curiosity, is it safe to have components of two different ZFS
pools on the same drive, with and without HDD write cache turned on?
How will ZFS itself behave, would it turn on the disk cache if the two imported
pools co-own the drive?
An example is a multi-disk system like mine
Thanks Tomas, I haven't checked yet, but your workaround seems feasible.
I've posted an RFE and referenced your approach as a workaround.
That's nearly what zpool should do under the hood, and perhaps can be done
temporarily with a wrapper script to detect min(physical storage sizes) ;)
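Just to illustrate the idea, a quick-and-dirty sketch of such a wrapper (hypothetical device names, not tested):
  #!/bin/sh
  # find the smallest backup-slice (s2) among the candidate disks, so
  # identically-sized slices can be laid out before creating the pools
  MIN=""
  for d in c1t0d0 c1t1d0 c1t2d0; do
    SZ=`prtvtoc -h /dev/rdsk/${d}s2 | awk '$1 == 2 {print $5}'`
    if [ -z "$MIN" ] || [ "$SZ" -lt "$MIN" ]; then MIN="$SZ"; fi
  done
  echo "smallest disk: $MIN sectors"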
//Jim
Thanks to all those who helped, even despite the non-enterprise approach of
this question ;)
While experimenting I discovered that Solaris /tmp doesn't seem to support
sparse files: mkfile -n still creates full-sized files which can either use
up the
swap space, or not fit there. ZFS and UFS
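A quick way to compare, assuming /var/tmp sits on the ZFS (or UFS) root - the interesting part is what du reports versus the logical size:
  mkfile -n 1g /tmp/test.img /var/tmp/test.img
  ls -l /tmp/test.img /var/tmp/test.img   # both claim 1 GB logical size
  du -k /tmp/test.img /var/tmp/test.img   # blocks actually allocated on tmpfs vs ZFS/UFS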
...and, apparently, I can replace two drives at the same time (in two
commands), and resilvering goes in parallel:
{code}
[r...@t2k1 /]# zpool status pool
pool: pool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue
)?
//Thanks in advance, we're expecting a busy weekend ;(
//Jim Klimov
zpool history has shed a little light. Lots, actually.
The sub-dataset in question was indeed created, and around the time ludelete was run
there are some entries along the lines of zfs destroy -r pond/zones/zonename.
There are no precise details (names, mountpoints) about the destroyed datasets -
and I
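For anyone else digging: the long/internal forms of the history log add timestamps, user and host to each entry (exact output varies by build), e.g.:
  zpool history -il pond | grep destroy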
Hello Mark, Darren,
Thank you guys for suggesting zpool history, upon which we stumbled before
receiving your comments. Nonetheless, the history results are posted above.
Still no luck trying to dig out the dataset data, so far.
As I get it, there are no (recent) backups which is a poor
to be renamed.
Hope this helps, let us know if it does ;)
//Jim Klimov
I meant to add that due to the sheer amount of data (and time needed) to copy,
you really don't want to use copying tools which abort on error, such as MS
Explorer.
Normally I'd suggest something like FAR in Windows or Midnight Commander in Unix
to copy over networked connections (CIFS shares),
True, correction accepted, covering my head with ashes in shame ;)
We do use a custom-built package of rsync-3.0.5 with a number of their standard
contributed patches applied. To be specific, these:
checksum-reading.diff
checksum-updating.diff
detect-renamed.diff
downdate.diff
fileflags.diff
Do you have any older benchmarks on these cards and arrays (in their pre-ZFS
life)? Perhaps this is not a ZFS regression but a hardware config issue?
Perhaps there's some caching (like per-disk write-through) not enabled on the
arrays? As you may know, the ability (and reliability) of such
Hmm, scratch that. Maybe.
I did not at first get the point that your writes to a filesystem dataset work quickly.
Perhaps the filesystem is (better) cached indeed, i.e. *maybe* zvol writes are
synchronous while zfs filesystem writes may be cached and thus async? Try playing
around with the relevant dataset
It's probably better to use zfs recv -nFvd first (no-write verbose mode) to be certain
about your write-targets and about overwriting stuff (i.e. zfs recv -F would
destroy any newer snapshots, if any - so you can first check which ones, and
possibly clone/rename them first).
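Roughly along these lines, with placeholder pool/snapshot names:
  zfs send -R sourcepool/fs@today | zfs recv -nFvd backuppool
  # -n: make no changes, -v: report what would be received,
  # -F: would roll back/destroy newer target snapshots, hence the dry run first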
// HTH, Jim Klimov
You might also want to force ZFS into accepting a faulty root pool:
# zpool set failmode=continue rpool
//Jim
I installed OpenSolaris and set up rpool as my base install on a single 1TB
drive
If I understand correctly, you have rpool and the data pool configured all as
one
pool?
That's probably not what you'd really want. For one thing, the bootable root pool
should be fully available to GRUB from a
One more note,
For example, if you were to remake the pool (as suggested above for rpool and
below for raidz data pool) - where would you re-get the original data for
copying
over again?
Of course, if you take on with the idea of buying 4 drives and building a
raidz1 vdev
right away, and
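A sketch of the target layout, with hypothetical device names - the root pool stays a mirror of slices that GRUB can boot from, and the data goes into a separate raidz1:
  zpool create rpool mirror c0t0d0s0 c0t1d0s0
  zpool create data raidz1 c0t2d0 c0t3d0 c0t4d0 c0t5d0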
After reading many, many threads on ZFS performance today (top of the list in the
forum, and some chains of references), I applied a bit of tuning to the server.
In particular, I've set zfs_write_limit_override to 384 MB so my cache is spooled
to disks more frequently (if streaming lots of
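For reference, this tunable can be poked on a live kernel with mdb and persisted via /etc/system, roughly like this (384*1024*1024 bytes; double-check the variable on your build before writing to it):
  echo zfs_write_limit_override/W0t402653184 | mdb -kw
  # or persistently, in /etc/system:
  # set zfs:zfs_write_limit_override = 0x18000000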
Trying to spare myself the expense as this is my home system so budget is
a constraint.
What I am trying to avoid is having multiple raidz's, because every time I
have another one I lose a lot of extra space to parity. Much like in RAID 5.
There's a common perception which I tend to share
You might also search for OpenSolaris NAS projects. Some that I've seen
previously
involve nearly the same config you're building - a CF card or USB stick with
the OS
and a number of HDDs in a zfs pool for the data only.
I am not certain which ones I've seen, but you can look for EON, and
I did a zpool scrub recently, and while it was running it reported errors and warned
about restoring from backup. When the scrub completed, it reported finishing with
0 errors though. On the next scrub some other errors are reported in different files.
iostat -xne does report a few errors (1
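When chasing this kind of come-and-go report, the usual places to look are (a generic checklist, not specific to this box):
  zpool status -v pool   # lists files affected by persistent errors, if any remain
  fmdump -eV | tail -50  # FMA error telemetry from the underlying devices
  iostat -xne 5          # watch soft/hard/transport error counters per device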
Hello tobex,
While the original question may have been answered by posts above, I'm
interested:
when you say that according to zfs list the zvol is 100% full, does it only mean
that it uses all 20 GB on the pool (like a non-sparse uncompressed file), or does
it also imply that you can't write into
If I understand you right, it is as you said.
Here's an example and you can see what happened.
The sam-fs is filled to only 6% and the zvol is full.
I'm afraid I was not clear with my question, so I'll elaborate, then.
It remains standing as: during this situation, can you write new data into
Concerning the reservations, here's a snip from man zfs:
The reservation is kept equal to the volume's logical
size to prevent unexpected behavior for consumers.
Without the reservation, the volume could run out of
space, resulting in undefined
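To illustrate the difference the man page is talking about - a sketch with made-up names (older builds use reservation rather than refreservation):
  zfs create -V 20g pool/vol            # regular zvol: reservation equals volsize
  zfs create -s -V 20g pool/sparsevol   # sparse zvol: no reservation, space may be overcommitted
  zfs get volsize,refreservation,used pool/vol pool/sparsevol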
Hello all.
Like many others, I've come close to making a home NAS server based on
ZFS and OpenSolaris. While this is not an enterprise solution with high IOPS
expectation, but rather a low-power system for storing everything I have,
I plan on cramming in some 6-10 5400RPM Green drives with low
Thanks for the link, but the main concern in spinning down drives of a ZFS pool
is that ZFS by default is not so idle. Every 5 to 30 seconds it closes a
transaction
group (TXG) which requires a synchronous write of metadata to disk.
I mentioned reading many blogs/forums on the matter, and some
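One of the knobs commonly brought up for this is the TXG sync interval - a sketch for /etc/system; it only stretches the interval, it does not make an idle pool completely quiet:
  set zfs:zfs_txg_timeout = 30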
Hello all
Sorry for bumping an old thread, but now that snv_128 is due to appear as a
public DVD download, I wonder: has this fix for zfs-accounting and other issues
with zfs dedup been integrated into build 128?
We have a fileserver which is likely to have much redundant data and we'd like
Hi all
I wonder if there has been any new development on this matter over the past 6
months.
Today I pondered an idea of a zfs-aware mv, capable of doing zero read/write of
file data when moving files between datasets of one pool.
This seems like a (z)cp idea proposed in this thread and seems
Well, as I wrote in other threads - I have a pool named pool on physical
disks, and a compressed volume in this pool which I loopback-mount over iSCSI
to make another pool named dcpool.
When files in dcpool are deleted, blocks are not zeroed out by current ZFS
and they are still allocated for
In a recent post r-mexico wrote that they had to parse system messages and
manually fail the drives on a similar, though different, occasion:
http://opensolaris.org/jive/message.jspa?messageID=515815#515815
Technically the bootfs ID is a string which names the root dataset, typically
rpool/ROOT/solarisReleaseNameCode. This string can be passed to the Solaris
kernel as a parameter, manually or by the bootloader; otherwise a default current
bootfs is read from the root pool's attributes (not dataset attributes!
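In practice, with a hypothetical dataset name:
  zpool get bootfs rpool                      # see the current default
  zpool set bootfs=rpool/ROOT/snv_134 rpool   # point the pool at another root dataset
  # or one-shot from the GRUB menu entry:  bootfs rpool/ROOT/snv_134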
You can try a workaround - no idea if this would really work:
0) Disable stmf and iscsi/* services
1) Create your volume's clone
2) Rename the original live volume dataset to some other name
3) Rename the clone to original dataset's name
4) Promote the clone
- now for the system it SHOULD seem
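Spelled out as commands, with hypothetical names (note a clone needs a snapshot to hang off of, and the service FMRIs may differ per build):
  svcadm disable -t stmf iscsi/target     # step 0
  zfs snapshot pool/vol@swap
  zfs clone pool/vol@swap pool/vol.new    # step 1
  zfs rename pool/vol pool/vol.old        # step 2
  zfs rename pool/vol.new pool/vol        # step 3
  zfs promote pool/vol                    # step 4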
come up with an idea of a dtrace for your situation.
I have a small but non-zero hope that the experts will also come to the web-forums
and review the past month's posts and give their comments to my, your and
others' questions and findings ;)
//Jim Klimov
Sorry, I did not hit this type of error...
AFAIK the pool writes during zfs receive are done by current code (i.e. ZFSv22
for you) based on data read from the backup stream. So unless there are
corruptions on the pool which happened to be at the same time as you did your
restore, this
Sorry, I guess I'm running out of reasonable ideas then.
One that you can try (or already did) is installing Solaris not by JumpStart or
WANBoot but from original media (DVD or Network Install) to see if the problem
persists. Maybe your flash image lacks some controller drivers, etc.? (I am not
different mirrors), or RAIDZ1 when we need more space available.
HTH,
//Jim Klimov
Ah, yes, regarding the backdoor to the root fs: if you choose to have some
non-quota'ed space hogs in the same pool as your root FS, you can look into
setting a reservation (and/or refreservation) for the root FS datasets. For
example, if your OS installation uses 4 GB and you don't think it would
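Something like this, with a hypothetical BE name and size:
  zfs set refreservation=8G rpool/ROOT/snv_134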
The thing is- as far as I know the OS doesn't ask the disk to find a place
to fit the data. Instead the OS tracks what space on the disk is free and
then tells the disk where to write the data.
Yes and no, I did not formulate my idea clearly enough, sorry for the confusion ;)
Yes - The disks
or test if the theoretical warnings are valid?
Thanks,
//Jim Klimov
Small world... Never seen this problem before your post, and hit it now myself
;)
We had an outage on an SXCE snv_117 server today with a data pool taking an
unknown time to import, so we decided to zpool import -F it. But that feature
is lacking in build 117, so we imported the pool into an
Well, if this is not a root disk and the server boots at least to single-user,
as you wrote above, you can try to disable auto-import of this pool.
Easiest of all is to disable auto-imports of all pools by removing or renaming
the file /etc/zfs/zpool.cache - it is a list of known pools for
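Roughly, with a placeholder pool name:
  mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.saved
  init 6
  # after the reboot, import only the pools you do want, by hand:
  zpool import datapool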
to be torn down and
remade with another layout, which would require lots of downtime
and extra space for backups ;)
substantial) - now I'd get rid of this experiment much faster ;)
many people suggest that a backup on another similar server box is superior to
using tape backups - although probably using more electricity in real-time).
sorry if this goes in the wrong spot, I could not find
Seems to have come through correctly ;)
HTH,
//Jim Klimov
So if you bump this to 32k then the fragmented size
is 512k which tells ZFS to switch to a different metaslab
once it drops below this threshold.
Makes sense after some more reading today ;)
What happens if no metaslab has a block this large (or small)
on a sufficiently full and fragmented
2011-05-19 17:00, Jim Klimov wrote:
I am not sure you can monitor actual mechanical seeks short
of debugging and interrogating the HDD firmware - because
it is the last piece of logic responsible in the chain of caching,
queuing and issuing actual commands to the disk heads.
For example, a long logical
Just a random thought: if two devices have the same IDs and seem to work in turns,
are you certain you have a mirror and not two paths to the same backend?
A few years back I was given a box to support, with sporadically failing drives,
which turned out to be two paths to the same external array,
: 90%  3342 MB (p)
Most Frequently Used Cache Size:  9%   362 MB (c-p)
arc_meta_used  = 2617 MB
arc_meta_limit = 6144 MB
arc_meta_max   = 4787 MB
Thanks for any insights,
//Jim Klimov
IP
addresses (i.e. localhost and NIC IP) - but that would probably fail
at the same bottleneck moment - or to connect to the zvol/rdsk/...
directly, without iSCSI?
Thanks for ideas,
//Jim Klimov
it ;)
thought about it, can't get
rid of the idea ;) ...
Dan ... It would still need a complex bp_rewrite.
Are you certain about that?
For example, scrubbing/resilvering and fixing corrupt blocks with
non-matching checksums is a post-processing operation which
works on an existing pool and rewrites some blocks if needed.
And it works without a
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Frank Van Damme
On 26-05-11 13:38, Edward Ned Harvey wrote:
But what if you lose it (the vdev), would there be a way to
reconstruct the DDT (which you need to be able to delete old,
of HCL HDDs all have one connector...
Still, I guess my post poses more questions than answers, but maybe some other
list readers can reply...
Hint: Nexenta people seem to be good OEM friends with Supermicro, so they
might know ;)
HTH,
//Jim Klimov
know ;)
Yes :-)
-- richard
Thanks!
//Jim Klimov
Tim Cook wrote:
SAS drives are SAS drives, they aren't like SCSI.
There aren't 20 different versions with different pinouts.
Uh-huh... Reading some more articles, I think I found the
answer to my question: the SAS connector seems to be
dual-sided (with conductive stripes on both sides of the
negligible and there
are more options quickly available, such as mounting the iSCSI
device on another server? Now that I hit the problem of reverting
to direct volume access, this makes sense ;)
Thanks in advance for ideas or clarifications,
//Jim Klimov
4295GB 4295GB 8389kB
But lofiadm doesn't let me address that partition #1 as a separate device :(
Thanks,
//Jim Klimov
of 3*4-disk-raidz1 vs 1*12-disk raidz3, so which
of the tradeoffs is better - more vdevs, or more
parity to survive the loss of ANY 3 disks vs. only
the right 3 disks?
Thanks,
//Jim Klimov
?
cheers
Matt
On 05/27/11 13:43, Jim Klimov wrote:
Did you try it as a single command, somewhat like:
zpool create -R /a -o cachefile=/a/etc/zfs/zpool.cache mypool c3d0
Using altroots and cachefile(=none) explicitly is a nearly-
documented way to avoid caching pools which you
Actually if you need beadm to know about the data pool,
it might be beneficial to mix both approaches - yours with
bemount, and init-script to enforce the pool import on that
first boot...
HTH,
//Jim Klimov
dedicated tasks with data you're okay with losing.
You can also make the rpool a three-way mirror, which may increase
read speeds if you have enough concurrency. And when one drive
breaks, your rpool is still mirrored.
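For example (device names hypothetical; on x86 the new side needs its own boot blocks):
  zpool attach rpool c0t0d0s0 c0t2d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t2d0s0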
HTH,
//Jim Klimov
If it is powered on, then it is a warm spare :-)
Warm spares are a good idea. For some platforms, you can
spin down the
disk so it doesn't waste energy.
But I should note that we've had issues with a hot spare disk added to rpool
in particular, preventing boots on Solaris 10u8. It turned
Disk /dev/zvol/rdsk/pool/dcpool: 4295GB
Sector size (logical/physical): 512B/512B
Just to check, did you already try:
zpool import -d /dev/zvol/rdsk/pool/ poolname
Thanks for the suggestion. As a matter of fact, I did not try that.
But it hasn't helped (possibly due to partitioning
Thanks,
//Jim Klimov
2011-06-02 18:40, Josh Simon wrote:
I was just doing some storage research and came across this
http://www.nexenta.com/corp/images/stories/pdfs/hardware-supported.pdf. In
that document for Nexenta (an OpenSolaris variant) it states that you
should not use Intel X25-E SSDSA2SH032G1 SSD with a
(and/or use rsync to correct some misreceived
blocks if the network was faulty).
link. Took many retries, and zfs send is not strong at retrying ;)
of newer OpenIndianas (148b, 151, pkg-dev repository)
already?
Thanks for any comments, condolences, insights, bugfixes ;)
//Jim Klimov
of a single full dump, the chance of a single corruption
making your (latest) backup useless would also be higher, right?
Thanks for clarifications,
//Jim Klimov
2011-06-10 13:51, Jim Klimov wrote:
and the system dies in
swapping hell (scanrates for available pages were seen to go
into millions, CPU context switches reach 200-300k/sec on a
single dualcore P4) after eating the last stable-free 1-2Gb
of RAM within a minute. After this the system responds
the LUN with sbdadm, and importing the
dcpool are all wrapped in several SMF services
so I can relatively easily control the presence
of these pools (I can disable them from autostart
by touching a file in /etc directory).
Steve
- Jim Klimov jimkli...@cos.ru wrote:
I've captured
2011-06-10 20:58, Marty Scholes wrote:
If it is true that unlike ZFS itself, the replication stream format has no redundancy (even of ECC/CRC sort), how can it be used for long-term retention on tape?
It can't. I don't think it has been documented anywhere, but I believe that it
has been well
While looking over iostats from various programs, I see that
my OS HDD is busy writing, about 2Mb/sec stream all the time
(at least while the dcpool import/recovery attempts are
underway, but also now during a mere zdb walk).
According to iostat this load stands out greatly:
?
Or is there no coalescing, and this is why? ;)
Thanks,
//Jim Klimov
but otherwise the system
should have remained responsive (tested
failmode=continue and failmode=wait on different
occasions).
So I can relate - these things happen, they do annoy,
and I hope they will be fixed sometime soon so that
ZFS matches its docs and promises ;)
//Jim Klimov
2011-06-11 19:16, Jim Mauro wrote:
Does this reveal anything:
dtrace -n 'syscall::*write:entry /fds[arg0].fi_fs == "zfs"/ {
@[execname,fds[arg0].fi_pathname]=count(); }'
Alas, not much.
# time dtrace -n 'syscall::*write:entry /fds[arg0].fi_fs == "zfs"/ {
2011-06-11 20:34, Jim Klimov wrote:
time dtrace -n 'syscall::*write:entry /fds[arg0].fi_fs == "zfs"/ {
@[execname,fds[arg0].fi_pathname]=count(); }'
This time I gave it more time, and used the system a bit -
this dtrace works indeed, but there are still too few file
accesses:
# time dtrace -n
2011-06-11 20:42, Jim Mauro wrote:
Well we may have missed something, because that dtrace will
only capture write(2) and pwrite(2) - whatever is generating the writes
may be using another interface (writev(2) for example).
What about taking it down a layer:
dtrace -n 'fsinfo:::write