didn't seem like we would need zfs to provide that redundancy also.
There was a time when I fell for this line of reasoning too. The problem (if
you want to call it that) with zfs is that it will show you, front and center,
the corruption taking place in your stack.
Since we're on SAN with
Lights. Good.
Agreed. In a fit of desperation and stupidity I once enumerated disks by
pulling them one by one from the array to see which zfs device faulted.
On a busy array it is hard even to use the leds as indicators.
It makes me wonder how large shops with thousands of spindles handle
Funny you say that.
My Sun v40z connected to a pair of Sun A5200 arrays running OSol 128a can't see
the enclosures. The luxadm command comes up blank.
Except for that annoyance (and similar other issues) the Sun gear works well
with a Sun operating system.
Has there been any change to the server hardware with
respect to number of
drives since ZFS has come out? Many of the servers
around still have an even
number of drives (2, 4) etc. and it seems far from
optimal from a ZFS
standpoint. All you can do is make one or two
mirrors, or a 3 way
It sounds like you are getting a good plan together.
The only thing, though: I seem to remember reading that when vdevs are added to
a pool well after the pool's creation, once data has already been written to it,
things aren't spread evenly - is that right? So it might actually make
sense to buy all the
I am assuming you will put all of the vdevs into a single pool, which is a
good idea unless you have a specific reason for keeping them separate, e.g. you
want to be able to destroy / rebuild a particular vdev while leaving the others
intact.
Fewer disks per vdev implies more vdevs, providing
Just for completeness, there is also VirtualBox which runs Solaris nicely.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
If it is true that unlike ZFS itself, the replication
stream format has
no redundancy (even of ECC/CRC sort), how can it be
used for
long-term retention on tape?
It can't. I don't think it has been documented anywhere, but I believe that it
has been well understood that if you don't trust
I stored a snapshot stream to a file
The tragic irony here is that the file was stored on a non-zfs filesystem. You
had undetected bitrot which unknowingly corrupted the stream. Other files
might have been silently corrupted as well.
You may have just made one of the strongest
This is not a true statement. If the primarycache
policy is set to the default, all data will
be cached in the ARC.
Richard, you know this stuff so well that I am hesitant to disagree with you.
At the same time, I have seen this myself, trying to load video files into
L2ARC without success.
I'll throw out some (possibly bad) ideas.
Is ARC satisfying the caching needs? 32 GB for ARC should almost cover the
40GB of total reads, suggesting that the L2ARC doesn't add any value for this
test.
Are the SSD devices saturated from an I/O standpoint? Put another way, can ZFS
put data to
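The first question above is just arithmetic. A minimal sketch, using the 32 GB and 40 GB figures quoted in the post (illustrative, not a measurement):

```python
# If the ARC can hold (nearly) the whole working set, little or nothing
# spills over to the L2ARC, so the cache devices add little value.
# Figures are the ones quoted in the post above.

arc_gb = 32          # RAM available to the ARC
working_set_gb = 40  # total data read during the test

spill_gb = max(0, working_set_gb - arc_gb)
print(spill_gb)                   # 8 -- at most 8 GB could land in L2ARC
print(spill_gb / working_set_gb)  # 0.2 -- at best 20% of reads helped
```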
While I am by no means an expert on this, I went through a similar mental
exercise previously and came to the conclusion that in order to service a
particular read request, zfs may need to read more from the disk. For example,
a 16KB request in a stripe might need to retrieve the full 128KB
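That read-amplification effect can be written out directly. A sketch, assuming (as the post does) that ZFS must read the whole record to verify its checksum before serving any part of it:

```python
# Record-level read amplification: ZFS checksums whole records, so a
# small logical read can force the full record to be read from the vdev.

def read_amplification(request_bytes, recordsize_bytes):
    """Bytes read from disk per byte the application requested,
    assuming the whole record is read to verify its checksum."""
    return recordsize_bytes / request_bytes

# The 16 KB request against a 128 KB record mentioned above:
print(read_amplification(16 * 1024, 128 * 1024))  # 8.0
```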
2011/5/26 Eugen Leitl eu...@leitl.org:
How bad would raidz2 do on mostly sequential writes
and reads
(Athlon64 single-core, 4 GByte RAM, FreeBSD 8.2)?
The best way is to go is striping mirrored pools,
right?
I'm worried about losing the two wrong drives out
of 8.
These are all
Richard wrote:
Untrue. The performance of a 21-disk raidz3 will be nowhere near the
performance of a 20-disk 2-way mirror.
You know this stuff better than I do. Assuming no bus/cpu bottlenecks, a 21
disk raidz3 should provide sequential throughput of 18 disks and random
throughput of 1
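The rule of thumb behind those numbers can be made explicit. A deliberately simplistic model; the per-disk bandwidth and IOPS constants are assumptions, and real pools deviate from this:

```python
# Simplistic raidz-vs-mirrors model from the discussion: a raidz vdev
# streams at roughly (disks - parity) x one disk, but serves random I/O
# at roughly one disk's IOPS per vdev; a pool of 2-way mirrors gets one
# vdev per pair, and random reads can be served by either side.

DISK_MBPS = 100   # assumed sequential MB/s per disk
DISK_IOPS = 100   # assumed random IOPS per disk

def raidz_vdev(disks, parity):
    """(sequential MB/s, random IOPS) estimate for one raidz vdev."""
    return (disks - parity) * DISK_MBPS, DISK_IOPS

def mirror_pool(pairs):
    """(sequential write MB/s, random read IOPS) for striped mirrors."""
    return pairs * DISK_MBPS, pairs * 2 * DISK_IOPS

print(raidz_vdev(21, 3))   # (1800, 100): fast streaming, poor random I/O
print(mirror_pool(10))     # (1000, 2000): the opposite trade-off
```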
Richard wrote:
Yep, it depends entirely on how you use the pool. As soon as you
come up with a credible model to predict that, then we can optimize
accordingly :-)
You say that somewhat tongue-in-cheek, but Edward's right. If the resliver
code progresses in
I've had a few people sending emails directly
suggesting it might have something to do with the
ZIL/SLOG. I guess I should have said that the issue
happens both ways, whether we copy TO or FROM the
Nexenta box.
You mentioned a second Nexenta box earlier. To rule out client-side issues,
Sorry, I can't not respond...
Edward Ned Harvey wrote:
whatever you do, *don't* configure one huge raidz3.
Peter, whatever you do, *don't* make a decision based on blanket
generalizations.
If you can afford mirrors, your risk is much lower.
Because although it's
physically possible for 2
On Fri, Oct 15, 2010 at 3:16 PM, Marty Scholes
martyscho...@yahoo.com wrote:
My home server's main storage is a 22 (19 + 3) disk
RAIDZ3 pool backed up hourly to a 14 (11+3) RAIDZ3
backup pool.
How long does it take to resilver a disk in that
pool? And how long
does it take to run
Here are some more findings...
The Nexenta box has 3 pools:
syspool: made of 2 mirrored (hardware RAID) local SAS
disks
pool_sas: made of 22 15K SAS disks in ZFS mirrors on
2 JBODs on 2 controllers
pool_sata: made of 42 SATA disks in 6 RAIDZ2 vdevs on
a single controller
When we copy
I apologize if this has been covered before. I have not seen a blow-by-blow
installation guide for Ubuntu onto an iSCSI target.
The install guides I have seen assume that you can make a target visible to
all, which is a problem if you want multiple iSCSI installations on the same
COMSTAR
Have you had a lot of activity since the scrub started?
I have noticed what appears to be extra I/O at the end of a scrub when activity
took place during the scrub. It's as if the scrub estimator does not take the
extra activity into account.
I think you are seeing ZFS store up the writes, coalesce them, then flush to
disk every 30 seconds.
Unless the writes are synchronous, the ZIL won't be used, but the writes will
be cached instead, then flushed.
If you think about it, this is far more sane than flushing to disk every time
the
Roy Sigurd Karlsbakk wrote:
device r/s w/s kr/s kw/s wait actv svc_t %w %b
cmdk0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
cmdk1 0.0 163.6 0.0 20603.7 1.6 0.5 12.9 24 24
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd1 0.5 140.3 0.3 2426.3 0.0 1.0 7.2 0 14
sd2 0.0
Is this a sector size issue?
I see two of the disks each doing the same amount of work in roughly half the
I/O operations, with each operation taking about twice the time compared to
each of the remaining six drives.
I know nothing about either drive, but I wonder if one type of drive has twice
the
Alexander Skwar wrote:
Okay. This contradicts the ZFS Best Practices Guide,
which states:
# For production environments, configure ZFS so that
# it can repair data inconsistencies. Use ZFS redundancy,
# such as RAIDZ, RAIDZ-2, RAIDZ-3, mirror, or copies > 1,
# regardless of the RAID level
David Dyer-Bennet wrote:
Sure, if only a single thread is ever writing to the
disk store at a time.
This situation doesn't exist with any kind of
enterprise disk appliance,
though; there are always multiple users doing stuff.
Ok, I'll bite.
Your assertion seems to be that any kind of
Richard Elling wrote:
Define fragmentation?
Maybe this is the wrong thread. I have noticed that an old pool can take 4
hours to scrub, with a large portion of the time reading from the pool disks at
the rate of 150+ MB/s but zpool iostat reports 2 MB/s read speed. My naive
interpretation is
Erik wrote:
Actually, your biggest bottleneck will be the IOPS
limits of the
drives. A 7200RPM SATA drive tops out at 100 IOPS.
Yup. That's it.
So, if you need to do 62.5e6 IOPS, and the rebuild
drive can do just 100
IOPS, that means you will finish (best case) in
62.5e4 seconds.
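Erik's arithmetic, written out. The 62.5e6 I/Os and 100 IOPS figures are the ones from the post; "best case" because this ignores seek ordering, controller contention, and new writes arriving during the rebuild:

```python
# Best-case rebuild time = I/Os required / IOPS the rebuilding drive
# can sustain. Figures from the post above.

total_ios = 62.5e6   # I/O operations the rebuild must perform
drive_iops = 100     # random IOPS of a 7200 RPM SATA drive

seconds = total_ios / drive_iops
print(seconds)           # 625000.0 seconds (i.e. 62.5e4)
print(seconds / 86400)   # roughly 7.2 days, best case
```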
I am speaking from my own observations and nothing scientific such as reading
the code or designing the process.
A) Resilver = Defrag. True/false?
False
B) If I buy larger drives and resilver, does defrag
happen?
No. The first X sectors of the bigger drive are identical to the smaller
This paper is exactly what is needed -- giving an overview to a wide audience
of the ZFS fundamental components and benefits.
I found several grammar errors -- to be expected in a draft and I think at
least one technical error.
The paper seems to imply that multiple vdevs will induce striping
Is it currently or near future possible to shrink a
zpool remove a disk
As others have noted, no, not until the mythical bp_rewrite() function is
introduced.
So far I have found no documentation on bp_rewrite(), other than it is the
solution to evacuating a vdev, restriping a vdev,
Hello,
I would like to backup my main zpool (originally
called data) inside an equally originally named
backupzpool, which will also holds other kinds of
backups.
Basically I'd like to end up with
backup/data
backup/data/dataset1
backup/data/dataset2
backup/otherthings/dataset1
Script attached.
Cheers,
Marty
Erik Trimble wrote:
On 8/10/2010 9:57 PM, Peter Taps wrote:
Hi Eric,
Thank you for your help. At least one part is clear
now.
I still am confused about how the system is still
functional after one disk fails.
Consider my earlier example of a 3-disk zpool
configured for raidz-1. To
Peter wrote:
One question though. Marty mentioned that raidz
parity is limited to 3. But in my experiment, it
seems I can get parity to any level.
You create a raidz zpool as:
# zpool create mypool raidzx disk1 disk2
Here, x in raidzx is a numeric value indicating the
desired
ahh that explains it all, god damn that base 1000
standard, only useful for sales people :)
As much as it all annoys me too, the SI prefixes are used correctly pretty much
everywhere except in operating systems.
A kilometer is not 1024 meters and a megawatt is not 1048576 watts.
Us, the IT
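The gap being complained about is easy to quantify. The "2 TB" drive below is an illustration, not a reference to any drive in the thread:

```python
# Drive vendors count in SI units (base 1000); operating systems have
# traditionally counted in binary units (base 1024), so the same drive
# "shrinks" when the OS reports its size.

TB = 1000 ** 4    # terabyte, SI
TiB = 1024 ** 4   # tebibyte, binary

drive = 2 * TB                 # a "2 TB" drive as sold
print(drive / TiB)             # ~1.82 -- what the OS calls "1.82 TB"
print(TiB / TB)                # ~1.0995: roughly a 10% gap at tera scale
```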
If the format utility is not displaying the WD drives
correctly,
then ZFS won't see them correctly either. You need to
find out why.
I would export this pool and recheck all of your
device connections.
I didn't see it in the postings, but are the same serial numbers showing up
multiple
Hi,
Out of pure curiosity, I was wondering, what would
happen if one tries to use a regular 7200RPM (or 10K)
drive as slog or L2ARC (or both)?
I have done both with success.
At one point my backup pool was a collection of USB attached drives (please
keep the laughter down) with
Michael Shadle wrote:
Actually I guess my real question is why iostat hasn't logged any
errors in its counters even though the device has been bad in there
for months?
One of my arrays had a drive in slot 4 fault -- lots of reset something or
other
errors. I cleared the errors and the pool
I've found plenty of documentation on how to create a
ZFS volume, iscsi share it, and then do a fresh
install of Fedora or Windows on the volume.
Really? I have found just the opposite: how to move your functioning
Windows/Linux install to iSCSI.
I am fumbling through this process for
'iostat -Eni' indeed outputs a Device ID on some of
the drives, but I still
can't understand how it helps me to identify model
of specific drive.
Get and install smartmontools. Period. I resisted it for a few weeks but it
has been an amazing tool. It will tell you more than you ever
Cindy wrote:
Mirrored pools are more flexible and generally
provide good performance.
You can easily create a mirrored pool of two disks
and then add two
more disks later. You can also replace each disk with
larger disks
if needed. See the example below.
There is no dispute that multiple
I think the request is to remove vdev's from a pool.
Not currently possible. Is this in the works?
Actually, I think this is two requests, hashed over hundreds of times in this
forum:
1. Remove a vdev from a pool
2. Nondisruptively change vdev geometry
#1 above has a stunningly obvious use
Joachim Worringen wrote:
Greetings,
we are running a few databases of currently 200GB
(growing) in total for data warehousing:
- new data via INSERTs for (up to) millions of rows
per day; sometimes with UPDATEs
- most data in a single table (= 10 to 100s of
millions of rows)
- queries
On Jun 3, 2010 7:35 PM, David Magda wrote:
On Jun 3, 2010, at 13:36, Garrett D'Amore wrote:
Perhaps you have been unlucky. Certainly, there is
a window with N
+1 redundancy where a single failure leaves the
system exposed in
the face of a 2nd fault. This is a statistics
game...
I have a small question about the depth of scrub in a
raidz/2/3 configuration.
I'm quite sure scrub does not check spares or unused
areas of the disks (it
could check if the disks detect any errors there).
But what about the parity?
From some informal performance testing of RAIDZ2/3
David Dyer-Bennet wrote:
My choice of mirrors rather than RAIDZ is based on
the fact that I have
only 8 hot-swap bays (I still think of this as LARGE
for a home server;
the competition, things like the Drobo, tends to have
4 or 5), that I
don't need really large amounts of storage (after my
I have a Sun A5000, 22x 73GB 15K disks in split-bus configuration, two dual 2Gb
HBAs and four fibre cables from server to array, all for just under $200.
The array gives 4Gb of aggregate throughput in each direction across two 11 disk
buses.
Right now it is the main array, but when we outgrow
Was my raidz2 performance comment above correct?
That the write speed is that of the slowest disk?
That is what I believe I have
read.
You are
sort-of-correct that its the write speed of the
slowest disk.
My experience is not in line with that statement. RAIDZ will write a complete
To fix it, I swapped out the Adaptec controller and
put in LSI Logic
and all the problems went away.
I'm using Sun's built-in LSI controller with (I presume) the original internal
cable shipped by Sun.
Still, no joy for me at U320 speeds. To be precise, when the controller is set
at
Any news regarding this issue? I'm having the same
problems.
Me too. My v40z with U320 drives in the internal bay will lock up partway
through a scrub.
I backed the whole SCSI chain down to U160, but it seems a shame that U320
speeds can't be used.
--- On Thu, 1/7/10, Tiernan OToole lsmart...@gmail.com wrote:
Sorry to hijack the thread, but can you
explain your setup? Sounds interesting, but need more
info...
This is just a home setup to amuse me and placate my three boys, each of whom
has several Windows instances running under
Ian wrote:
Why did you set dedup=verify on the USB pool?
Because that is my last-ditch copy of the data and MUST be correct. At the
same time, I want to cram as much data as possible into the pool.
If I ever go to the USB pool, something has already gone horribly wrong and I
am desperate. I
Michael Herf wrote:
I've written about my slow-to-dedupe RAIDZ.
After a week of waiting, I finally bought a
little $100 30G OCZ
Vertex and plugged it in as a cache.
After 2 hours of warmup, my zfs send/receive rate on
the pool is
16MB/sec (reading and writing each at 16MB as
Hi Ross,
What about old good raid10? It's a pretty
reasonable choice for
heavy loaded storages, isn't it?
I remember when I migrated raidz2 to 8xdrives
raid10 the application
administrators were just really happy with the new
access speed. (we
didn't use striped raidz2
Bob Friesenhahn wrote:
Why are people talking about RAID-5, RAID-6, and
RAID-10 on this
list? This is the zfs-discuss list and zfs does not
do RAID-5,
RAID-6, or RAID-10.
Applying classic RAID terms to zfs is just plain
wrong and misleading
since zfs does not directly implement these
Bob Friesenhahn wrote:
On Tue, 22 Dec 2009, Marty Scholes wrote:
That's not entirely true, is it?
* RAIDZ is RAID5 + checksum + COW
* RAIDZ2 is RAID6 + checksum + COW
* A stack of mirror vdevs is RAID10 + checksum +
COW
These are layman's simplifications that no one here
should
risner wrote:
If I understand correctly, raidz{1} is 1 drive
protection and space is (drives - 1) available.
Raidz2 is 2 drive protection and space is (drives -
2) etc. Same for raidz3 being 3 drive protection.
Yes.
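risner's rule of thumb as arithmetic; a sketch that ignores metadata and allocation overhead, so real usable space is somewhat lower:

```python
# raidzN capacity rule of thumb: D disks with N parity store about
# (D - N) disks' worth of data and survive any N drive failures.

def raidz_usable_tb(disks, parity, disk_tb):
    """Approximate usable capacity of one raidz vdev."""
    return (disks - parity) * disk_tb

print(raidz_usable_tb(8, 1, 2))  # raidz1, 8 x 2 TB -> 14
print(raidz_usable_tb(8, 2, 2))  # raidz2, 8 x 2 TB -> 12
print(raidz_usable_tb(8, 3, 2))  # raidz3, 8 x 2 TB -> 10
```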
Everything I've seen says you should stay around 6-9
drives for raidz, so
Erik Trimble wrote:
As always, the devil is in the details. In this case,
the primary
problem I'm having is maintaining two different block
mapping schemes
(one for the old disk layout, and one for the new
disk layout) and still
being able to interrupt the expansion process. My
primary
Generally speaking, striping mirrors will be faster
than raidz or raidz2,
but it will require a higher number of disks and
therefore higher cost to
The main reason to use
raidz or raidz2 instead
of striping mirrors would be to keep the cost down,
or to get higher usable
space out of a
Lori Alt wrote:
As for being able to read streams of a later format
on an earlier
version of ZFS, I don't think that will ever be
supported. In that
case, we really would have to somehow convert the
format of the objects
stored within the send stream and we have no plans to
implement
This line of reasoning doesn't get you very far.
It is much better to take a look at
the mean time to data loss (MTTDL) for the various
configurations. I wrote a
series of blogs to show how this is done:
http://blogs.sun.com/relling/tags/mttdl
Yes. This is a mathematical way of saying
lose any P+1 of N disks.
I am hesitant to beat this dead horse, yet it is a nuance that either I have
completely misunderstood or many people I've met have completely missed.
Whether a stripe of mirrors or a mirror of stripes, any single failure
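The "lose any P+1 of N disks" phrasing is exactly what separates raidz from striped mirrors: for raidzP every (P+1)-disk failure is fatal, while for mirrors only the combinations that empty one pair are. Counting the two-disk combinations for the 8-disk case worried about earlier in the thread:

```python
# For 8 disks: every way of losing 2 disks kills a raidz1 vdev, but a
# pool of four 2-way mirrors only dies if both failures land in the
# same pair.

from math import comb

total = comb(8, 2)       # 28 ways to lose 2 of 8 disks
raidz1_fatal = total     # raidz1: all 28 combinations are fatal
mirror_fatal = 4         # mirrors: one fatal combination per pair

print(total)                 # 28
print(mirror_fatal / total)  # ~0.14 -- only 1 in 7 double faults is fatal
```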
The zfs send stream is dependent on the version of
the filesystem, so the
only way to create an older stream is to create a
back-versioned
filesystem:
zfs create -o version=N pool/filesystem
You can see what versions your system supports by
using the zfs upgrade
command: