IIRC dump is special.
As for swap... really, you don't want to swap. If you're swapping you
have problems. Any swap space you have is to help you detect those
problems and correct them before apps start getting ENOMEM. There
*are* exceptions to this, such as Varnish. For Varnish and any other
Bloom filters are very small, that's the difference. You might only need a
few bits per block for a Bloom filter. Compare to the size of a DDT entry.
A Bloom filter could be cached entirely in main memory.
I've wanted a system where dedup applies only to blocks being written
that have a good chance of being dups of others.
I think one way to do this would be to keep a scalable Bloom filter
(on disk) into which one inserts block hashes.
To decide if a block needs dedup one would first check the
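To sketch that idea concretely (hypothetical C, not from any ZFS source; the
filter size, the number of probes, and the way the block checksum is sliced
are all illustrative assumptions):

#include <stdint.h>
#include <string.h>

#define FILTER_BITS (1u << 30)  /* 2^30 bits = 128 MiB of filter; illustrative */
#define K_PROBES    4           /* bit positions set/tested per block */

static uint8_t filter[FILTER_BITS / 8];

/* Derive probe i from the block's existing checksum (e.g. 32-byte SHA-256). */
static uint32_t
bit_index(const uint8_t *cksum, int i)
{
    uint64_t h;

    memcpy(&h, cksum + 8 * i, sizeof (h));
    return ((uint32_t)(h % FILTER_BITS));
}

/* Record that a block with this checksum has been written. */
void
bloom_insert(const uint8_t *cksum)
{
    for (int i = 0; i < K_PROBES; i++) {
        uint32_t b = bit_index(cksum, i);
        filter[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* 0 means certainly never seen before: skip the DDT lookup, just write. */
int
bloom_maybe_dup(const uint8_t *cksum)
{
    for (int i = 0; i < K_PROBES; i++) {
        uint32_t b = bit_index(cksum, i);
        if ((filter[b / 8] & (1u << (b % 8))) == 0)
            return (0);
    }
    return (1);     /* possibly a dup: worth consulting the DDT */
}

At a few bits per block the whole filter can stay pinned in RAM, which is the
point of the comparison with DDT entry sizes above.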
On Mon, Jan 14, 2013 at 1:48 PM, Tomas Forsman st...@acc.umu.se wrote:
https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599
Host oraclecorp.com not found: 3(NXDOMAIN)
Would oracle.internal be a better domain name?
Things like that cannot be changed easily. They (Oracle) are
The copies thing is really only for laptops, where the likelihood of
redundancy is very low (there are some high-end laptops with multiple
drives, but those are relatively rare) and where this idea is better
than nothing. It's also nice that copies can be set on a per-dataset
basis (whereas
On Wed, Jul 11, 2012 at 9:48 AM, casper@oracle.com wrote:
Huge space, but still finite…
Dan Brown seems to think so in Digital Fortress, but it just means he
has no grasp of big numbers.
I couldn't get past that. I had to put the book down. I'm guessing
it was as awful as it threatened
On Wed, Jul 11, 2012 at 3:45 AM, Sašo Kiselkov skiselkov...@gmail.com wrote:
It's also possible to set dedup=verify with checksum=sha256,
however, that makes little sense (as the chances of getting a random
hash collision are essentially nil).
IMO dedup should always verify.
Nico
--
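To put a rough number on "essentially nil" (my own back-of-the-envelope
arithmetic, assuming an idealized 256-bit hash, which, as the next message
points out, real hash functions only approximate):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    /* Hypothetical pool: 2^40 unique 128K blocks, on the order of 128 PiB. */
    long double n = powl(2.0L, 40);

    /* Birthday bound: roughly n^2/2 pairs spread over 2^256 possible digests. */
    long double expected = n * (n - 1) / 2 / powl(2.0L, 256);

    printf("expected random collisions: %Lg\n", expected);  /* ~5e-54 */
    return (0);
}

The case for verify is less about those odds than about non-random failure
modes (hash weaknesses, bugs, bad memory), and the verifying read only happens
when two checksums already match.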
You can treat any hash function as an idealized one, but actual
hash functions aren't ideal. There may well be as-yet-undiscovered input
bit pattern ranges where there's a large density of collisions in some
hash function, and indeed, since our hash functions aren't ideal,
there must be. We just
On Wed, Jul 4, 2012 at 11:14 AM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
On Tue, 3 Jul 2012, James Litchfield wrote:
Agreed - msync/munmap is the only guarantee.
I don't see that the munmap definition assures that anything is written to
disk. The system is free to buffer the data
On Tue, Jul 3, 2012 at 9:48 AM, James Litchfield
jim.litchfi...@oracle.com wrote:
On 07/02/12 15:00, Nico Williams wrote:
You can't count on any writes to mmap(2)ed files hitting disk until
you msync(2) with MS_SYNC. The system should want to wait as long as
possible before committing any
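A minimal illustration of that point (ordinary POSIX calls, error handling
mostly omitted; assumes the file is at least one page long):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int
update_mapped(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return (-1);

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return (-1);
    }

    memcpy(p, "updated", 7);            /* dirties the page in memory only */

    int rc = msync(p, 4096, MS_SYNC);   /* only now must it reach stable storage */

    munmap(p, 4096);
    close(fd);
    return (rc);
}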
On Mon, Jul 2, 2012 at 3:32 PM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
On Mon, 2 Jul 2012, Iwan Aucamp wrote:
I'm interested in some more detail on how the ZFS intent log behaves for
updates done via a memory-mapped file - i.e. will the ZIL log updates done
to an mmap'd file or not?
On Tue, Jun 26, 2012 at 9:44 AM, Alan Coopersmith
alan.coopersm...@oracle.com wrote:
On 06/26/12 05:46 AM, Lionel Cons wrote:
On 25 June 2012 11:33, casper@oracle.com wrote:
To be honest, I think we should also remove this from all other
filesystems and I think ZFS was created this way
On Tue, Jun 26, 2012 at 8:12 AM, Lionel Cons
lionelcons1...@googlemail.com wrote:
On 26 June 2012 14:51, casper@oracle.com wrote:
We've already asked our Netapp representative. She said it's not hard
to add that.
Did NetApp tell you that they'll add support for using the NFSv4 LINK
On Mon, Jun 11, 2012 at 5:05 PM, Tomas Forsman st...@acc.umu.se wrote:
.. or use a mail reader that doesn't suck.
Or the mailman thread view.
COW goes back at least to the early days of virtual memory and fork().
On fork() the kernel would arrange for writable pages in the parent
process to be made read-only so that writes to them could be caught
and then the page fault handler would copy the page (and restore write
access) so the
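A toy demonstration of the effect (plain C, nothing ZFS-specific):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    static char buf[1 << 20];   /* writable in the parent before fork() */

    buf[0] = 'A';

    pid_t pid = fork();         /* pages now shared read-only, marked COW */
    if (pid == 0) {
        buf[0] = 'B';           /* fault: kernel copies the page for the child */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent still sees: %c\n", buf[0]);  /* prints 'A' */
    return (0);
}

ZFS applies the same idea at the block level: shared blocks stay shared until
something writes, and the new data goes to a newly allocated block.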
On Wed, May 2, 2012 at 7:59 AM, Paul Kraus p...@kraus-haus.org wrote:
On Wed, May 2, 2012 at 7:46 AM, Darren J Moffat darr...@opensolaris.org
wrote:
If Oracle is only willing to share (public) information about the
roadmap for products via official sales channels then there will be
lots of
On Thu, Apr 26, 2012 at 12:10 AM, Richard Elling
richard.ell...@gmail.com wrote:
On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:
Reboot requirement is a lame client implementation.
And lame protocol design. You could possibly migrate read-write NFSv3
on the fly by preserving FHs and somehow
On Thu, Apr 26, 2012 at 5:45 PM, Carson Gaspar car...@taltos.org wrote:
On 4/26/12 2:17 PM, J.P. King wrote:
I don't know SnapMirror, so I may be mistaken, but I don't see how you
can have non-synchronous replication which can allow for seamless client
failover (in the general case).
On Thu, Apr 26, 2012 at 12:37 PM, Richard Elling
richard.ell...@gmail.com wrote:
[...]
NFSv4 had migration in the protocol (excluding protocols between
servers) from the get-go, but it was missing a lot (FedFS) and was not
implemented until recently. I've no idea what clients and servers
As I understand it LLNL has very large datasets on ZFS on Linux. You
could inquire with them, as well as
http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/topics?pli=1
. My guess is that it's quite stable for at least some use cases
(most likely: LLNL's!), but that may not be yours.
I agree, you need something like AFS, Lustre, or pNFS. And/or an NFS
proxy to those.
Nico
--
On Wed, Apr 25, 2012 at 4:26 PM, Paul Archer p...@paularcher.org wrote:
2:20pm, Richard Elling wrote:
Ignoring lame NFS clients, how is that architecture different than what
you would have
with any other distributed file system? If all nodes share data to all
other nodes, then...?
Simple.
On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling
richard.ell...@gmail.com wrote:
Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents).
FWIW,
automounters were invented 20+ years ago to handle this in a nearly seamless
manner.
Today, we have DFS from Microsoft and NFS
On Wed, Apr 25, 2012 at 5:42 PM, Ian Collins i...@ianshome.com wrote:
Aren't those general considerations when specifying a file server?
There are Lustre clusters with thousands of nodes, hundreds of them
being servers, and high utilization rates. Whatever specs you might
have for one server
On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
richard.ell...@gmail.com wrote:
On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:
I disagree vehemently. automount is a disaster because you need to
synchronize changes with all those clients. That's not realistic.
Really? I did it with NIS
On Wed, Apr 25, 2012 at 8:57 PM, Paul Kraus pk1...@gmail.com wrote:
On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams n...@cryptonector.com wrote:
Nothing's changed. Automounter + data migration -> rebooting clients
(or close enough to rebooting). I.e., outage.
Uhhh, not if you design your
On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov jimkli...@cos.ru wrote:
2012-01-18 1:20, Stefan Ring wrote:
I don’t care too much if a single document gets corrupted – there’ll
always be a good copy in a snapshot. I do care however if a whole
directory branch or old snapshots were to disappear.
On Wed, Jan 11, 2012 at 9:16 AM, Jim Klimov jimkli...@cos.ru wrote:
I've recently had a sort of an opposite thought: yes,
ZFS redundancy is good - but also expensive in terms
of raw disk space. This is especially bad for hardware
space-constrained systems like laptops and home-NASes,
where
On Thu, Jan 5, 2012 at 8:53 AM, sol a...@yahoo.com wrote:
if a bug fixed in Illumos is never reported to Oracle by a customer,
it would likely never get fixed in Solaris either
:-(
I would have liked to think that there was some good-will between the ex- and
current-members of the zfs
On Thu, Dec 29, 2011 at 9:53 AM, Brad Diggs brad.di...@oracle.com wrote:
Jim,
You are spot on. I was hoping that the writes would be close enough to
identical that
there would be a high ratio of duplicate data since I use the same record
size, page size,
compression algorithm, … etc.
On Thu, Dec 29, 2011 at 2:06 PM, sol a...@yahoo.com wrote:
Richard Elling wrote:
many of the former Sun ZFS team
regularly contribute to ZFS through the illumos developer community.
Does this mean that if they provide a bug fix via illumos then the fix won't
make it into the Oracle code?
If
On Thu, Dec 29, 2011 at 6:44 PM, Matthew Ahrens mahr...@delphix.com wrote:
On Mon, Dec 12, 2011 at 11:04 PM, Erik Trimble tr...@netdemons.com wrote:
(1) when constructing the stream, every time a block is read from a fileset
(or volume), its checksum is sent to the receiving machine. The
On Tue, Dec 27, 2011 at 2:20 PM, Frank Cusack fr...@linetwo.net wrote:
http://sparcv9.blogspot.com/2011/12/solaris-11-illumos-and-source.html
If I upgrade ZFS to use the new features in Solaris 11 I will be unable
to import my pool using the free ZFS implementation that is available in
On Tue, Dec 27, 2011 at 8:44 PM, Frank Cusack fr...@linetwo.net wrote:
So with a de facto fork (illumos) now in place, is it possible that two
zpools will report the same version yet be incompatible across
implementations?
Not likely: the Illumos community has developed a method for managing
On Dec 11, 2011 5:12 AM, Nathan Kroenert nat...@tuneunix.com wrote:
On 12/11/11 01:05 AM, Pawel Jakub Dawidek wrote:
On Wed, Dec 07, 2011 at 10:48:43PM +0200, Mertol Ozyoney wrote:
Unfortunately the answer is no. Neither the L1 nor the L2 cache is dedup aware.
The only vendor I know that can do
On Tue, Nov 29, 2011 at 12:17 PM, Cindy Swearingen
cindy.swearin...@oracle.com wrote:
I think "too many open files" is a generic error message about running
out of file descriptors. You should check your shell ulimit
information.
Also, see how many open files you have: echo /proc/self/fd/*
On Mon, Nov 28, 2011 at 11:28 AM, Smith, David W. smith...@llnl.gov wrote:
You could list by inode, then use find with rm.
# ls -i
7223 -O
# find . -inum 7223 -exec rm {} \;
This is the one solution I'd recommend against, since it would remove
hardlinks that you might care about.
Also,
Moving boot disks from one machine to another used to work as long as
the machines were of the same architecture. I don't recall if it was
*supported* (and wouldn't want to pretend to speak for Oracle now),
but it was meant to work (unless you minimized the install and removed
drivers not needed
On Mon, Nov 14, 2011 at 8:33 AM, Edward Ned Harvey
opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Paul Kraus
Is it really B-Tree based? Apple's HFS+ is B-Tree based and falls
apart
I see, with great pleasure, that ZFS in Solaris 11 has a new
aclmode=mask property.
http://download.oracle.com/docs/cd/E23824_01/html/821-1448/gbscy.html#gkkkp
http://download.oracle.com/docs/cd/E23824_01/html/821-1448/gbchf.html#gljyz
On Mon, Nov 14, 2011 at 6:20 PM, Nico Williams n...@cryptonector.com wrote:
I see, with great pleasure, that ZFS in Solaris 11 has a new
aclmode=mask property.
Also, congratulations on shipping. And thank you for implementing aclmode=mask.
Nico
On Fri, Nov 11, 2011 at 4:27 PM, Paul Kraus p...@kraus-haus.org wrote:
The command syntax paradigm of zfs (command sub-command object
parameters) is not unique to zfs, but seems to have been the way of
doing things in Solaris 10. The _new_ functions of Solaris 10 were
all this way (to the best
To some people active-active means all cluster members serve the
same filesystems.
To others active-active means all cluster members serve some
filesystems and can serve all filesystems ultimately by taking over
failed cluster members.
Nico
--
On Wed, Oct 19, 2011 at 7:24 AM, Garrett D'Amore
garrett.dam...@nexenta.com wrote:
I'd argue that from a *developer* point of view, an fsck tool for ZFS might
well be useful. Isn't that what zdb is for? :-)
But ordinary administrative users should never need something like this,
unless
On Tue, Oct 18, 2011 at 9:35 AM, Brian Wilson bfwil...@doit.wisc.edu wrote:
I just wanted to add something on fsck on ZFS - because for me that used to
make ZFS 'not ready for prime-time' in 24x7 5+ 9s uptime environments.
Where ZFS doesn't have an fsck command - and that really used to bug me
On Thu, Oct 13, 2011 at 9:13 PM, Jim Klimov jimkli...@cos.ru wrote:
Thanks to Nico for concerns about POSIX locking. However,
hopefully, in the use case I described - serving images of
VMs in a manner where storage, access and migration are
efficient - whole datasets (be it volumes or FS
Also, it's not worth doing a clustered ZFS thing that is too
application-specific. You really want to nail down your choices of
semantics, explore what design options those yield (or approach from
the other direction, or both), and so on.
Nico
--
On Tue, Oct 11, 2011 at 11:15 PM, Richard Elling
richard.ell...@gmail.com wrote:
On Oct 9, 2011, at 10:28 AM, Jim Klimov wrote:
ZFS developers have for a long time stated that ZFS is not intended,
at least not in near term, for clustered environments (that is, having
a pool safely imported by
On Sun, Oct 9, 2011 at 12:28 PM, Jim Klimov jimkli...@cos.ru wrote:
So, one version of the solution would be to have a single host
which imports the pool in read-write mode (i.e. the first one
which boots), and other hosts would write thru it (like iSCSI
or whatever; maybe using SAS or FC to
On Mon, Sep 26, 2011 at 1:55 PM, Jesus Cea j...@jcea.es wrote:
I just upgraded to Solaris 10 Update 10, and one of the improvements
is zfs diff.
Using the birthtime of the sectors, I would expect very high
performance. The actual performance doesn't seem better than a
standard rdiff,
Ah yes, of course. I'd misread your original post. Yes, disabling
atime updates will reduce the number of superfluous transactions.
It's *all* transactions that count, not just the ones the app
explicitly caused, and atime implies lots of transactions.
Nico
--
On Fri, Sep 9, 2011 at 5:33 AM, Sriram Narayanan sri...@belenix.org wrote:
Plus, you'll need an & character at the end of each command.
And a wait command, if you want the script to wait for the sends to
finish (which you should).
Nico
--
On Wed, Jul 27, 2011 at 9:22 PM, Daniel Carosone d...@geek.com.au wrote:
Absent TRIM support, there's another way to do this, too. It's pretty
easy to dd /dev/zero to a file now and then. Just make sure zfs
doesn't prevent the zeros from being written to the SSD (compression and dedup are
off). I have
On Jul 9, 2011 1:56 PM, Edward Ned Harvey
opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
Given the abysmal performance, I have to assume there is a significant
number of overhead reads or writes in order to maintain the DDT for each
actual block write operation. Something I didn't
IMO a faster processor with built-in AES and other crypto support is
most likely to give you the most bang for your buck, particularly if
you're using closed Solaris 11, as Solaris engineering is likely to
add support for new crypto instructions faster than Illumos (but I
don't really know enough
On Jun 27, 2011 9:24 PM, David Magda dma...@ee.ryerson.ca wrote:
AES-NI is certainly better than nothing, but RSA, SHA, and the RNG would be
nice as well. It'd also be handy for ZFS crypto in addition to all the
network IO stuff.
The most important reason for AES-NI might be not performance but
On Jun 27, 2011 4:15 PM, David Magda dma...@ee.ryerson.ca wrote:
The (Ultra)SPARC T-series processors do, but to a certain extent it goes
against a CPU manufacturer's best (financial) interest to provide this:
crypto is very CPU intensive using 'regular' instructions, so if you need
to do a lot
As Casper pointed out, the right thing to do is to build applications
such that they can detect mid-transaction state and roll it back (or
forward, if there's enough data). Then mid-transaction snapshots are
fine, and the lack of APIs by which to inform the filesystem of
application transaction
On Thu, Jun 16, 2011 at 8:51 AM, casper@oracle.com wrote:
If a database engine or another application keeps both the data and the
log in the same filesystem, a snapshot wouldn't create inconsistent data
(I think this would be true with vim and a large number of database
engines; vim will
That said, losing committed transactions when you needed and thought
you had ACID semantics... is bad. But that's implied in any
restore-from-backups situation. So you replicate/distribute
transactions so that restore from backups (or snapshots) is an
absolutely last resort matter, and if you
On Mon, Jun 13, 2011 at 5:50 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net
wrote:
If anyone has any ideas, be they ZFS-based or any useful scripts that
could help here, I am all ears.
Something like this one-liner will show what would be allocated by everything
if hardlinks weren't used:
#
On Mon, Jun 13, 2011 at 12:59 PM, Nico Williams n...@cryptonector.com wrote:
Try this instead:
(echo 0; find . -type f \! -links 1 | xargs stat -c '%b %B *+ $p'; echo p) | dc

s/\$p//  -- i.e., drop the stray $p from the format string above.
And, without a sub-shell:
find . -type f \! -links 1 | xargs stat -c '%b %B *+p' /dev/null | dc 2>/dev/null | tail -1
(The stderr redirection is because otherwise dc whines once that the
stack is empty, and the tail is because we print interim totals as we
go.)
Also, this doesn't quite work, since
On Sun, Jun 12, 2011 at 4:14 PM, Scott Lawson
scott.law...@manukau.ac.nz wrote:
I have an interesting question that may or may not be answerable from some
internal
ZFS semantics.
This is really standard Unix filesystem semantics.
[...]
So total storage used is around ~7.5MB due to the hard
On May 25, 2011 7:15 AM, Garrett D'Amore garr...@nexenta.com wrote:
You are welcome to your beliefs. There are many groups that do standards
that do not meet in public. [...]
True.
[...] In fact, I can't think of any standards bodies that *do* hold open
meetings.
I can: the IETF, for
On Sun, May 22, 2011 at 10:20 AM, Richard Elling
richard.ell...@gmail.com wrote:
ZFS already tracks the blocks that have been written, and the time that
they were written. So we already know when something was written, though
that does not answer the question of whether the data was changed. I
On Sun, May 22, 2011 at 1:52 PM, Nico Williams n...@cryptonector.com wrote:
[...] Or perhaps you'll argue that no one should ever need bi-di
replication, that if one finds oneself wanting that then one has taken
a wrong turn somewhere.
You could also grant the premise and argue instead
Also, sparseness need not be apparent to applications. Until recent
improvements to lseek(2) to expose hole/non-hole offsets, the only way
to know about sparseness was to notice that a file's reported size is
more than the file's reported filesystem blocks times the block size.
Sparse files in
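For example (illustrative only): make a hole with a seek-past-EOF write, then
compare the logical size with the allocated blocks the old way, or ask
lseek(2) directly where SEEK_HOLE exists:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
    int fd = open("sparse.dat", O_CREAT | O_TRUNC | O_RDWR, 0644);

    pwrite(fd, "x", 1, (off_t)1 << 30);     /* one byte at a 1 GiB offset */

    struct stat st;
    fstat(fd, &st);
    /* Old heuristic: st_size far exceeds st_blocks (512-byte units) * 512. */
    printf("size=%lld bytes, allocated=%lld bytes\n",
        (long long)st.st_size, (long long)st.st_blocks * 512);

#ifdef SEEK_HOLE
    /* Newer lseek(2) interface: the first hole in this file starts at 0. */
    printf("first hole at offset %lld\n", (long long)lseek(fd, 0, SEEK_HOLE));
#endif

    close(fd);
    return (0);
}

(On ZFS, compression also shrinks the allocated count, so the old heuristic is
only a hint that a file may be sparse.)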
On Mon, May 2, 2011 at 3:56 PM, Eric D. Mudama
edmud...@bounceswoosh.org wrote:
Yea, kept googling and it makes sense. I guess I am simply surprised
that the application would have done the seek+write combination, since
on NTFS (which doesn't support sparse) these would have been real
1.5GB
Then again, Windows apps may be doing seek+write to pre-allocate storage.
On Thu, Feb 17, 2011 at 3:07 PM, Richard Elling
richard.ell...@gmail.com wrote:
On Feb 17, 2011, at 12:44 PM, Stefan Dormayer wrote:
Hi all,
is there a way to disable the subcommand destroy of zpool/zfs for the root
user?
Which OS?
Heheh. Great answer. The real answer depends also on
On Feb 14, 2011 6:56 AM, Paul Kraus p...@kraus-haus.org wrote:
P.S. I am measuring number of objects via `zdb -d` as that is faster
than trying to count files and directories and I expect is a much
better measure of what the underlying zfs code is dealing with (a
particular dataset may have
On Mon, Feb 7, 2011 at 1:17 PM, Yi Zhang yizhan...@gmail.com wrote:
On Mon, Feb 7, 2011 at 1:51 PM, Brandon High bh...@freaks.com wrote:
Maybe I didn't make my intention clear. UFS with directio is
reasonably close to a raw disk from my application's perspective: when
the app writes to a file