Re: [zfs-discuss] Direct I/O ability with zfs?
Restarting this thread... I've just finished reading the article, A look at MySQL on ZFS: http://dev.mysql.com/tech-resources/articles/mysql-zfs.html The section MySQL Performance Comparison: ZFS vs. UFS on Open Solaris looks interesting... Rayson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Hi Tony, John posted the URL to his article to databases-discuss this morning, and I only took a very quick look. Maybe you can join that list and discuss the configurations further? http://mail.opensolaris.org/mailman/listinfo/databases-discuss Rayson

On 10/29/07, Tony Leone [EMAIL PROTECTED] wrote: This is very interesting because it directly contradicts the results the ZFS developers are posting on the OpenSolaris mailing list. I just scanned the article; does he give his ZFS settings, and is he using separate ZIL devices? Tony Leone

Rayson Ho [EMAIL PROTECTED] 10/29/2007 11:39 AM Restarting this thread... I've just finished reading the article, A look at MySQL on ZFS: http://dev.mysql.com/tech-resources/articles/mysql-zfs.html The section MySQL Performance Comparison: ZFS vs. UFS on Open Solaris looks interesting... Rayson

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Hi All, Any update on this? -Masthan D

dudekula mastan [EMAIL PROTECTED] wrote: Hi Everybody, Over the last week many mails have been exchanged on this topic. I have a similar issue, and I would appreciate it if anyone could help me with it. I have an I/O test tool which writes data, reads it back, and then compares the read data with the written data. If the read data and the written data match, there is no CORRUPTION; otherwise there is a CORRUPTION. File data may be corrupted for any number of reasons, and one possible reason is the file system cache. If the file system cache has issues, it will return wrong data to user applications (wrong data meaning that the actual data on disk and the data the read call returns to the application do not match). When there is a CORRUPTION, to check for file system cache issues, my application bypasses the file system cache and re-reads the data from the same file, then compares the re-read data with the written data. Is there a way to skip the ZFS file system cache, or is there a way to do direct I/O on a ZFS file system? Regards Masthan D

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Is there a way to skip the ZFS file system cache, or is there a way to do direct I/O on a ZFS file system?

No, currently there is no way to disable the file system cache, a.k.a. the ARC, in ZFS. There is a pending RFE, though: 6429855 Need way to tell ZFS that caching is a lost cause. Cheers, Vidya Sakar

dudekula mastan wrote: Hi All, Any update on this? -Masthan D

*/dudekula mastan [EMAIL PROTECTED]/* wrote: Hi Everybody, Over the last week many mails have been exchanged on this topic. I have a similar issue, and I would appreciate it if anyone could help me with it. I have an I/O test tool which writes data, reads it back, and then compares the read data with the written data. If the read data and the written data match, there is no CORRUPTION; otherwise there is a CORRUPTION. File data may be corrupted for any number of reasons, and one possible reason is the file system cache. If the file system cache has issues, it will return wrong data to user applications (wrong data meaning that the actual data on disk and the data the read call returns to the application do not match). When there is a CORRUPTION, to check for file system cache issues, my application bypasses the file system cache and re-reads the data from the same file, then compares the re-read data with the written data. Is there a way to skip the ZFS file system cache, or is there a way to do direct I/O on a ZFS file system? Regards Masthan D

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Hi Everybody, Over the last week many mails have been exchanged on this topic. I have a similar issue, and I would appreciate it if anyone could help me with it. I have an I/O test tool which writes data, reads it back, and then compares the read data with the written data. If the read data and the written data match, there is no CORRUPTION; otherwise there is a CORRUPTION. File data may be corrupted for any number of reasons, and one possible reason is the file system cache. If the file system cache has issues, it will return wrong data to user applications (wrong data meaning that the actual data on disk and the data the read call returns to the application do not match). When there is a CORRUPTION, to check for file system cache issues, my application bypasses the file system cache and re-reads the data from the same file, then compares the re-read data with the written data. Is there a way to skip the ZFS file system cache, or is there a way to do direct I/O on a ZFS file system? Regards Masthan D

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
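A minimal sketch of the kind of re-read check described above, assuming an 8 KB block size and using directio(3C) to request unbuffered reads. The call is purely advisory: UFS honors it, but ZFS at this point does not, so on ZFS the re-read may still be served from the ARC. The function name, block size, and error handling are illustrative, not from the original tool.

    #include <sys/types.h>
    #include <sys/fcntl.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLKSZ 8192   /* assumed block size of the test tool */

    /*
     * Re-read one block, asking the file system to bypass its cache,
     * and compare it against what was originally written.
     * Returns 0 on match, 1 on mismatch (corruption), -1 on error.
     */
    int
    reread_verify(const char *path, off_t off, const char *written)
    {
            char buf[BLKSZ];
            int fd = open(path, O_RDONLY);

            if (fd < 0) {
                    perror("open");
                    return (-1);
            }

            /*
             * Advisory request to skip the fs cache for this descriptor.
             * Honored by UFS; expected to fail (and be ignored) on ZFS.
             */
            if (directio(fd, DIRECTIO_ON) != 0)
                    perror("directio (advisory only, ignored)");

            if (pread(fd, buf, BLKSZ, off) != BLKSZ) {
                    perror("pread");
                    (void) close(fd);
                    return (-1);
            }
            (void) close(fd);

            return (memcmp(buf, written, BLKSZ) == 0 ? 0 : 1);
    }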
Re: [zfs-discuss] Direct I/O ability with zfs?
Peter Schuller wrote: Is there a specific reason why you need to do the caching at the DB level instead of the file system? I'm really curious as i've got conflicting data on why people do this. If i get more data on real reasons on why we shouldn't cache at the file system, then this could get bumped up in my priority queue. FWIW a MySQL database was recently moved to a FreeBSD system with ZFS. Performance ended up sucking because for some reason data did not make it into the cache in a predictable fashion (simple case of repeated queries were not cached; so for example a very common query, even when executed repeatedly on an idle system, would take more than 1 minute instead of 0.10 seconds or so when cached). Ended up convincing the person running the DB to switch from MyISAM (which does not seem to support DB level caching, other than of indexes) to InnoDB, thus allowing use of the InnoDB buffer cache. I don't know why it wasn't cached by ZFS/ARC to begin with (the size of the ARC cache was definitely large enough - ~ 800 MB, and I know the working set for this query was below 300 MB). Perhaps it has to do with ARC trying to be smart and avoiding flushing the cache with useless data? I am not read up on the details of the ARC. But in this particular case it was clear that a simple LRU had been much more useful - unless there was some other problem related to my setup or FreeBSD integration that somehow broke proper caching. Neel's arcstat might help shed light on such behaviour. http://blogs.sun.com/realneel/entry/zfs_arc_statistics -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
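For anyone wanting to chase this kind of behaviour on a live system, the arcstat script referenced above is typically run with a sampling interval, along these lines (the interval is arbitrary, and output columns may differ slightly between versions of the script):

    # ./arcstat.pl 5

It reports per-interval ARC reads and misses together with the current ARC size and target, which should make it fairly obvious whether a ~300 MB working set is actually staying resident in an ~800 MB ARC.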
Re: [zfs-discuss] Direct I/O ability with zfs?
From: Anton B. Rang [EMAIL PROTECTED] For many databases, most of the I/O is writes (reads wind up cached in memory). 2 words: table scan -=dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Fri, Oct 05, 2007 at 08:56:26AM -0700, Tim Spriggs wrote: Time for on board FPGAs! Heh! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Nicolas Williams wrote: On Thu, Oct 04, 2007 at 10:26:24PM -0700, Jonathan Loran wrote: I can envision a highly optimized, pipelined system, where writes and reads pass through checksum, compression, encryption ASICs, that also locate data properly on disk. ... I've argued before that RAID-Z could be implemented in hardware. But I think that it's all about economics. Software is easier to develop and patch than hardware, so if we can put together systems with enough memory, general purpose CPU horsepower, and memory and I/O bandwidth, all cheaply enough, then that will be better than developing special purpose hardware for ZFS. Thumper is an example of such a system. Eventually we may find trends in system design once again favoring pushing special tasks to the edge. When that happens I'm sure we'll go there. But right now the trend is to put crypto co-processors and NICs on the same die as the CPU. Nico 1) We can put it on the same die also, or at least as a chip set on the MoBo. 2) Offload engines do have software, stored in firmware. Or maybe such an offload processor could run software out of a driver, could be booted in dynamically? 3) You all are aware of how many micro processors are involved in a normal file server right? There's one at almost every interface, disk to controller, controller to PCI bridge, PCI bridge to Hyperbus, etc. Imagine the burden if you did all that in the CPU only? I sometimes find it amazing computers are as stable as they are, but it's all in the maturity of the code running in every step of the way, and of course, good firmware coding practices. Your vanilla SCSI controllers and disk drives do a lot of very complex but useful processing. We trust these guys 100%, because the interface is stable, and the code and processors are mature, well used. I do agree, pushing ZFS to the edge will come down the road, when it becomes less dynamic (how boring) and we know more about the bottlenecks. Jon -- - _/ _/ / - Jonathan Loran - - -/ / /IT Manager - - _ / _ / / Space Sciences Laboratory, UC Berkeley -/ / / (510) 643-5146 [EMAIL PROTECTED] - __/__/__/ AST:7731^29u18e3 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Is there a specific reason why you need to do the caching at the DB level instead of the file system? I'm really curious as i've got conflicting data on why people do this. If i get more data on real reasons on why we shouldn't cache at the file system, then this could get bumped up in my priority queue.

FWIW a MySQL database was recently moved to a FreeBSD system with ZFS. Performance ended up sucking because for some reason data did not make it into the cache in a predictable fashion (simple cases of repeated queries were not cached; for example, a very common query, even when executed repeatedly on an idle system, would take more than 1 minute instead of the 0.10 seconds or so it takes when cached). I ended up convincing the person running the DB to switch from MyISAM (which does not seem to support DB-level caching, other than of indexes) to InnoDB, thus allowing use of the InnoDB buffer cache. I don't know why it wasn't cached by ZFS/ARC to begin with (the size of the ARC cache was definitely large enough - ~800 MB, and I know the working set for this query was below 300 MB). Perhaps it has to do with the ARC trying to be smart and avoiding flushing the cache with useless data? I am not read up on the details of the ARC. But in this particular case it was clear that a simple LRU would have been much more useful - unless there was some other problem related to my setup or the FreeBSD integration that somehow broke proper caching.

-- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]' Key retrieval: Send an E-Mail to [EMAIL PROTECTED] E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
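For reference, the MyISAM-to-InnoDB change described above usually comes down to a my.cnf fragment along these lines; the values are illustrative guesses sized for the ~300 MB working set mentioned, not the settings actually used in that migration:

    [mysqld]
    default-storage-engine = InnoDB     # MyISAM caches only indexes (key_buffer)
    innodb_buffer_pool_size = 512M      # DB-level cache large enough for the hot working set
    innodb_log_file_size    = 128M      # illustrative

With the buffer pool sized this way, the database no longer depends on the ARC behaving like a simple LRU for its hot pages.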
Re: [zfs-discuss] Direct I/O ability with zfs?
But note that, for ZFS, the win with direct I/O will be somewhat less. That's because you still need to read the page to compute its checksum. So for direct I/O with ZFS (with checksums enabled), the cost is W:LPS, R:2*LPS. Is saving one page of writes enough to make a difference? Possibly not. It's more complicated than that. The kernel would be verifying checksums on buffers in a user's address space. For this to work, we have to map these buffers into the kernel and simultaneously arrange for these pages to be protected from other threads in the user's address space. We discussed some of the VM gymnastics required to properly implement this back in January: http://mail.opensolaris.org/pipermail/zfs-discuss/2007-January/thread.html#36890 -j ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Would it be easier to ... 1) Change ZFS code to enable a sort of directIO emulation and then run various tests... or 2) Use Sun's performance team, which have all the experience in the world when it comes to performing benchmarks on Solaris and Oracle .. + a Dtrace master to drill down and see what the difference is between UFS and UFS/DIO... and where the real win lies. On 10/4/07, eric kustarz [EMAIL PROTECTED] wrote: On Oct 3, 2007, at 3:44 PM, Dale Ghent wrote: On Oct 3, 2007, at 5:21 PM, Richard Elling wrote: Slightly off-topic, in looking at some field data this morning (looking for something completely unrelated) I notice that the use of directio on UFS is declining over time. I'm not sure what that means... hopefully not more performance escalations... Sounds like someone from ZFS team needs to get with someone from Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road boundary should look, because it doesn't sound like there's a definitive or at least a sure answer here. I've done that already (Oracle, Postgres, JavaDB, etc.). Because the holy grail of directI/O is an overloaded term, we don't really know where the win within directI/O lies. In any event, it seems the only way to get a definitive answer here is to prototype a no caching property... eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Where does the win come from with directI/O? Is it 1), 2), or some combination? If its a combination, what's the percentage of each towards the win? That will vary based on workload (I know, you already knew that ... :^). Decomposing the performance win between what is gained as a result of single writer lock breakup and no caching is something we can only guess at, because, at least for UFS, you can't do just one - it's all or nothing. We need to tease 1) and 2) apart to have a full understanding. We can't. We can only guess (for UFS). My opinion - it's a must-have for ZFS if we're going to get serious attention in the database space. I'll bet dollars-to-donuts that, over the next several years, we'll burn many tens-of-millions of dollars on customer support escalations that come down to memory utilization issues and contention between database specific buffering and the ARC. This is entirely my opinion (not that of Sun), and I've been wrong before. Thanks, /jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Jim Mauro writes: Where does the win come from with directI/O? Is it 1), 2), or some combination? If its a combination, what's the percentage of each towards the win? That will vary based on workload (I know, you already knew that ... :^). Decomposing the performance win between what is gained as a result of single writer lock breakup and no caching is something we can only guess at, because, at least for UFS, you can't do just one - it's all or nothing. We need to tease 1) and 2) apart to have a full understanding. We can't. We can only guess (for UFS). My opinion - it's a must-have for ZFS if we're going to get serious attention in the database space. I'll bet dollars-to-donuts that, over the next several years, we'll burn many tens-of-millions of dollars on customer support escalations that come down to memory utilization issues and contention between database specific buffering and the ARC. This is entirely my opinion (not that of Sun),

...memory utilisation... OK, so we should implement the 'lost cause' RFE. In all cases, ZFS must not steal pages from other memory consumers: 6488341 ZFS should avoiding growing the ARC into trouble. So the DB memory pages should not be _contended_ for. -r

and I've been wrong before. Thanks, /jim

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
eric kustarz writes: Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm surprised that this is being met with skepticism considering that Oracle highly recommends direct IO be used, and, IIRC, Oracle performance was the main motivation for adding DIO to UFS back in Solaris 2.6. This isn't a problem with ZFS or any specific fs per se, it's the buffer caching they all employ. So I'm a big fan of seeing 6429855 come to fruition. The point is that directI/O typically means two things: 1) concurrent I/O 2) no caching at the file system

In my blog I also mention: 3) no readahead (which can be viewed as an implicit consequence of 2). And someone chimed in with 4) the ability to do I/O at sector granularity. I also think that for many, 2) is too weak a form of what they expect: 5) DMA straight from the user buffer to disk, avoiding a copy.

So: 1) Concurrent I/O we have in ZFS. 2) No caching: we could do this by taking a directio hint and evicting the ARC buffer immediately after the copyout to user space for reads, and after txg completion for writes. 3) No prefetching: we have 2 levels of prefetching. The low level was fixed recently and should not cause problems for DB loads. The high level still needs fixing on its own; we should take the same hint as in 2) to disable it altogether. In the meantime we can tune our way into this mode. 4) Sector-sized I/O is really foreign to the ZFS design. 5) Zero copy, i.e. more CPU efficiency, is I think where the debate is.

My line has been that 5) won't help latency much, and latency is where I think the game is currently played. Now, the disconnect might be because people feel that the game is not latency but CPU efficiency: how many CPU cycles do I burn to get data from disk to the user buffer. This is a valid point. Configurations with a very large number of disks can end up saturated by the filesystem's CPU utilisation. So I still think that the major areas for ZFS perf gains are on the latency front: block allocation (now much improved with the separate intent log), I/O scheduling, and other fixes to threading and ARC behavior. But at some point we can turn our microscope on the CPU efficiency of the implementation. The copy will certainly be a big chunk of the CPU cost per I/O, but I would still like to gather that data. Also consider, 50 disks at 200 IOPS of 8K is 80 MB/sec. That means maybe 1/10th of a single CPU to be saved by avoiding just the copy. Probably not what people have in mind. How many CPUs do you have when attaching 1000 drives to a host running a 100TB database? That many drives will barely occupy 2 cores running the copies.

People want performance and efficiency. Directio is just an overloaded name that delivered those gains to other filesystems. Right now, what I think is worth gathering is the cycles spent in ZFS per read and per write in a large DB environment where the DB holds 90% of memory. For comparison with another FS, we should disable checksum, file prefetching, and vdev prefetching, cap the ARC, turn atime off, and use an 8K recordsize. A breakdown and comparison of the CPU cost per layer will be quite interesting and will point to what needs work. Another interesting thing for me would be: what is your budget? How many cycles per DB read and write are you willing to spend, and how did you come to that number? But, as Eric says, let's develop 2) and I'll try in parallel to figure out the per-layer breakdown cost. -r

Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning on directI/O. ZFS *does* 1. It doesn't do 2 (currently).
That is what we're trying to discuss here. Where does the win come from with directI/O? Is it 1), 2), or some combination? If its a combination, what's the percentage of each towards the win? We need to tease 1) and 2) apart to have a full understanding. I'm not against adding 2) to ZFS but want more information. I suppose i'll just prototype it and find out for myself. eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
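As a rough illustration only, Roch's suggested comparison configuration above would translate into something like the following for a hypothetical dataset tank/db. The tunable names are the Nevada-era ones; the dataset name and values are assumptions, and checksum=off is meant purely for the apples-to-apples CPU measurement, not for production use:

    # zfs set recordsize=8K tank/db
    # zfs set atime=off tank/db
    # zfs set checksum=off tank/db

    In /etc/system:
    set zfs:zfs_prefetch_disable = 1      * file-level prefetch off
    set zfs:zfs_vdev_cache_size = 0       * vdev-level prefetch off

plus an ARC cap, as discussed elsewhere in the thread.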
Re: [zfs-discuss] Direct I/O ability with zfs?
On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote: It does, which leads to the core problem. Why do we have to store the exact same data twice in memory (i.e., once in the ARC, and once in the shared memory segment that Oracle uses)? We do not retain 2 copies of the same data. If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ? Other things deserving of staying in the cache get pushed out by things that don't deserve being in the cache. Thus systemic memory pressure (e.g., more on-demand paging of text). Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote: ...memory utilisation... OK so we should implement the 'lost cause' rfe. In all cases, ZFS must not steal pages from other memory consumers : 6488341 ZFS should avoiding growing the ARC into trouble So the DB memory pages should not be _contended_ for.

What if your executable text, and pretty much everything else, lives on ZFS? You don't want to contend for the memory caching those things either. It's not just the DB's memory you don't want to contend for.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Nicolas Williams writes: On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote: So the DB memory pages should not be _contended_ for. What if your executable text, and pretty much everything else, lives on ZFS? You don't want to contend for the memory caching those things either. It's not just the DB's memory you don't want to contend for.

On the read side, we're talking here about 1000 disks each running 35 concurrent I/Os of 8K, so a footprint of roughly 250MB, to stage a ton of work. On the write side we do have to play with the transaction group, so that will be 5-10 seconds' worth of synchronous write activity. But how much memory does a 1000-disk server have? -r

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
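Spelling out the arithmetic behind that read-side figure (the 35-deep queue per disk is Roch's number): 1000 disks x 35 outstanding I/Os x 8 KB per I/O = 280,000 KB, i.e. roughly 270 MB of in-flight staging buffers, which is the same order as the ~250 MB cited.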
Re: [zfs-discuss] Direct I/O ability with zfs?
On Thu, Oct 04, 2007 at 06:59:56PM +0200, Roch - PAE wrote: Nicolas Williams writes: On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote: So the DB memory pages should not be _contented_ for. What if your executable text, and pretty much everything lives on ZFS? You don't want to content for the memory caching those things either. It's not just the DB's memory you don't want to contend for. On the read side, We're talking here about 1000 disks each running35 concurrent I/Os of 8K, so a footprint of 250MB, to stage a ton of work. I'm not sure what you mean, but extra copies and memory just to stage the I/Os is not the same as the systemic memory pressure issue. Now, I'm _speculating_ as to what the real problem is, but it seems very likely that putting things in the cache that needn't be there would push out things that should be there, and since restoring those things to the cache later would cost I/Os... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
I'd like to second a couple of comments made recently:

* If they don't regularly do so, I too encourage the ZFS, Solaris performance, and Sun Oracle support teams to sit down and talk about the utility of Direct I/O for databases.

* I too suspect that absent Direct I/O (or some ringing endorsement from Oracle about how ZFS doesn't need Direct I/O), there will be a drain of customer escalations regarding the lack -- plus FUD and other sales inhibitors.

While I realize that Sun has not published a TPC-C result since 2001 and offers a different value proposition to customers, performance does matter, and in some cases Direct I/O can contribute to that. Historically, every TPC-C database benchmark run can be converted from being I/O bound to being CPU bound by adding enough disk spindles and enough main memory. In that context, saving the CPU cycles (and cache misses) of a copy is important.

Another historical trend was that for performance, portability across different operating systems, and perhaps just because they could, databases tended to use as few OS capabilities as possible and to do their own resource management. So, for instance, databases were often benchmarked using raw devices. Customers, on the other hand, preferred the manageability of filesystems and tended to deploy there. In that context, Direct I/O is an attempt to get the best of both worlds.

Finally, besides UFS Direct I/O on Solaris, other filesystems including VxFS also have various forms of Direct I/O -- either separate APIs or mount options that bypass the cache on large writes, etc. Understanding those benefits, both real and advertised, helps in understanding the opportunities and shortfalls for ZFS. It may be that this is not the most important thing for ZFS performance or capability right now -- measurement in targeted configurations and workloads is the only way to tell -- but I'd be highly surprised if there isn't something (bypass cache on really large writes?) that can be learned from experience with Direct I/O.

Eric (Hamilton)

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Nicolas Williams writes: On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote: It does, which leads to the core problem. Why do we have to store the exact same data twice in memory (i.e., once in the ARC, and once in the shared memory segment that Oracle uses)? We do not retain 2 copies of the same data. If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ? Other things deserving of staying in the cache get pushed out by things that don't deserve being in the cache. Thus systemic memory pressure (e.g., more on-demand paging of text). Nico -- I agree. That's why I submitted both of these. 6429855 Need way to tell ZFS that caching is a lost cause 6488341 ZFS should avoiding growing the ARC into trouble -r ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
5) DMA straight from user buffer to disk avoiding a copy.

This is what the direct in direct i/o has historically meant. :-)

My line has been that 5) won't help latency much and latency is where I think the game is currently played. Now the disconnect might be because people might feel that the game is not latency but CPU efficiency : how many CPU cycles do I burn to get data from disk to user buffer.

Actually, in many cases it's less a matter of CPU cycles than of memory cycles. For many databases, most of the I/O is writes (reads wind up cached in memory). What's the cost of a write?

With direct I/O: the CPU writes to memory (spread out over many transactions), and the disk DMAs from memory. We write LPS (log page size) bytes of data from CPU to memory, and we read LPS bytes from memory. On processors without a cache-line zero, we probably read the LPS data from memory as part of the write. Total cost = W:LPS, R:2*LPS.

Without direct I/O: the cost of getting the data into the user buffer remains the same (W:LPS, R:LPS). We copy the data from the user buffer to a system buffer (W:LPS, R:LPS). Then we push it out to disk. Total cost = W:2*LPS, R:3*LPS. We've nearly doubled the cost, not including any TLB effects.

On a memory-bandwidth-starved system (which should be nearly all modern designs, especially with multi-threaded chips like Niagara), replacing buffered I/O with direct I/O should give you nearly a 2x improvement in log write bandwidth. That's without considering cache effects (which shouldn't be too significant, really, since LPS should be the size of L2). How significant is this? We'd have to measure, and it will likely vary quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat less. That's because you still need to read the page to compute its checksum. So for direct I/O with ZFS (with checksums enabled), the cost is W:LPS, R:2*LPS. Is saving one page of writes enough to make a difference? Possibly not.

Anton This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
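To put a concrete number on Anton's accounting (the 8 KB log page size below is an assumption for illustration, not a figure from his post):

    direct I/O:   W = 8 KB,            R = 2 x 8 KB = 16 KB  ->  24 KB of memory traffic per log write
    buffered I/O: W = 2 x 8 KB = 16 KB, R = 3 x 8 KB = 24 KB  ->  40 KB of memory traffic per log write

    40 KB / 24 KB is roughly 1.7x, which is the "nearly doubled" cost he refers to.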
Re: [zfs-discuss] Direct I/O ability with zfs?
I've been thinking about this for a while, but Anton's analysis makes me think about it even more: We all love ZFS, right. It's futuristic in a bold new way, with many virtues; I won't preach to the choir. But making it all glue together requires some necessarily CPU/memory-intensive operations around checksum generation/validation, compression, encryption, data placement/component load balancing, etc. Processors have gotten really powerful, much more so than the relative disk I/O gains, which in all honesty makes ZFS possible.

My question: is anyone working on an offload engine for ZFS? I can envision a highly optimized, pipelined system, where writes and reads pass through checksum, compression, encryption ASICs that also locate data properly on disk. This could even be in the form of a PCIe SATA/SAS card with many ports, or different options. This would make direct IO, or DMA IO, possible again. The file system abstraction with ZFS is really too much and too important to ignore, and too hard to optimize under different load conditions (my rookie opinion), to expect any RDBMS app to have a clue what to do with it. I guess what I'm saying is that the RDBMS app will know what blocks it needs, and wants to get them in and out speedy quick, but the mapping to disk is not linear with ZFS the way it is with other file systems. An offload engine could translate this instead. Just throwing this out there for the purpose of blue sky fluff. Jon

Anton B. Rang wrote: 5) DMA straight from user buffer to disk avoiding a copy. This is what the direct in direct i/o has historically meant. :-) line has been that 5) won't help latency much and latency is here I think the game is currently played. Now the disconnect might be because people might feel that the game is not latency but CPU efficiency : how many CPU cycles do I burn to do get data from disk to user buffer. Actually, it's less CPU cycles in many cases than memory cycles. For many databases, most of the I/O is writes (reads wind up cached in memory). What's the cost of a write? With direct I/O: CPU writes to memory (spread out over many transactions), disk DMAs from memory. We write LPS (log page size) bytes of data from CPU to memory, we read LPS bytes from memory. On processors without a cache line zero, we probably read the LPS data from memory as part of the write. Total cost = W:LPS, R:2*LPS. Without direct I/O: The cost of getting the data into the user buffer remains the same (W:LPS, R:LPS). We copy the data from user buffer to system buffer (W:LPS, R:LPS). Then we push it out to disk. Total cost = W:2*LPS, R:3*LPS. We've nearly doubled the cost, not including any TLB effects. On a memory-bandwidth-starved system (which should be nearly all modern designs, especially with multi-threaded chips like Niagara), replacing buffered I/O with direct I/O should give you nearly a 2x improvement in log write bandwidth. That's without considering cache effects (which shouldn't be too significant, really, since LPS should be the size of L2). How significant is this? We'd have to measure; and it will likely vary quite a lot depending on which database is used for testing. But note that, for ZFS, the win with direct I/O will be somewhat less. That's because you still need to read the page to compute its checksum. So for direct I/O with ZFS (with checksums enabled), the cost is W:LPS, R:2*LPS. Is saving one page of writes enough to make a difference? Possibly not.
Anton This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- - _/ _/ / - Jonathan Loran - - -/ / /IT Manager - - _ / _ / / Space Sciences Laboratory, UC Berkeley -/ / / (510) 643-5146 [EMAIL PROTECTED] - __/__/__/ AST:7731^29u18e3 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Rayson Ho writes: 1) Modern DBMSs cache database pages in their own buffer pool because it is less expensive than to access data from the OS. (IIRC, MySQL's MyISAM is the only one that relies on the FS cache, but a lot of MySQL sites use INNODB which has its own buffer pool)

The DB can and should cache data whether or not directio is used.

2) Also, direct I/O is faster because it avoids double buffering.

A piece of data can be in one buffer, 2 buffers, 3 buffers. That says nothing about performance. More below. So I guess you mean DIO is faster because it avoids the extra copy: DMA straight to the user buffer rather than DMA to a kernel buffer followed by a copy to the user buffer. If an I/O is 5ms, an 8K copy is about 10 usec. Is avoiding the copy really the most urgent thing to work on? Rayson

On 10/2/07, eric kustarz [EMAIL PROTECTED] wrote: Not yet, see: 6429855 Need way to tell ZFS that caching is a lost cause Is there a specific reason why you need to do the caching at the DB level instead of the file system? I'm really curious as i've got conflicting data on why people do this. If i get more data on real reasons on why we shouldn't cache at the file system, then this could get bumped up in my priority queue.

I can't answer this, although I can well imagine that the DB is the most efficient place to cache its own data, all organised and formatted to respond to queries. But once the DB has signified to the FS that it doesn't require the FS to cache data, then the benefit from this RFE is that the memory used to stage the data can be quickly recycled by ZFS for subsequent operations. It means the ZFS memory footprint is more likely to contain useful ZFS metadata and not cached data blocks we know are not likely to be used again anytime soon. We would also operate better in mixed DIO/non-DIO workloads. See also: http://blogs.sun.com/roch/entry/zfs_and_directio -r

eric

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Matty writes: On 10/3/07, Roch - PAE [EMAIL PROTECTED] wrote: Rayson Ho writes: 1) Modern DBMSs cache database pages in their own buffer pool because it is less expensive than to access data from the OS. (IIRC, MySQL's MyISAM is the only one that relies on the FS cache, but a lot of MySQL sites use INNODB which has its own buffer pool) The DB can and should cache data whether or not directio is used. It does, which leads to the core problem. Why do we have to store the exact same data twice in memory (i.e., once in the ARC, and once in the shared memory segment that Oracle uses)? We do not retain 2 copies of the same data. If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ? -r Thanks, - Ryan -- UNIX Administrator http://prefetch.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Wed, Oct 03, 2007 at 10:42:53AM +0200, Roch - PAE wrote: Rayson Ho writes: 2) Also, direct I/O is faster because it avoid double buffering. A piece of data can be in one buffer, 2 buffers, 3 buffers. That says nothing about performance. More below. So I guess you mean DIO is faster because it avoids the extra copy: dma straight to user buffer rather than DMA to kernel buffer then copy to user buffer. If an I/O is 5ms an 8K copy is about 10 usec. Is avoiding the copy really the most urgent thing to work on ? If the DB is huge relative to RAM, and very busy, then memory pressure could become a problem. And it's not just the time spent copying buffers, but the resources spent managing those copies. (Just guessing.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On 10/3/07, Roch - PAE [EMAIL PROTECTED] wrote: We do not retain 2 copies of the same data. If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ?

Hi Roch,

1) The memory copy operations are expensive... I think the following is a good intro to this problem: Copying data in memory can be a serious bottleneck in DBMS software today. This fact is often a surprise to database students, who assume that main-memory operations are free compared to disk I/O. But in practice, a well-tuned database installation is typically not I/O-bound. (section 3.2) http://mitpress.mit.edu/books/chapters/0262693143chapm2.pdf (Ch 2: Anatomy of a Database System, Readings in Database Systems, 4th Ed)

2) If you look at the TPC-C disclosure reports, you will see vendors using thousands of disks for the top 10 systems. With that many disks working in parallel, the I/O latencies are not as big of a problem as on systems with fewer disks.

3) Also interesting is Concurrent I/O, which was introduced in AIX 5.2: Improving Database Performance With AIX Concurrent I/O http://www-03.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.html Improve database performance on file system containers in IBM DB2 UDB V8.2 using Concurrent I/O on AIX http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/

Rayson

-r Thanks, - Ryan -- UNIX Administrator http://prefetch.net

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote: If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ? Personally, I'm still not completely sold on the performance (performance as in ability, not speed) of ARC eviction. Often times, especially during a resilver, a server with ~2GB of RAM free under normal circumstances will dive down to the minfree floor, causing processes to be swapped out. We've had to take to manually constraining ARC max size so this situation is avoided. This is on s10u2/3. I haven't tried anything heavy duty with Nevada simply because I don't put Nevada in production situations. Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm surprised that this is being met with skepticism considering that Oracle highly recommends direct IO be used, and, IIRC, Oracle performance was the main motivation to adding DIO to UFS back in Solaris 2.6. This isn't a problem with ZFS or any specific fs per se, it's the buffer caching they all employ. So I'm a big fan of seeing 6429855 come to fruition. /dale ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
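For reference, the manual ARC constraint Dale mentions is commonly applied with an /etc/system entry along the following lines on builds that expose the tunable; the 1 GB value is just an example, and on the s10u2/u3 kernels he mentions people generally had to adjust arc.c_max with mdb instead, since the tunable arrived in later updates:

    set zfs:zfs_arc_max = 0x40000000    * cap the ARC at 1 GB (example value)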
Re: [zfs-discuss] Direct I/O ability with zfs?
Hey Roch - We do not retain 2 copies of the same data. If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ?

Can't answer that question empirically, because we can't measure this, but I imagine there's some overhead to ZFS cache management in evicting and replacing blocks, and that overhead could be eliminated if ZFS could be told not to cache the blocks at all. Now, obviously, whether this overhead would be in the noise level, or something that actually hurts sustainable performance, will depend on several things, but I can envision scenarios where it's overhead I'd rather avoid if I could. Thanks, /jim

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Rayson Ho wrote: On 10/3/07, Roch - PAE [EMAIL PROTECTED] wrote: We do not retain 2 copies of the same data. If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ? Hi Roch, 1) The memory copy operations are expensive... I think the following is a good intro to this problem: Copying data in memory can be a serious bottleneck in DBMS software today. This fact is often a surprise to database students, who assume that main-memory operations are free compared to disk I/O. But in practice, a welltuned database installation is typically not I/O-bound. (section 3.2) ... just the ones people are complaining about ;-) Indeed it seems rare that a DB performance escalation does not involve I/O tuning :-( http://mitpress.mit.edu/books/chapters/0262693143chapm2.pdf (Ch 2: Anatomy of a Database System, Readings in Database Systems, 4th Ed) 2) If you look at the TPC-C disclosure reports, you will see vendors using thousands of disks for the top 10 systems. With that many disks working in parallel, the I/O latencies are not as big as of a problem as systems with fewer disks. 3) Also interesting is Concurrent I/O, which was introduced in AIX 5.2: Improving Database Performance With AIX Concurrent I/O http://www-03.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.html This is a pretty decent paper and some of the issues are the same with UFS. To wit, direct I/O is not always a win (qv. Bob Sneed's blog) It also describes what we call the single writer lock problem, which IBM solves with Concurrent I/O. See also: http://www.solarisinternals.com/wiki/index.php/Direct_I/O ZFS doesn't have the single writer lock problem. See also: http://blogs.sun.com/roch/entry/zfs_to_ufs_performance_comparison Slightly off-topic, in looking at some field data this morning (looking for something completely unrelated) I notice that the use of directio on UFS is declining over time. I'm not sure what that means... hopefully not more performance escalations... -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Oct 3, 2007, at 5:21 PM, Richard Elling wrote: Slightly off-topic, in looking at some field data this morning (looking for something completely unrelated) I notice that the use of directio on UFS is declining over time. I'm not sure what that means... hopefully not more performance escalations...

Sounds like someone from the ZFS team needs to get with someone from Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road boundary should look, because it doesn't sound like there's a definitive or at least a sure answer here. Oracle trumpets the use of DIO, and there are benchmarks and first-hand accounts out there from DBAs on its virtues - at least when running on UFS (and EXT2/3 on Linux, etc). As it relates to ZFS mechanics specifically, there doesn't appear to be any settled opinion. /dale

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Hi Dale, We're testing out the enhanced arc_max enforcement (track DNLC entries) using Build 72 right now. Hopefully it will fix the memory creep, which, it seems to me, is the only real downside to ZFS for DB work. Frankly, our DB loads have improved performance with ZFS. I suspect it's because we are write-heavy. -J

On 10/3/07, Dale Ghent [EMAIL PROTECTED] wrote: On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote: If the DB cache is made large enough to consume most of memory, the ZFS copy will quickly be evicted to stage other I/Os on their way to the DB cache. What problem does that pose ? Personally, I'm still not completely sold on the performance (performance as in ability, not speed) of ARC eviction. Often times, especially during a resilver, a server with ~2GB of RAM free under normal circumstances will dive down to the minfree floor, causing processes to be swapped out. We've had to take to manually constraining ARC max size so this situation is avoided. This is on s10u2/3. I haven't tried anything heavy duty with Nevada simply because I don't put Nevada in production situations. Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm surprised that this is being met with skepticism considering that Oracle highly recommends direct IO be used, and, IIRC, Oracle performance was the main motivation to adding DIO to UFS back in Solaris 2.6. This isn't a problem with ZFS or any specific fs per se, it's the buffer caching they all employ. So I'm a big fan of seeing 6429855 come to fruition. /dale

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
Postgres assumes that the OS takes care of caching: PLEASE NOTE. PostgreSQL counts a lot on the OS to cache data files and hence does not bother with duplicating its file caching effort. The shared buffers parameter assumes that OS is going to cache a lot of files and hence it is generally very low compared with system RAM. Even for a dataset in excess of 20GB, a setting of 128MB may be too much, if you have only 1GB RAM and an aggressive-at-caching OS like Linux. Tuning PostgreSQL for performance, Shridhar Daithankar, Josh Berkus, 2003, http://www.varlena.com/GeneralBits/Tidbits/perf.html

Slightly off-topic, I have noticed at least 25% performance gain on my postgresql database after installing Wu Fengguang's adaptive read-ahead disk cache patch for the linux kernel. http://lkml.org/lkml/2005/9/15/185 http://www.samag.com/documents/s=10101/sam0616a/0616a.htm I was wondering if Solaris uses a similar approach.

On 04/10/2007, at 4:44 AM, Dale Ghent wrote: On Oct 3, 2007, at 5:21 PM, Richard Elling wrote: Slightly off-topic, in looking at some field data this morning (looking for something completely unrelated) I notice that the use of directio on UFS is declining over time. I'm not sure what that means... hopefully not more performance escalations... Sounds like someone from ZFS team needs to get with someone from Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road boundary should look, because it doesn't sound like there's a definitive or at least a sure answer here. Oracle trumpets the use of DIO, and there are benchmarks and first-hand accounts out there from DBAs on its virtues - at least when running on UFS (and EXT2/3 on Linux, etc) As it relates to ZFS mechanics specifically, there doesn't appear to be any settled opinion. /dale

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
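The quoted advice maps to a couple of lines in postgresql.conf. The 128MB figure echoes the article; the effective_cache_size value is an arbitrary example, and 8.x-style memory units are assumed:

    shared_buffers = 128MB          # deliberately small; rely on the OS / ARC for file caching
    effective_cache_size = 1GB      # hint to the planner about how much the OS is likely to cache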
Re: [zfs-discuss] Direct I/O ability with zfs?
Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm surprised that this is being met with skepticism considering that Oracle highly recommends direct IO be used, and, IIRC, Oracle performance was the main motivation to adding DIO to UFS back in Solaris 2.6. This isn't a problem with ZFS or any specific fs per se, it's the buffer caching they all employ. So I'm a big fan of seeing 6429855 come to fruition. The point is that directI/O typically means two things: 1) concurrent I/O 2) no caching at the file system Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning on directI/O. ZFS *does* 1. It doesn't do 2 (currently). That is what we're trying to discuss here. Where does the win come from with directI/O? Is it 1), 2), or some combination? If its a combination, what's the percentage of each towards the win? We need to tease 1) and 2) apart to have a full understanding. I'm not against adding 2) to ZFS but want more information. I suppose i'll just prototype it and find out for myself. eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Oct 3, 2007, at 3:44 PM, Dale Ghent wrote: On Oct 3, 2007, at 5:21 PM, Richard Elling wrote: Slightly off-topic, in looking at some field data this morning (looking for something completely unrelated) I notice that the use of directio on UFS is declining over time. I'm not sure what that means... hopefully not more performance escalations... Sounds like someone from ZFS team needs to get with someone from Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road boundary should look, because it doesn't sound like there's a definitive or at least a sure answer here. I've done that already (Oracle, Postgres, JavaDB, etc.). Because the holy grail of directI/O is an overloaded term, we don't really know where the win within directI/O lies. In any event, it seems the only way to get a definitive answer here is to prototype a no caching property... eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Direct I/O ability with zfs?
We are using MySQL, and love the idea of using zfs for this. We are used to using Direct I/O to bypass file system caching (let the DB do this). Does this exist for zfs? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
1) Modern DBMSs cache database pages in their own buffer pool because it is less expensive than to access data from the OS. (IIRC, MySQL's MyISAM is the only one that relies on the FS cache, but a lot of MySQL sites use INNODB which has its own buffer pool)

2) Also, direct I/O is faster because it avoids double buffering.

Rayson

On 10/2/07, eric kustarz [EMAIL PROTECTED] wrote: Not yet, see: 6429855 Need way to tell ZFS that caching is a lost cause Is there a specific reason why you need to do the caching at the DB level instead of the file system? I'm really curious as i've got conflicting data on why people do this. If i get more data on real reasons on why we shouldn't cache at the file system, then this could get bumped up in my priority queue. eric

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
David Runyon wrote: We are using MySQL, and love the idea of using zfs for this. We are used to using Direct I/O to bypass file system caching (let the DB do this). Does this exist for zfs? This is a FAQ. See: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Database_Recommendations http://blogs.sun.com/roch/entry/zfs_and_directio http://blogs.sun.com/bobs/entry/one_i_o_two_i -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Direct I/O ability with zfs?
On Tue, Oct 02, 2007 at 01:20:24PM -0600, eric kustarz wrote: On Oct 2, 2007, at 1:11 PM, David Runyon wrote: We are using MySQL, and love the idea of using zfs for this. We are used to using Direct I/O to bypass file system caching (let the DB do this). Does this exist for zfs? Not yet, see: 6429855 Need way to tell ZFS that caching is a lost cause Is there a specific reason why you need to do the caching at the DB level instead of the file system? I'm really curious as i've got conflicting data on why people do this. If i get more data on real reasons on why we shouldn't cache at the file system, then this could get bumped up in my priority queue.

At least two reasons: http://developers.sun.com/solaris/articles/mysql_perf_tune.html#6 http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle (the first example proves that this issue is not only Oracle-related)

Regards przemol -- http://przemol.blogspot.com/

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss