Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-29 Thread Rayson Ho
Restarting this thread... I've just finished reading the article, A
look at MySQL on ZFS:
http://dev.mysql.com/tech-resources/articles/mysql-zfs.html

The section "MySQL Performance Comparison: ZFS vs. UFS on OpenSolaris" looks interesting...

Rayson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-29 Thread Rayson Ho
Hi Tony,

John posted the URL to his article to the databases-discuss this
morning, and I only took a very quick look.

Maybe you can join that list and discuss the configurations further?
http://mail.opensolaris.org/mailman/listinfo/databases-discuss

Rayson



On 10/29/07, Tony Leone [EMAIL PROTECTED] wrote:
 This is very interesting because it directly contradicts the results the ZFS 
 developers are posting on the OpenSolaris mailing list.  I just scanned the 
 article; does he give his ZFS settings, and is he using separate ZIL devices?

 Tony Leone

  Rayson Ho [EMAIL PROTECTED] 10/29/2007 11:39 AM 
 Restarting this thread... I've just finished reading the article, A
 look at MySQL on ZFS:
 http://dev.mysql.com/tech-resources/articles/mysql-zfs.html

 The section "MySQL Performance Comparison: ZFS vs. UFS on OpenSolaris" looks
 interesting...

 Rayson
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-10 Thread dudekula mastan
Hi All,
   
  Any update on this ?
   
  -Masthan D

dudekula mastan [EMAIL PROTECTED] wrote:
Hi Everybody,
   
  Over the last week many mails have been exchanged on this topic.
   
  I also have a similar issue. I would appreciate it if anyone could help 
me with it.
   
  I have an I/O test tool which writes data, reads it back, and then 
compares the read data with the written data. If the read data and written 
data are the same, there is no CORRUPTION; otherwise there is a CORRUPTION.
   
  File data may become corrupted for many reasons, and one possible reason is 
the file system cache. If the file system cache has issues, it will return 
wrong data to user applications (wrong data meaning that the actual data on 
disk and the data the read call returns to the application do not match).
   
  When there is a CORRUPTION, to check for file system cache issues, my 
application bypasses the file system cache and re-reads the data from the 
same file, then compares the re-read data with the written data.
   
  Is there a way to skip the ZFS file system cache, or a way to do direct 
I/O on a ZFS file system?
   
  Regards
  Masthan D
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-10 Thread Vidya Sakar N
 Tell me, is there a way to skip ZFS file system cache or tell me is
 there a way to do direct IO on ZFS file system?

No, currently there is no way to disable the file system cache (aka the ARC) in ZFS.
There is a pending RFE though,
6429855 Need way to tell ZFS that caching is a lost cause

Cheers,
Vidya Sakar


dudekula mastan wrote:
 Hi All,
  
 Any update on this ?
  
 -Masthan D
 
 */dudekula mastan [EMAIL PROTECTED]/* wrote:
 
 Hi Everybody,
  
 Over the last week many mails have been exchanged on this topic.
  
 I also have a similar issue. I would appreciate it if anyone could
 help me with it.
  
 I have an I/O test tool which writes data, reads it back, and then
 compares the read data with the written data. If the read data and
 written data are the same, there is no CORRUPTION; otherwise there is
 a CORRUPTION.
  
 File data may become corrupted for many reasons, and one possible
 reason is the file system cache. If the file system cache has issues,
 it will return wrong data to user applications (wrong data meaning
 that the actual data on disk and the data the read call returns to
 the application do not match).
  
 When there is a CORRUPTION, to check for file system cache issues, my
 application bypasses the file system cache and re-reads the data from
 the same file, then compares the re-read data with the written data.
  
 Is there a way to skip the ZFS file system cache, or a way to do
 direct I/O on a ZFS file system?
  
 Regards
 Masthan D
  
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 
 
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-09 Thread dudekula mastan
Hi Everybody,
   
  Over the last week many mails have been exchanged on this topic.
   
  I also have a similar issue. I would appreciate it if anyone could help 
me with it.
   
  I have an I/O test tool which writes data, reads it back, and then 
compares the read data with the written data. If the read data and written 
data are the same, there is no CORRUPTION; otherwise there is a CORRUPTION.
   
  File data may become corrupted for many reasons, and one possible reason is 
the file system cache. If the file system cache has issues, it will return 
wrong data to user applications (wrong data meaning that the actual data on 
disk and the data the read call returns to the application do not match).
   
  When there is a CORRUPTION, to check for file system cache issues, my 
application bypasses the file system cache and re-reads the data from the 
same file, then compares the re-read data with the written data.
   
  Is there a way to skip the ZFS file system cache, or a way to do direct 
I/O on a ZFS file system?
   
  Regards
  Masthan D
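
A minimal sketch of the write/read/compare step described above, in C (the
path, offset and 8K buffer size are made-up values; the cache-bypassing
re-read is the part that has no ZFS switch today):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>

    /*
     * Sketch of the write/read/compare step described above.  The path,
     * offset and 8K buffer size are made-up values.  The cache-bypassing
     * re-read that the tool wants for its second pass is the part ZFS
     * currently has no switch for (on UFS it would be directio(3C)).
     */
    #define BUFSZ 8192

    static int write_then_verify(const char *path, off_t off)
    {
        char wbuf[BUFSZ], rbuf[BUFSZ];
        int fd = open(path, O_RDWR | O_CREAT, 0644);

        if (fd < 0) {
            perror("open");
            return -1;
        }
        memset(wbuf, 0x5a, BUFSZ);                      /* known pattern */
        if (pwrite(fd, wbuf, BUFSZ, off) != BUFSZ ||
            pread(fd, rbuf, BUFSZ, off) != BUFSZ) {     /* cached read   */
            perror("I/O");
            close(fd);
            return -1;
        }
        close(fd);
        return memcmp(wbuf, rbuf, BUFSZ) ? 1 : 0;       /* 1 == mismatch */
    }

    int main(void)
    {
        int rc = write_then_verify("/tank/iotest.dat", 0);  /* made-up path */
        if (rc == 1)
            printf("CORRUPTION: re-read with the cache bypassed to isolate it\n");
        return rc != 0;
    }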
   

   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-06 Thread Richard Elling
Peter Schuller wrote:
 Is there a specific reason why you need to do the caching at the DB  
 level instead of the file system?  I'm really curious as i've got  
 conflicting data on why people do this.  If i get more data on real  
 reasons on why we shouldn't cache at the file system, then this could  
 get bumped up in my priority queue.
 
 FWIW a MySQL database was recently moved to a FreeBSD system with
 ZFS. Performance ended up sucking because for some reason data did not
 make it into the cache in a predictable fashion (simple case of
 repeated queries were not cached; so for example a very common query,
 even when executed repeatedly on an idle system, would take more than
 1 minute instead of 0.10 seconds or so when cached).
 
 Ended up convincing the person running the DB to switch from MyISAM
 (which does not seem to support DB level caching, other than of
 indexes) to InnoDB, thus allowing use of the InnoDB buffer cache.
 
 I don't know why it wasn't cached by ZFS/ARC to begin with (the size
 of the ARC cache was definitely large enough - ~ 800 MB, and I know
 the working set for this query was below 300 MB). Perhaps it has to do
 with ARC trying to be smart and avoiding flushing the cache with
 useless data? I am not read up on the details of the ARC. But in this
 particular case it was clear that a simple LRU would have been much more
 useful - unless there was some other problem related to my setup or
 FreeBSD integration that somehow broke proper caching.

Neel's arcstat might help shed light on such behaviour.
http://blogs.sun.com/realneel/entry/zfs_arc_statistics

  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-05 Thread Dave Johnson
From: Anton B. Rang [EMAIL PROTECTED]
 For many databases, most of the I/O is writes (reads wind up
 cached in memory).

2 words:  table scan

-=dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-05 Thread Nicolas Williams
On Fri, Oct 05, 2007 at 08:56:26AM -0700, Tim Spriggs wrote:
 Time for on board FPGAs!

Heh!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-05 Thread Jonathan Loran



Nicolas Williams wrote:

On Thu, Oct 04, 2007 at 10:26:24PM -0700, Jonathan Loran wrote:
  
I can envision a highly optimized, pipelined system, where writes and 
reads pass through checksum, compression, encryption ASICs, that also 
locate data properly on disk.  ...



I've argued before that RAID-Z could be implemented in hardware.  But I
think that it's all about economics.  Software is easier to develop and
patch than hardware, so if we can put together systems with enough
memory, general purpose CPU horsepower, and memory and I/O bandwidth,
all cheaply enough, then that will be better than developing special
purpose hardware for ZFS.  Thumper is an example of such a system.

Eventually we may find trends in system design once again favoring
pushing special tasks to the edge.  When that happens I'm sure we'll go
there.  But right now the trend is to put crypto co-processors and NICs
on the same die as the CPU.

Nico
  
1) We can put it on the same die too, or at least as a chipset on the 
MoBo.


2) Offload engines do have software, stored in firmware.  Or maybe such 
an offload processor could run software loaded from a driver, booted 
dynamically?


3) You are all aware of how many microprocessors are involved in a 
normal file server, right?  There's one at almost every interface: disk 
to controller, controller to PCI bridge, PCI bridge to Hyperbus, etc.  
Imagine the burden if you did all of that in the CPU alone.  I sometimes 
find it amazing that computers are as stable as they are, but it's all in 
the maturity of the code running at every step of the way, and of course, 
good firmware coding practices.  Your vanilla SCSI controllers and disk 
drives do a lot of very complex but useful processing.  We trust these 
guys 100% because the interface is stable, and the code and processors 
are mature and well used.


I do agree, pushing ZFS to the edge will come down the road, when it 
becomes less dynamic (how boring) and we know more about the bottlenecks.


Jon

--


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-05 Thread Peter Schuller
 Is there a specific reason why you need to do the caching at the DB  
 level instead of the file system?  I'm really curious as i've got  
 conflicting data on why people do this.  If i get more data on real  
 reasons on why we shouldn't cache at the file system, then this could  
 get bumped up in my priority queue.

FWIW a MySQL database was recently moved to a FreeBSD system with
ZFS. Performance ended up sucking because for some reason data did not
make it into the cache in a predictable fashion (simple case of
repeated queries were not cached; so for example a very common query,
even when executed repeatedly on an idle system, would take more than
1 minute instead of 0.10 seconds or so when cached).

Ended up convincing the person running the DB to switch from MyISAM
(which does not seem to support DB level caching, other than of
indexes) to InnoDB, thus allowing use of the InnoDB buffer cache.

I don't know why it wasn't cached by ZFS/ARC to begin with (the size
of the ARC cache was definitely large enough - ~ 800 MB, and I know
the working set for this query was below 300 MB). Perhaps it has to do
with ARC trying to be smart and avoiding flushing the cache with
useless data? I am not read up on the details of the ARC. But in this
particular case it was clear that a simple LRU would have been much more
useful - unless there was some other problem related to my setup or
FreeBSD integration that somehow broke proper caching.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-05 Thread johansen
 But note that, for ZFS, the win with direct I/O will be somewhat
 less.  That's because you still need to read the page to compute
 its checksum.  So for direct I/O with ZFS (with checksums enabled),
 the cost is W:LPS, R:2*LPS.  Is saving one page of writes enough to
 make a difference?  Possibly not.

It's more complicated than that.  The kernel would be verifying
checksums on buffers in a user's address space.  For this to work, we
have to map these buffers into the kernel and simultaneously arrange for
these pages to be protected from other threads in the user's address
space.  We discussed some of the VM gymnastics required to properly
implement this back in January:

http://mail.opensolaris.org/pipermail/zfs-discuss/2007-January/thread.html#36890

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Louwtjie Burger
Would it be easier to ...

1) Change ZFS code to enable a sort of directIO emulation and then run
various tests... or

2) Use Sun's performance team, which has all the experience in the
world when it comes to performing benchmarks on Solaris and Oracle,
plus a DTrace master to drill down and see what the difference is between
UFS and UFS/DIO... and where the real win lies.


On 10/4/07, eric kustarz [EMAIL PROTECTED] wrote:

 On Oct 3, 2007, at 3:44 PM, Dale Ghent wrote:

  On Oct 3, 2007, at 5:21 PM, Richard Elling wrote:
 
  Slightly off-topic, in looking at some field data this morning
  (looking
  for something completely unrelated) I notice that the use of directio
  on UFS is declining over time.  I'm not sure what that means...
  hopefully
  not more performance escalations...
 
  Sounds like someone from ZFS team needs to get with someone from
  Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road
  boundary should look, because it doesn't sound like there's a
  definitive or at least a sure answer here.

 I've done that already (Oracle, Postgres, JavaDB, etc.).  Because the
 holy grail of directI/O is an overloaded term, we don't really know
 where the win within directI/O lies.  In any event, it seems the
 only way to get a definitive answer here is to prototype a no caching
 property...

 eric
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Jim Mauro

 Where does the win come from with directI/O?  Is it 1), 2), or some  
 combination?  If its a combination, what's the percentage of each  
 towards the win?
   
That will vary based on workload (I know, you already knew that ... :^).
Decomposing the performance win between what is gained as a result of 
single writer
lock breakup and no caching is something we can only guess at, because, 
at least
for UFS, you can't do just one - it's all or nothing.
 We need to tease 1) and 2) apart to have a full understanding.  

We can't. We can only guess (for UFS).

My opinion - it's a must-have for ZFS if we're going to get serious 
attention
in the database space. I'll bet dollars-to-donuts that, over the next 
several years,
we'll burn many tens-of-millions of dollars on customer support 
escalations that
come down to memory utilization issues and contention between database
specific buffering and the ARC. This is entirely my opinion (not that of 
Sun),
and I've been wrong before.

Thanks,
/jim



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE
Jim Mauro writes:
  
   Where does the win come from with directI/O?  Is it 1), 2), or some  
   combination?  If its a combination, what's the percentage of each  
   towards the win?
 
  That will vary based on workload (I know, you already knew that ... :^).
  Decomposing the performance win between what is gained as a result of 
  single writer
  lock breakup and no caching is something we can only guess at, because, 
  at least
  for UFS, you can't do just one - it's all or nothing.
   We need to tease 1) and 2) apart to have a full understanding.  
  
  We can't. We can only guess (for UFS).
  
  My opinion - it's a must-have for ZFS if we're going to get serious 
  attention
  in the database space. I'll bet dollars-to-donuts that, over the next 
  several years,
  we'll burn many tens-of-millions of dollars on customer support 
  escalations that
  come down to memory utilization issues and contention between database
  specific buffering and the ARC. This is entirely my opinion (not that of 
  Sun),


...memory utilisation... OK so we should implement the 'lost cause' rfe.

In all cases, ZFS must not steal pages from other memory consumers :

6488341 ZFS should avoiding growing the ARC into trouble

So the DB memory pages should not be _contended_ for. 

-r

  and I've been wrong before.
  
  Thanks,
  /jim
  
  
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE

eric kustarz writes:
  
   Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
   surprised that this is being met with skepticism considering that
   Oracle highly recommends direct IO be used,  and, IIRC, Oracle
   performance was the main motivation to adding DIO to UFS back in
   Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
   it's the buffer caching they all employ. So I'm a big fan of seeing
   6429855 come to fruition.
  
  The point is that directI/O typically means two things:
  1) concurrent I/O
  2) no caching at the file system
  

In my blog I also mention :

   3) no readahead (but can be viewed as an implicit consequence of 2)

And someone chimed in with

   4) ability to do I/O at the sector granularity.


I also think that for many, 2) is too weak a form of what they
expect:

   5) DMA straight from user buffer to disk, avoiding a copy.


So
 
   1) Concurrent I/O: we have this in ZFS.

   2) No caching:
      we could do this by taking a directio hint and evicting the
      ARC buffer immediately after the copyout to user space
      for reads, and after txg completion for writes.

   3) No prefetching:
      we have 2 levels of prefetching. The low level was
      fixed recently and should not cause problems for DB loads.
      The high level still needs fixing on its own.
      Then we should take the same hint as 2) to disable it
      altogether. In the meantime we can tune our way into 
      this mode.

   4) Sector-sized I/O:
      is really foreign to the ZFS design.

   5) Zero copy & more CPU efficiency:
      I think this is where the debate is.



My line has been that 5) won't help latency much, and latency is
where I think the game is currently played. Now the
disconnect might be because people feel that the game
is not latency but CPU efficiency: how many CPU cycles do I
burn to get data from disk to the user buffer. This is a
valid point. Configurations with a very large number
of disks can end up saturated by the filesystem's CPU utilisation.

So I still think that the major areas for ZFS perf gains are
on the latency front: block allocation (now much improved
with the separate intent log), I/O scheduling, and other
fixes to the threading and ARC behavior.  But at some point we
can turn our microscope on the CPU efficiency of the
implementation.  The copy will certainly be a big chunk of
the CPU cost per I/O, but I would still like to gather that
data.

Also consider: 50 disks at 200 IOPS of 8K is 80 MB/sec.
That means maybe 1/10th of a single CPU to be saved by
avoiding just the copy. Probably not what people have in
mind.  How many CPUs do you have when attaching 1000 drives
to a host running a 100TB database?  That many drives will barely
occupy 2 cores running the copies.
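
That estimate, written out as arithmetic (the ~1 GB/s per-core copy
bandwidth is an assumed figure, chosen only to reproduce the numbers above):

    #include <stdio.h>

    /*
     * Back-of-the-envelope: how much extra memory copying does buffered
     * I/O add, and what fraction of a core does it cost?  The ~1 GB/s
     * per-core copy bandwidth is an assumption, not a measurement.
     */
    int main(void)
    {
        const double iops_per_disk = 200.0;
        const double io_size       = 8.0 * 1024;   /* 8K records        */
        const double core_copy_bw  = 1e9;          /* assumed bytes/sec */
        const int    configs[]     = { 50, 1000 };

        for (int i = 0; i < 2; i++) {
            double bw = configs[i] * iops_per_disk * io_size;  /* bytes/sec */
            printf("%4d disks: %7.1f MB/s -> ~%.2f cores spent on the copy\n",
                   configs[i], bw / 1e6, bw / core_copy_bw);
        }
        return 0;
    }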

People want performance and efficiency. Directio is
just an overloaded name that delivered those gains to other
filesystems.

Right now, what I think is worth gathering is the cycles spent
in ZFS per read and write in a large DB environment where the DB
holds 90% of memory.  For comparison with another FS, we
should disable checksums, file prefetching and vdev prefetching,
cap the ARC, turn atime off, and use an 8K recordsize.  A breakdown
and comparison of the CPU cost per layer will be quite
interesting and will point to what needs work.

Another interesting thing for me would be: what is your
budget?

How many cycles per DB read and write are you
willing to spend, and how did you come to that number?


But, as Eric says, let's develop 2) and I'll try in parallel to 
figure out the per-layer breakdown cost.

-r



  Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning  
  on directI/O.
  
  ZFS *does* 1.  It doesn't do 2 (currently).
  
  That is what we're trying to discuss here.
  
  Where does the win come from with directI/O?  Is it 1), 2), or some  
  combination?  If its a combination, what's the percentage of each  
  towards the win?
  
  We need to tease 1) and 2) apart to have a full understanding.  I'm  
  not against adding 2) to ZFS but want more information.  I suppose  
  i'll just prototype it and find out for myself.
  
  eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Nicolas Williams
On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote:
   It does, which leads to the core problem. Why do we have to store the
   exact same data twice in memory (i.e., once in the ARC, and once in
   the shared memory segment that Oracle uses)? 
 
 We do not retain 2 copies of the same data.
 
 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.
 
 What problem does that pose ?

Other things deserving of staying in the cache get pushed out by things
that don't deserve being in the cache.  Thus systemic memory pressure
(e.g., more on-demand paging of text).

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Nicolas Williams
On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
 ...memory utilisation... OK so we should implement the 'lost cause' rfe.
 
 In all cases, ZFS must not steal pages from other memory consumers :
 
   6488341 ZFS should avoiding growing the ARC into trouble
 
  So the DB memory pages should not be _contended_ for. 

What if your executable text, and pretty much everything lives on ZFS?
You don't want to contend for the memory caching those things either.
It's not just the DB's memory you don't want to contend for.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE

Nicolas Williams writes:

  On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
   ...memory utilisation... OK so we should implement the 'lost cause' rfe.
   
   In all cases, ZFS must not steal pages from other memory consumers :
   
  6488341 ZFS should avoiding growing the ARC into trouble
   
   So the DB memory pages should not be _contended_ for. 
  
  What if your executable text, and pretty much everything lives on ZFS?
  You don't want to contend for the memory caching those things either.
  It's not just the DB's memory you don't want to contend for.

On the read side, 

we're talking here about 1000 disks each running 35
concurrent I/Os of 8K, so a footprint of 250MB, to stage a
ton of work.

On the write side we do have to play with the transaction
group, so that will be 5-10 seconds' worth of synchronous
write activity.

But how much memory does a 1000-disk server have?

-r




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Nicolas Williams
On Thu, Oct 04, 2007 at 06:59:56PM +0200, Roch - PAE wrote:
 Nicolas Williams writes:
   On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
    So the DB memory pages should not be _contended_ for. 
   
   What if your executable text, and pretty much everything lives on ZFS?
   You don't want to contend for the memory caching those things either.
   It's not just the DB's memory you don't want to contend for.
 
 On the read side, 
 
 We're talking here about 1000 disks each running 35 concurrent
 I/Os of 8K, so a footprint of 250MB, to stage a ton of work.

I'm not sure what you mean, but extra copies and memory just to stage
the I/Os is not the same as the systemic memory pressure issue.

Now, I'm _speculating_ as to what the real problem is, but it seems very
likely that putting things in the cache that needn't be there would push
out things that should be there, and since restoring those things to the
cache later would cost I/Os...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Eric Hamilton
I'd like to second a couple of comments made recently:
* If they don't regularly do so, I too encourage the ZFS, Solaris 
performance, and Sun Oracle support teams to sit down and talk about the 
utility of Direct I/O for databases.
* I too suspect that absent Direct I/O (or some ringing endorsement 
from Oracle about how ZFS doesn't need Direct I/O), there will be a 
drain of customer escalations regarding the lack-- plus FUD and other 
sales inhibitors.

While I realize that Sun has not published a TPC-C result since 2001 and 
offers a different value proposition to customers, performance does 
matter and for some cases Direct I/O can contribute to that.

Historically, every TPC-C database benchmark run can be converted from 
being I/O bound to being CPU bound by adding enough disk spindles and 
enough main memory.  In that context, saving the CPU cycles (and cache 
misses) from a copy are important.

Another historical trend was that for performance, portability across 
different operating systems, and perhaps just because they could, 
databases tended to use as few OS capabilities as possible and to do 
their own resource management.  So for instance databases were often 
benchmarked using raw devices.  Customers on the other hand preferred 
the manageability of filesystems and tended to deploy there.  In that 
context, Direct I/O is an attempt to get the best of both worlds.

Finally, besides UFS Direct I/O on Solaris, other filesystems including 
VxFS also have various forms of Direct I/O-- either separate APIs or 
mount options for that bypass the cache on large writes, etc.  
Understanding those benefits, both real and advertised, helps understand 
the opportunities and shortfalls for ZFS.

It may be that this is not the most important thing for ZFS performance 
or capability right now-- measurement in targeted configurations and 
workloads is the only way to tell-- but I'd be highly surprised if there 
isn't something (bypass cache on really large writes?) that can be 
learned from experiences with Direct I/O.

Eric (Hamilton)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE

Nicolas Williams writes:
  On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote:
 It does, which leads to the core problem. Why do we have to store the
 exact same data twice in memory (i.e., once in the ARC, and once in
 the shared memory segment that Oracle uses)? 
   
   We do not retain 2 copies of the same data.
   
   If the DB cache is made large enough to consume most of memory,
   the ZFS copy will quickly be evicted to stage other I/Os on
   their way to the DB cache.
   
   What problem does that pose ?
  
  Other things deserving of staying in the cache get pushed out by things
  that don't deserve being in the cache.  Thus systemic memory pressure
  (e.g., more on-demand paging of text).
  
  Nico
  -- 

I agree. That's why I submitted both of these.

6429855 Need way to tell ZFS that caching is a lost cause
6488341 ZFS should avoiding growing the ARC into trouble

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Anton B. Rang
 5) DMA straight from user buffer to disk avoiding a copy.

This is what the "direct" in "direct i/o" has historically meant.  :-)

 line has been that 5) won't help latency much and
 latency is where I think the game is currently played. Now the
 disconnect might be because people might feel that the game
 is not latency but CPU efficiency: how many CPU cycles do I
 burn to get data from disk to user buffer.

Actually, in many cases it's less about CPU cycles than memory cycles.

For many databases, most of the I/O is writes (reads wind up
cached in memory).  What's the cost of a write?

With direct I/O: CPU writes to memory (spread out over many
transactions), disk DMAs from memory.  We write LPS (log page size)
bytes of data from CPU to memory, we read LPS bytes from memory.
On processors without a cache line zero, we probably read the LPS
data from memory as part of the write.  Total cost = W:LPS, R:2*LPS.

Without direct I/O: The cost of getting the data into the user buffer
remains the same (W:LPS, R:LPS).  We copy the data from user buffer
to system buffer (W:LPS, R:LPS).  Then we push it out to disk.  Total
cost = W:2*LPS, R:3*LPS.  We've nearly doubled the cost, not including
any TLB effects.

On a memory-bandwidth-starved system (which should be nearly all
modern designs, especially with multi-threaded chips like Niagara),
replacing buffered I/O with direct I/O should give you nearly a 2x
improvement in log write bandwidth.  That's without considering
cache effects (which shouldn't be too significant, really, since LPS
should be << the size of L2).

How significant is this?  We'd have to measure; and it will likely
vary quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat
less.  That's because you still need to read the page to compute
its checksum.  So for direct I/O with ZFS (with checksums enabled),
the cost is W:LPS, R:2*LPS.  Is saving one page of writes enough to
make a difference?  Possibly not.

Anton
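
The same accounting, written out with an assumed 8K log page so the totals
are concrete (the W/R figures are exactly those derived above; only LPS = 8K
is an added assumption):

    #include <stdio.h>

    /*
     * Memory-traffic accounting from the analysis above, per log write of
     * LPS bytes.  The W/R figures are exactly those quoted in the post;
     * only LPS = 8K is an assumed example value.
     */
    int main(void)
    {
        const long lps = 8192;

        const long dio_w = 1 * lps, dio_r = 2 * lps;  /* direct I/O          */
        const long buf_w = 2 * lps, buf_r = 3 * lps;  /* buffered (one copy) */

        printf("direct I/O : W=%ld R=%ld  total=%ld bytes\n",
               dio_w, dio_r, dio_w + dio_r);
        printf("buffered   : W=%ld R=%ld  total=%ld bytes\n",
               buf_w, buf_r, buf_w + buf_r);
        return 0;
    }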
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Jonathan Loran


I've been thinking about this for a while, but Anton's analysis makes me 
think about it even more:


We all love ZFS, right?  It's futuristic in a bold new way, with many 
virtues; I won't preach to the choir.  But making it all glue together 
requires some necessary CPU/memory-intensive operations around checksum 
generation/validation, compression, encryption, data placement/component 
load balancing, etc.  Processors have gotten really powerful, much more 
so than the relative disk I/O gains, which in all honesty makes ZFS 
possible.  My question: is anyone working on an offload engine for ZFS?  
I can envision a highly optimized, pipelined system, where writes and 
reads pass through checksum, compression, and encryption ASICs that also 
locate data properly on disk.  This could even be in the form of a PCIe 
SATA/SAS card with many ports, or other options.  This would make 
direct I/O, or DMA I/O, possible again.  The file system abstraction with 
ZFS is really too much and too important to ignore, and too hard to 
optimize under different load conditions (my rookie opinion), to expect 
any RDBMS app to have a clue what to do with it.  I guess what I'm 
saying is that the RDBMS app will know what blocks it needs, and wants to 
get them in and out speedy quick, but the mapping to disk is not linear 
with ZFS the way it is with other file systems.  An offload engine could 
translate this instead.


Just throwing this out there for the purpose of blue sky fluff.

Jon

Anton B. Rang wrote:

5) DMA straight from user buffer to disk avoiding a copy.



This is what the "direct" in "direct i/o" has historically meant.  :-)

  

line has been that 5) won't help latency much and
latency is where I think the game is currently played. Now the
disconnect might be because people might feel that the game
is not latency but CPU efficiency: how many CPU cycles do I
burn to get data from disk to user buffer.



Actually, in many cases it's less about CPU cycles than memory cycles.

For many databases, most of the I/O is writes (reads wind up
cached in memory).  What's the cost of a write?

With direct I/O: CPU writes to memory (spread out over many
transactions), disk DMAs from memory.  We write LPS (log page size)
bytes of data from CPU to memory, we read LPS bytes from memory.
On processors without a cache line zero, we probably read the LPS
data from memory as part of the write.  Total cost = W:LPS, R:2*LPS.

Without direct I/O: The cost of getting the data into the user buffer
remains the same (W:LPS, R:LPS).  We copy the data from user buffer
to system buffer (W:LPS, R:LPS).  Then we push it out to disk.  Total
cost = W:2*LPS, R:3*LPS.  We've nearly doubled the cost, not including
any TLB effects.

On a memory-bandwidth-starved system (which should be nearly all
modern designs, especially with multi-threaded chips like Niagara),
replacing buffered I/O with direct I/O should give you nearly a 2x
improvement in log write bandwidth.  That's without considering
cache effects (which shouldn't be too significant, really, since LPS
should be << the size of L2).

How significant is this?  We'd have to measure; and it will likely
vary quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat
less.  That's because you still need to read the page to compute
its checksum.  So for direct I/O with ZFS (with checksums enabled),
the cost is W:LPS, R:2*LPS.  Is saving one page of writes enough to
make a difference?  Possibly not.

Anton
 
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


--


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Roch - PAE
Rayson Ho writes:

  1) Modern DBMSs cache database pages in their own buffer pool because
  it is less expensive than to access data from the OS. (IIRC, MySQL's
  MyISAM is the only one that relies on the FS cache, but a lot of MySQL
  sites use INNODB which has its own buffer pool)
  

The DB can and should cache data whether or not directio is used.

  2) Also, direct I/O is faster because it avoid double buffering.
  

A piece of data can be in one buffer, 2 buffers, 3
buffers. That says nothing about performance. More below.

So I guess you mean DIO is faster because it avoids the
extra copy: DMA straight to the user buffer rather than DMA to
a kernel buffer followed by a copy to the user buffer. If an I/O is
5 ms, an 8K copy is about 10 usec. Is avoiding the copy really the
most urgent thing to work on?
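
A crude way to sanity-check the 8K-copy figure on a given machine is a
trivial memcpy loop; a rough sketch, not a calibrated benchmark (the
iteration count is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>        /* gethrtime(3C) on Solaris */

    /*
     * Rough timing of the extra 8K copy that buffered I/O adds, to put it
     * next to the ~5 ms cost of the disk I/O itself.
     */
    int main(void)
    {
        enum { SZ = 8192, ITERS = 100000 };
        char *src = malloc(SZ), *dst = malloc(SZ);
        volatile char sink = 0;

        if (src == NULL || dst == NULL)
            return 1;
        memset(src, 0xab, SZ);

        hrtime_t start = gethrtime();
        for (int i = 0; i < ITERS; i++) {
            src[0] = (char)i;    /* keep the copy from being optimized out */
            memcpy(dst, src, SZ);
            sink ^= dst[0];
        }
        hrtime_t end = gethrtime();
        (void)sink;

        printf("%.2f usec per 8K copy (vs ~5000 usec per disk I/O)\n",
               (double)(end - start) / ITERS / 1000.0);
        free(src);
        free(dst);
        return 0;
    }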



  Rayson
  
  
  
  
  On 10/2/07, eric kustarz [EMAIL PROTECTED] wrote:
   Not yet, see:
   6429855 Need way to tell ZFS that caching is a lost cause
  
   Is there a specific reason why you need to do the caching at the DB
   level instead of the file system?  I'm really curious as i've got
   conflicting data on why people do this.  If i get more data on real
   reasons on why we shouldn't cache at the file system, then this could
   get bumped up in my priority queue.
  

I can't answer this, although I can well imagine that the DB is 
the most efficient place to cache its own data, all organised 
and formatted to respond to queries. 

But once the DB has signified to the FS that it doesn't
require the FS to cache data, then the benefit from this RFE
is that the memory used to stage the data can be quickly
recycled by ZFS for subsequent operations. It means the ZFS
memory footprint is more likely to contain useful ZFS
metadata and not cached data blocks we know are not likely to
be used again anytime soon.

We would also operate better in mixed DIO/non-DIO workloads.


See also:
http://blogs.sun.com/roch/entry/zfs_and_directio

-r



   eric
   ___
   zfs-discuss mailing list
   zfs-discuss@opensolaris.org
   http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Roch - PAE

Matty writes:
  On 10/3/07, Roch - PAE [EMAIL PROTECTED] wrote:
   Rayson Ho writes:
  
 1) Modern DBMSs cache database pages in their own buffer pool because
 it is less expensive than to access data from the OS. (IIRC, MySQL's
 MyISAM is the only one that relies on the FS cache, but a lot of MySQL
 sites use INNODB which has its own buffer pool)

  
   The DB can and should cache data whether or not directio is used.
  
  It does, which leads to the core problem. Why do we have to store the
  exact same data twice in memory (i.e., once in the ARC, and once in
  the shared memory segment that Oracle uses)? 

We do not retain 2 copies of the same data.

If the DB cache is made large enough to consume most of memory,
the ZFS copy will quickly be evicted to stage other I/Os on
their way to the DB cache.

What problem does that pose ?

-r

  
  Thanks,
  - Ryan
  -- 
  UNIX Administrator
  http://prefetch.net
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Nicolas Williams
On Wed, Oct 03, 2007 at 10:42:53AM +0200, Roch - PAE wrote:
 Rayson Ho writes:
   2) Also, direct I/O is faster because it avoid double buffering.
 
 A piece of data can be in one buffer, 2 buffers, 3
 buffers. That says nothing about performance. More below.
 
 So I guess you mean DIO is faster because it avoids the
 extra copy: DMA straight to the user buffer rather than DMA to
 a kernel buffer followed by a copy to the user buffer. If an I/O is
 5 ms, an 8K copy is about 10 usec. Is avoiding the copy really the
 most urgent thing to work on?

If the DB is huge relative to RAM, and very busy, then memory pressure
could become a problem.  And it's not just the time spent copying
buffers, but the resources spent managing those copies.  (Just guessing.)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Rayson Ho
On 10/3/07, Roch - PAE [EMAIL PROTECTED] wrote:
 We do not retain 2 copies of the same data.

 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.

 What problem does that pose ?

Hi Roch,

1) The memory copy operations are expensive... I think the following
is a good intro to this problem:

"Copying data in memory can be a serious bottleneck in DBMS software
today. This fact is often a surprise to database students, who assume
that main-memory operations are free compared to disk I/O. But in
practice, a well-tuned database installation is typically not
I/O-bound." (section 3.2)

http://mitpress.mit.edu/books/chapters/0262693143chapm2.pdf

(Ch 2: Anatomy of a Database System, Readings in Database Systems, 4th Ed)


2) If you look at the TPC-C disclosure reports, you will see vendors
using thousands of disks for the top 10 systems. With that many disks
working in parallel, the I/O latencies are not as big of a problem
as on systems with fewer disks.


3) Also interesting is Concurrent I/O, which was introduced in AIX 5.2:

Improving Database Performance With AIX Concurrent I/O
http://www-03.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.html

Improve database performance on file system containers in IBM DB2 UDB
V8.2 using Concurrent I/O on AIX
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/

Rayson




 -r

  
   Thanks,
   - Ryan
   --
   UNIX Administrator
   http://prefetch.net
   ___
   zfs-discuss mailing list
   zfs-discuss@opensolaris.org
   http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Dale Ghent
On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote:

 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.

 What problem does that pose ?

Personally, I'm still not completely sold on the performance  
(performance as in ability, not speed) of ARC eviction. Often times,  
especially during a resilver, a server with ~2GB of RAM free under  
normal circumstances will dive down to the minfree floor, causing  
processes to be swapped out. We've had to take to manually  
constraining ARC max size so this situation is avoided. This is on  
s10u2/3. I haven't tried anything heavy duty with Nevada simply  
because I don't put Nevada in production situations.

Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm  
surprised that this is being met with skepticism considering that  
Oracle highly recommends direct IO be used,  and, IIRC, Oracle  
performance was the main motivation for adding DIO to UFS back in  
Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,  
it's the buffer caching they all employ. So I'm a big fan of seeing  
6429855 come to fruition.

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Jim Mauro

Hey Roch -
 We do not retain 2 copies of the same data.

 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.

 What problem does that pose ?

Can't answer that question empirically, because we can't measure this, but
I imagine there's some overhead to ZFS cache management in evicting and
replacing blocks, and that overhead could be eliminated if ZFS could be
told not to cache the blocks at all.

Now, obviously, whether this overhead would be in the noise level, or
something that actually hurts sustainable performance will depend on
several things, but I can envision scenarios where it's overhead I'd 
rather avoid if I could.

Thanks,
/jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Richard Elling
Rayson Ho wrote:
 On 10/3/07, Roch - PAE [EMAIL PROTECTED] wrote:
 We do not retain 2 copies of the same data.

 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.

 What problem does that pose ?
 
 Hi Roch,
 
 1) The memory copy operations are expensive... I think the following
 is a good intro to this problem:
 
 Copying data in memory can be a serious bottleneck in DBMS software
 today. This fact is often a surprise to database students, who assume
 that main-memory operations are free compared to disk I/O. But in
 practice, a well-tuned database installation is typically not
 I/O-bound.  (section 3.2)

... just the ones people are complaining about ;-)
Indeed it seems rare that a DB performance escalation does not involve
I/O tuning :-(

 http://mitpress.mit.edu/books/chapters/0262693143chapm2.pdf
 
 (Ch 2: Anatomy of a Database System, Readings in Database Systems, 4th Ed)
 
 
 2) If you look at the TPC-C disclosure reports, you will see vendors
 using thousands of disks for the top 10 systems. With that many disks
 working in parallel, the I/O latencies are not as big as of a problem
 as systems with fewer disks.
 
 
 3) Also interesting is Concurrent I/O, which was introduced in AIX 5.2:
 
 Improving Database Performance With AIX Concurrent I/O
 http://www-03.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.html

This is a pretty decent paper and some of the issues are the same with
UFS.  To wit, direct I/O is not always a win (q.v. Bob Sneed's blog).
It also describes what we call the single writer lock problem, which IBM
solves with Concurrent I/O.  See also:
http://www.solarisinternals.com/wiki/index.php/Direct_I/O

ZFS doesn't have the single writer lock problem.  See also:
http://blogs.sun.com/roch/entry/zfs_to_ufs_performance_comparison

Slightly off-topic, in looking at some field data this morning (looking
for something completely unrelated) I notice that the use of directio
on UFS is declining over time.  I'm not sure what that means... hopefully
not more performance escalations...
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Dale Ghent
On Oct 3, 2007, at 5:21 PM, Richard Elling wrote:

 Slightly off-topic, in looking at some field data this morning  
 (looking
 for something completely unrelated) I notice that the use of directio
 on UFS is declining over time.  I'm not sure what that means...  
 hopefully
 not more performance escalations...

Sounds like someone from ZFS team needs to get with someone from  
Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road  
boundary should look, because it doesn't sound like there's a  
definitive or at least a sure answer here.

Oracle trumpets the use of DIO, and there are benchmarks and first- 
hand accounts out there from DBAs on its virtues - at least when  
running on UFS (and EXT2/3 on Linux, etc)

As it relates to ZFS mechanics specifically, there doesn't appear to  
be any settled opinion.

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Jason J. W. Williams
Hi Dale,

We're testing out the enhanced arc_max enforcement (tracking DNLC
entries) using Build 72 right now. Hopefully, it will fix the memory
creep, which is the only real downside to ZFS for DB work, it seems to
me. Frankly, our DB loads have improved performance with ZFS. I
suspect it's because we are write-heavy.

-J

On 10/3/07, Dale Ghent [EMAIL PROTECTED] wrote:
 On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote:

  If the DB cache is made large enough to consume most of memory,
  the ZFS copy will quickly be evicted to stage other I/Os on
  their way to the DB cache.
 
  What problem does that pose ?

 Personally, I'm still not completely sold on the performance
 (performance as in ability, not speed) of ARC eviction. Often times,
 especially during a resilver, a server with ~2GB of RAM free under
 normal circumstances will dive down to the minfree floor, causing
 processes to be swapped out. We've had to take to manually
 constraining ARC max size so this situation is avoided. This is on
 s10u2/3. I haven't tried anything heavy duty with Nevada simply
 because I don't put Nevada in production situations.

 Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
 surprised that this is being met with skepticism considering that
 Oracle highly recommends direct IO be used,  and, IIRC, Oracle
 performance was the main motivation to adding DIO to UFS back in
 Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
 it's the buffer caching they all employ. So I'm a big fan of seeing
 6429855 come to fruition.

 /dale
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Kugutsumen
Postgres assumes that the OS takes care of caching:

"PLEASE NOTE. PostgreSQL counts a lot on the OS to cache data files  
and hence does not bother with duplicating its file caching effort.  
The shared buffers parameter assumes that OS is going to cache a lot  
of files and hence it is generally very low compared with system RAM.  
Even for a dataset in excess of 20GB, a setting of 128MB may be too  
much, if you have only 1GB RAM and an aggressive-at-caching OS like  
Linux." ("Tuning PostgreSQL for performance", Shridhar Daithankar, Josh  
Berkus, 2003, http://www.varlena.com/GeneralBits/Tidbits/perf.html)

Slightly off-topic, I have noticed at least a 25% performance gain on  
my PostgreSQL database after installing Wu Fengguang's adaptive  
read-ahead disk cache patch for the Linux kernel:  
http://lkml.org/lkml/2005/9/15/185

http://www.samag.com/documents/s=10101/sam0616a/0616a.htm

I was wondering if Solaris uses a similar approach.

On 04/10/2007, at 4:44 AM, Dale Ghent wrote:


 On Oct 3, 2007, at 5:21 PM, Richard Elling wrote:


 Slightly off-topic, in looking at some field data this morning
 (looking
 for something completely unrelated) I notice that the use of directio
 on UFS is declining over time.  I'm not sure what that means...
 hopefully
 not more performance escalations...


 Sounds like someone from ZFS team needs to get with someone from
 Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road
 boundary should look, because it doesn't sound like there's a
 definitive or at least a sure answer here.

 Oracle trumpets the use of DIO, and there are benchmarks and first-
 hand accounts out there from DBAs on its virtues - at least when
 running on UFS (and EXT2/3 on Linux, etc)

 As it relates to ZFS mechanics specifically, there doesn't appear to
 be any settled opinion.

 /dale
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread eric kustarz

 Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
 surprised that this is being met with skepticism considering that
 Oracle highly recommends direct IO be used,  and, IIRC, Oracle
 performance was the main motivation to adding DIO to UFS back in
 Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
 it's the buffer caching they all employ. So I'm a big fan of seeing
 6429855 come to fruition.

The point is that directI/O typically means two things:
1) concurrent I/O
2) no caching at the file system

Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning  
on directI/O.

ZFS *does* 1.  It doesn't do 2 (currently).

That is what we're trying to discuss here.

Where does the win come from with directI/O?  Is it 1), 2), or some  
combination?  If its a combination, what's the percentage of each  
towards the win?

We need to tease 1) and 2) apart to have a full understanding.  I'm  
not against adding 2) to ZFS but want more information.  I suppose  
i'll just prototype it and find out for myself.

eric
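
To make 1) above concrete: concurrent I/O means several threads writing to
the same file in parallel, which UFS serializes behind its single-writer
lock unless directio is enabled, and which ZFS already allows. A minimal
sketch of that access pattern (file name, thread count and block size are
made up):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/types.h>

    /*
     * Several threads writing disjoint 8K blocks of one file in parallel.
     * File name, thread count and block size are made up for illustration.
     */
    #define NTHREADS 4
    #define BLKSZ    8192

    static int fd;

    static void *writer(void *arg)
    {
        long id = (long)arg;
        char buf[BLKSZ];

        memset(buf, 'A' + (int)id, BLKSZ);
        for (int i = 0; i < 1000; i++) {
            off_t off = (off_t)(i * NTHREADS + id) * BLKSZ;
            if (pwrite(fd, buf, BLKSZ, off) != BLKSZ)
                perror("pwrite");
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];

        fd = open("/tank/concurrency-test", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tids[i], NULL, writer, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tids[i], NULL);
        close(fd);
        return 0;
    }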
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread eric kustarz

On Oct 3, 2007, at 3:44 PM, Dale Ghent wrote:

 On Oct 3, 2007, at 5:21 PM, Richard Elling wrote:

 Slightly off-topic, in looking at some field data this morning
 (looking
 for something completely unrelated) I notice that the use of directio
 on UFS is declining over time.  I'm not sure what that means...
 hopefully
 not more performance escalations...

 Sounds like someone from ZFS team needs to get with someone from
 Oracle/MySQL/Postgres and get the skinny on how the IO rubber-road
 boundary should look, because it doesn't sound like there's a
 definitive or at least a sure answer here.

I've done that already (Oracle, Postgres, JavaDB, etc.).  Because the  
"holy grail" of directI/O is an overloaded term, we don't really know  
where the win within directI/O lies.  In any event, it seems the  
only way to get a definitive answer here is to prototype a "no caching"  
property...

eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Direct I/O ability with zfs?

2007-10-02 Thread David Runyon
We are using MySQL, and love the idea of using zfs for this.  We are used to 
using Direct I/O to bypass file system caching (let the DB do this).  Does this 
exist for zfs?
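
For reference, the UFS habit being described is usually either the
forcedirectio mount option or a per-file directio(3C) advisory; a minimal
sketch of the latter, with a made-up file path:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/fcntl.h>      /* directio(3C), DIRECTIO_ON */

    /*
     * Per-file direct I/O advisory as used on UFS (the file path is made
     * up).  The call is advisory; ZFS has no equivalent today, which is
     * what this thread is about.
     */
    int main(void)
    {
        int fd = open("/u01/mysql/ibdata1", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (directio(fd, DIRECTIO_ON) != 0)
            perror("directio");
        /* ... database-style reads and writes would go here ... */
        close(fd);
        return 0;
    }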
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-02 Thread Rayson Ho
1) Modern DBMSs cache database pages in their own buffer pool because
it is less expensive than accessing data from the OS. (IIRC, MySQL's
MyISAM is the only one that relies on the FS cache, but a lot of MySQL
sites use InnoDB, which has its own buffer pool.)

2) Also, direct I/O is faster because it avoids double buffering.

Rayson




On 10/2/07, eric kustarz [EMAIL PROTECTED] wrote:
 Not yet, see:
 6429855 Need way to tell ZFS that caching is a lost cause

 Is there a specific reason why you need to do the caching at the DB
 level instead of the file system?  I'm really curious as i've got
 conflicting data on why people do this.  If i get more data on real
 reasons on why we shouldn't cache at the file system, then this could
 get bumped up in my priority queue.

 eric
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-02 Thread Richard Elling
David Runyon wrote:
 We are using MySQL, and love the idea of using zfs for this.  We are used to 
 using Direct I/O to bypass file system caching (let the DB do this).  Does 
 this exist for zfs?

This is a FAQ.  See:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Database_Recommendations
http://blogs.sun.com/roch/entry/zfs_and_directio
http://blogs.sun.com/bobs/entry/one_i_o_two_i

  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-02 Thread przemolicc
On Tue, Oct 02, 2007 at 01:20:24PM -0600, eric kustarz wrote:
 
 On Oct 2, 2007, at 1:11 PM, David Runyon wrote:
 
  We are using MySQL, and love the idea of using zfs for this.  We  
  are used to using Direct I/O to bypass file system caching (let the  
  DB do this).  Does this exist for zfs?
 
 Not yet, see:
 6429855 Need way to tell ZFS that caching is a lost cause
 
 Is there a specific reason why you need to do the caching at the DB  
 level instead of the file system?  I'm really curious as i've got  
 conflicting data on why people do this.  If i get more data on real  
 reasons on why we shouldn't cache at the file system, then this could  
 get bumped up in my priority queue.

At least two reasons:
http://developers.sun.com/solaris/articles/mysql_perf_tune.html#6
http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle
(the first example proves that this issue is not only Oracle-related)

Regards
przemol


--
http://przemol.blogspot.com/










___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss