I would totally be fine with that, so long as it works in a reasonable manner.
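
(For concreteness, a minimal userspace sketch of the kind of fallback Patrick 
describes below: if an unaligned O_DIRECT write is rejected with EINVAL, redo 
it through the buffered path and flush it, O_SYNC/fdatasync style.  The helper 
name and the two-descriptor setup are just an illustration, not what the Lustre 
client does today; a real implementation would live inside the filesystem, not 
the application.)

#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* dio_fd was opened with O_DIRECT, buffered_fd on the same file without it. */
static ssize_t write_with_fallback(int dio_fd, int buffered_fd,
                                   const void *buf, size_t len, off_t off)
{
        ssize_t rc = pwrite(dio_fd, buf, len, off);

        if (rc < 0 && errno == EINVAL) {
                /* Rejected for alignment: fall back to the buffered path... */
                rc = pwrite(buffered_fd, buf, len, off);
                /* ...and flush, so the data is stable on disk, roughly
                 * matching what the caller expected from O_DIRECT. */
                if (rc >= 0 && fdatasync(buffered_fd) < 0)
                        rc = -1;
        }
        return rc;
}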

In theory, it would even be possible to do sub-page uncached writes from the 
client, and have the OSS handle the read-modify-write of a single page.  That 
would need some help from the CLIO layer to send the small write directly to 
the OST without going through the page cache (also invalidating any overlapping 
page from the client cache) and LNet handling the misaligned RDMA properly.
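
(To make the OSS-side read-modify-write concrete, here is a minimal userspace 
sketch, under the assumption that the backing object is reachable through an 
ordinary file descriptor; oss_sub_page_write() is a hypothetical name, not 
actual OSS code, and the real thing would have to hold the extent/page lock 
across the update.)

#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define OSS_PAGE_SIZE 4096

/* Apply a sub-page write (offset, len inside one page) to the backing object. */
static ssize_t oss_sub_page_write(int obj_fd, const char *buf,
                                  size_t len, off_t offset)
{
        char page[OSS_PAGE_SIZE];
        off_t page_start = offset & ~((off_t)OSS_PAGE_SIZE - 1);
        size_t page_off = offset - page_start;

        if (page_off + len > OSS_PAGE_SIZE)
                return -1;      /* must not cross a page boundary */

        /* Read the existing page; a short read (hole or EOF in a sparse
         * object) leaves the remaining bytes zeroed by the memset. */
        memset(page, 0, sizeof(page));
        if (pread(obj_fd, page, OSS_PAGE_SIZE, page_start) < 0)
                return -1;

        /* Merge in only the bytes the client actually sent... */
        memcpy(page + page_off, buf, len);

        /* ...and write the full, page-aligned page back. */
        if (pwrite(obj_fd, page, OSS_PAGE_SIZE, page_start) < 0)
                return -1;

        return (ssize_t)len;
}

The interesting part for Lustre is getting the misaligned buffer across 
LNet/RDMA so the OSS can do exactly this merge without the client ever 
touching the rest of the page.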

We used to allow misaligned RDMA with the old liblustre client because it 
didn't ever have any cache, but not with the Linux client.  It _might_ be 
possible to do without major surgery on the servers, and might even speed up 
sub-page random writes.  This would avoid the need to read a whole page over to 
the client just to overwrite part of it and send it back.  It would also avoid 
contending on DLM write locks for non-overlapping regions, since the sub-page 
writes could be sent lockless from the client, with the DLM locking and 
page-aligned IO handled on the OSS (which is already in the protocol).

That said, this is definitely more in your area of expertise Patrick (and 
Jinshan, CC'd).

Cheers, Andreas

> On Oct 30, 2018, at 09:10, Patrick Farrell <p...@cray.com> wrote:
> 
> Andreas,
>  
> An interesting thought on this, as the same limitation came up recently in 
> discussions with a Cray customer.  Strictly honoring the direct I/O 
> expectations around data copying is apparently optional.  GPFS is a notable 
> example – it allows non-page-aligned/non-page-size direct I/O, but it apparently 
> (this is second-hand from a GPFS-knowledgeable person, so take it with a grain 
> of salt) uses the buffered path (data copy, page cache, etc.) and flushes it, 
> O_SYNC style.  My understanding from conversations is this is the general 
> approach taken by file systems that support unaligned direct I/O – they cheat 
> a little and do buffered I/O in those cases.
>  
> So rather than refusing to perform unaligned direct I/O, we could emulate the 
> approach taken by (some) other file systems.  There’s no clear standard here, 
> but this is an option others have taken that might improve the user 
> experience.  (I believe we persuaded our particular user to switch their code 
> away from direct I/O, since they had no real reason to be using it.)
>  
> - Patrick
>  
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
> 김형근 <okok102...@fusiondata.co.kr>
> Date: Sunday, October 28, 2018 at 11:40 PM
> To: Andreas Dilger <adil...@whamcloud.com>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] dd oflag=direct error (512 byte Direct I/O)
>  
> The software I use is RedHat Virtualization. When using a POSIX-compatible FS, 
> it seems to perform direct I/O with a block size of 256512 bytes.
>  
> If I can't resolve the issue with my storage configuration, I will contact 
> RedHat.
>  
> Your answer was very helpful.
> Thank you.
>  
>  
>  
>  
>  
> From: Andreas Dilger <adil...@whamcloud.com>
>  
> To: 김형근 <okok102...@fusiondata.co.kr>
>  
> Cc: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
>  
> Date: 2018-10-25 16:47:58
>  
>  
>  
> Subject: Re: [lustre-discuss] dd oflag=direct error (512 byte Direct I/O)
>  
>  
>  
> On Oct 25, 2018, at 15:05, 김형근 wrote: 
> > 
> > Hi. 
> > It's a pleasure to meet you, the lustre specialists. 
> > (I do not speak English well ... Thank you for your understanding!) 
> 
> Your english is better than my Korean. :-) 
> 
> > I used the dd command in the lustre mount point (using the oflag=direct 
> > option): 
> > 
> > ------------------------------------------------------------ 
> > dd if=/dev/zero of=/mnt/testfile oflag=direct bs=512 count=1 
> > ------------------------------------------------------------ 
> > 
> > I need direct I/O with a 512-byte block size. 
> > This is a required check in the software I use. 
> 
> What software is it? Is it possible to change the application to use 
> 4096-byte alignment? 
> 
> > But unfortunately, if the direct option is present, bs must be a multiple 
> > of 4K (4096), e.g. 8K, 12K, 256K, 1M, 8M, for it to work. 
> > For example, if you enter a value such as 512 or 4095, it will not work. 
> > The error message is as follows. 
> > 
> > 'error message: dd: error writing [filename]: invalid argument' 
> > 
> > My test system is all up to date (RHEL, lustre-server, client). 
> > I have used both ldiskfs and zfs as backing filesystems. The result is the same. 
> > 
> > 
> > My question is simply two. 
> > 
> > 1. Why does Direct I/O only work with block sizes that are multiples of 4K? 
> 
> The client PAGE_SIZE on an x86 system is 4096 bytes. The Lustre client 
> cannot cache data smaller than PAGE_SIZE, so the current implementation 
> requires O_DIRECT reads and writes to be a multiple of PAGE_SIZE. 
> 
> I think the same would happen if you try to use O_DIRECT on a drive with 
> 4096-byte native sectors 
> (https://en.wikipedia.org/w/index.php?title=Advanced_Format&section=5#4K_native)? 
> 
> > 2. Can I change the settings of the server and client to enable 512-byte 
> > Direct I/O? 
> 
> This would not be possible without changing the Lustre client code. 
> I don't know how easy this would be to do while still ensuring that 
> the 512-byte writes are handled correctly. 
> 
> So far we have not had other requests to change this limitation, so 
> it is not a high priority to change on our side, especially since 
> applications will have to deal with 4096-byte sectors in any case. 
> 
> Cheers, Andreas 
> --- 
> Andreas Dilger 
> Principal Lustre Architect 
> Whamcloud 

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud



