Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On 05/12/2016 at 03:52 PM, Jerome Glisse wrote:
> On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
>> On 05/12/2016 at 11:36 AM, Jerome Glisse wrote:
>>> On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
>>>> [...]
>>>> With transparent_hugepage=never I can't see the bug anymore.
>>> Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
>>> (does not apply to 3.10) and without transparent_hugepage=never
>>>
>>> Jérôme
>> Fails with 4.5 + this patch and with 4.5 + this patch + yours
>>
> There must be some bug in your code, we have an upstream user that works
> fine with the above combination (see drivers/vfio/vfio_iommu_type1.c).
> I suspect you might be releasing the page pin too early (put_page()).

In my previous tests, I checked the page before calling put_page and it had already changed.
I also checked that there are not multiple transfers in a single page at once.
So I doubt it's that.

> If you really believe it is a bug upstream we would need a dumb kernel
> module that does gup like you do and that shows the issue. Right now,
> looking at the code (assuming the above patches are applied), I can't see
> anything that can go wrong with THP.

The issue is that I doubt I'll be able to do that. We have had code running in production for at least a year without the issue showing up, and now a single test shows this.
And some tweaks to the test (meaning the memory footprint in user space) can make the problem disappear.

Is there a way to track what is happening to the THP? From the looks of it, the refcounts are changed behind my back? Would kgdb with a watchpoint work on this?
Is there a less painful way?

Thanks

Nicolas
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
Hello Nicolas,

On Thu, May 12, 2016 at 05:31:52PM +0200, Nicolas Morey-Chaisemartin wrote:
> On 05/12/2016 at 03:52 PM, Jerome Glisse wrote:
> > On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
> >> On 05/12/2016 at 11:36 AM, Jerome Glisse wrote:
> >>> On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
> >>>> [...]
> >>>> With transparent_hugepage=never I can't see the bug anymore.
> >>> Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> >>> (does not apply to 3.10) and without transparent_hugepage=never
> >>>
> >>> Jérôme
> >> Fails with 4.5 + this patch and with 4.5 + this patch + yours
> >>
> > There must be some bug in your code, we have an upstream user that works
> > fine with the above combination (see drivers/vfio/vfio_iommu_type1.c).
> > I suspect you might be releasing the page pin too early (put_page()).
> In my previous tests, I checked the page before calling put_page and it had
> already changed.
> I also checked that there are not multiple transfers in a single page at
> once.
> So I doubt it's that.
> >
> > If you really believe it is a bug upstream we would need a dumb kernel
> > module that does gup like you do and that shows the issue. Right now,
> > looking at the code (assuming the above patches are applied), I can't see
> > anything that can go wrong with THP.
>
> The issue is that I doubt I'll be able to do that. We have had code running
> in production for at least a year without the issue showing up, and now a
> single test shows this.
> And some tweaks to the test (meaning the memory footprint in user space) can
> make the problem disappear.
>
> Is there a way to track what is happening to the THP? From the looks of it,
> the refcounts are changed behind my back? Would kgdb with a watchpoint work
> on this?
> Is there a less painful way?

Do you use fork()?

If you have threads and your DMA I/O granularity is smaller than PAGE_SIZE, and a thread of the application in the parent or child is writing to another part of the page, the I/O can get lost (worse, it doesn't really get lost but goes to the child by mistake, instead of sticking to the "mm" where you executed get_user_pages).

This is practically a bug in fork(), but it's known. It can affect any app that uses get_user_pages/O_DIRECT together with fork() and threads, when the I/O granularity is smaller than PAGE_SIZE.

The same bug cannot happen with KSM or other things that can wrprotect a page outside the app's control, because everything outside the app's control checks that there are no page pins before wrprotecting the page. So it's up to the app to control "fork()".

To fix it, you should do one of:

1) use MADV_DONTFORK on the pinned region,

2) prevent fork() from running while you hold pins taken with get_user_pages, or in any case while get_user_pages may be running concurrently,

3) use a PAGE_SIZE I/O granularity and/or prevent the threads from writing to the other part of the page while DMA is running.

I'm not aware of other issues that could screw with page pins with THP on kernels <=4.4. If there were, everything should fall apart, including O_DIRECT and qemu cache=none.

The only issue I'm aware of that can cause DMA to get lost with page pins is the aforementioned one.

To debug it further, I would suggest starting by searching for "fork" calls, and adding MADV_DONTFORK to the pinned region if there's any fork() in your testcase.

Without being allowed to see the source there's not much else we can do, considering there's no sign of unknown bugs in this area in kernels <=4.4. All there is, is the known bug above, but apps that could be affected by it actively avoid it by using MADV_DONTFORK, as qemu does with cache=none.

Thanks,
Andrea
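[Editor's sketch, not part of the original message.] Option 1 above can be illustrated from user space roughly as follows. This is a minimal sketch under stated assumptions: the helper name alloc_dma_buffer is hypothetical, and it assumes the buffer later handed to the driver is an anonymous private mapping created by the application itself. madvise() with MADV_DONTFORK is the real Linux interface; the rest is only one possible way to arrange it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map a page-aligned anonymous
 * buffer of at least `len` bytes and mark it MADV_DONTFORK, so a later
 * fork() does not map the range in the child and therefore never
 * write-protects the pinned pages in the parent.
 * Returns the buffer, or NULL on failure; *mapped_len receives the
 * page-rounded size actually mapped.
 */
void *alloc_dma_buffer(size_t len, size_t *mapped_len)
{
    assert(mapped_len != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    /* Round up so the advice covers whole pages. */
    size_t psz = (size_t)page;
    size_t rounded = (len + psz - 1) / psz * psz;

    void *buf = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Must be done before the driver pins the pages and before any fork(). */
    if (madvise(buf, rounded, MADV_DONTFORK) != 0) {
        munmap(buf, rounded);
        return NULL;
    }

    *mapped_len = rounded;
    return buf;
}
```

A caller would allocate the buffer this way once, pass it to the driver for DMA, and release it with munmap(buf, mapped_len) when done. Note that the child process cannot access the range at all after fork(), which is the intended trade-off.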
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
> On 05/12/2016 at 11:36 AM, Jerome Glisse wrote:
> > On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
> >> On 05/11/2016 at 04:51 PM, Jerome Glisse wrote:
> >>> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> >>>> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
> >>>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> >>>>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
> >>>>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >>>>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >>>>>>>> [...]
> >>>>>> Hi,
> >>>>>>
> >>>>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
> >>>>>> definition from 4.5) and it's working!
> >>>>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in
> >>>>>> RHEL7.
> >>>>>>
> >>>>>> I have a dumb question though: how can we end up in numa/misplaced
> >>>>>> memory code on a single socket system?
> >>>>>>
> >>>>> This patch is not a fix. Do you see a bug message in the kernel log?
> >>>>> Because if you do, that means we have a bigger issue.
> >>>>>
> >>>>> You did not answer one of my previous questions: do you call
> >>>>> get_user_pages with write = 1 as a parameter?
> >>>>>
> >>>>> Also it would be a lot easier if you were testing with the latest 4.6
> >>>>> or 4.5, not the RHEL kernel, as they are far apart and what might look
> >>>>> like the same issue on both might be totally different bugs.
> >>>>>
> >>>>> If you only really care about the RHEL kernel then open a bug with
> >>>>> Red Hat and you can add me in bug-cc
> >>>>>
> >>>>> Cheers,
> >>>>> Jérôme
> >>>> I finally managed to get a proper setup.
> >>>> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config,
> >>>> and my test fails as usual.
> >>>> I applied your patch and rebuilt => still fails and no new messages in
> >>>> dmesg.
> >>>>
> >>>> Now that I don't have to go through the RPM repackaging, I can try out
> >>>> things much quicker if you have any ideas.
> >>>>
> >>> Still an issue if you boot with transparent_hugepage=never?
> >>>
> >>> Also, to simplify the investigation, force write to 1 all the time no
> >>> matter what.
> >>>
> >>> Cheers,
> >>> Jérôme
> >> With transparent_hugepage=never I can't see the bug anymore.
> >>
> > Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> > (does not apply to 3.10) and without transparent_hugepage=never
> >
> > Jérôme
>
> Fails with 4.5 + this patch and with 4.5 + this patch + yours
>

There must be some bug in your code, we have an upstream user that works
fine with the above combination (see drivers/vfio/vfio_iommu_type1.c).
I suspect you might be releasing the page pin too early (put_page()).

If you really believe it is a bug upstream we would need a dumb kernel
module that does gup like you do and that shows the issue. Right now,
looking at the code (assuming the above patches are applied), I can't see
anything that can go wrong with THP.

Cheers,
Jérôme
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On 05/12/2016 at 11:36 AM, Jerome Glisse wrote:
> On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
>> On 05/11/2016 at 04:51 PM, Jerome Glisse wrote:
>>> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
>>>> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
>>>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>>>>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>>>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>>>>>> [...]
>>>>>> Hi,
>>>>>>
>>>>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
>>>>>> definition from 4.5) and it's working!
>>>>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
>>>>>>
>>>>>> I have a dumb question though: how can we end up in numa/misplaced
>>>>>> memory code on a single socket system?
>>>>>>
>>>>> This patch is not a fix. Do you see a bug message in the kernel log?
>>>>> Because if you do, that means we have a bigger issue.
>>>>>
>>>>> You did not answer one of my previous questions: do you call
>>>>> get_user_pages with write = 1 as a parameter?
>>>>>
>>>>> Also it would be a lot easier if you were testing with the latest 4.6
>>>>> or 4.5, not the RHEL kernel, as they are far apart and what might look
>>>>> like the same issue on both might be totally different bugs.
>>>>>
>>>>> If you only really care about the RHEL kernel then open a bug with
>>>>> Red Hat and you can add me in bug-cc
>>>>>
>>>>> Cheers,
>>>>> Jérôme
>>>> I finally managed to get a proper setup.
>>>> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config,
>>>> and my test fails as usual.
>>>> I applied your patch and rebuilt => still fails and no new messages in dmesg.
>>>>
>>>> Now that I don't have to go through the RPM repackaging, I can try out
>>>> things much quicker if you have any ideas.
>>> Still an issue if you boot with transparent_hugepage=never?
>>>
>>> Also, to simplify the investigation, force write to 1 all the time no
>>> matter what.
>>>
>>> Cheers,
>>> Jérôme
>> With transparent_hugepage=never I can't see the bug anymore.
>>
> Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> (does not apply to 3.10) and without transparent_hugepage=never
>
> Jérôme

Fails with 4.5 + this patch and with 4.5 + this patch + yours

Nicolas
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
> On 05/11/2016 at 04:51 PM, Jerome Glisse wrote:
> > On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> >> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
> >>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> >>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
> >>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >>>>>> [...]
> >>>> Hi,
> >>>>
> >>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
> >>>> definition from 4.5) and it's working!
> >>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
> >>>>
> >>>> I have a dumb question though: how can we end up in numa/misplaced
> >>>> memory code on a single socket system?
> >>>>
> >>> This patch is not a fix. Do you see a bug message in the kernel log?
> >>> Because if you do, that means we have a bigger issue.
> >>>
> >>> You did not answer one of my previous questions: do you call
> >>> get_user_pages with write = 1 as a parameter?
> >>>
> >>> Also it would be a lot easier if you were testing with the latest 4.6
> >>> or 4.5, not the RHEL kernel, as they are far apart and what might look
> >>> like the same issue on both might be totally different bugs.
> >>>
> >>> If you only really care about the RHEL kernel then open a bug with
> >>> Red Hat and you can add me in bug-cc
> >>>
> >>> Cheers,
> >>> Jérôme
> >> I finally managed to get a proper setup.
> >> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config,
> >> and my test fails as usual.
> >> I applied your patch and rebuilt => still fails and no new messages in dmesg.
> >>
> >> Now that I don't have to go through the RPM repackaging, I can try out
> >> things much quicker if you have any ideas.
> >>
> > Still an issue if you boot with transparent_hugepage=never?
> >
> > Also, to simplify the investigation, force write to 1 all the time no
> > matter what.
> >
> > Cheers,
> > Jérôme
>
> With transparent_hugepage=never I can't see the bug anymore.
>

Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
(does not apply to 3.10) and without transparent_hugepage=never

Jérôme
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On 05/11/2016 at 04:51 PM, Jerome Glisse wrote:
> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
>> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>>>> [...]
>>>> Hi,
>>>>
>>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
>>>> definition from 4.5) and it's working!
>>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
>>>>
>>>> I have a dumb question though: how can we end up in numa/misplaced
>>>> memory code on a single socket system?
>>> This patch is not a fix. Do you see a bug message in the kernel log?
>>> Because if you do, that means we have a bigger issue.
>>>
>>> You did not answer one of my previous questions: do you call
>>> get_user_pages with write = 1 as a parameter?
>>>
>>> Also it would be a lot easier if you were testing with the latest 4.6
>>> or 4.5, not the RHEL kernel, as they are far apart and what might look
>>> like the same issue on both might be totally different bugs.
>>>
>>> If you only really care about the RHEL kernel then open a bug with
>>> Red Hat and you can add me in bug-cc
>>>
>>> Cheers,
>>> Jérôme
>> I finally managed to get a proper setup.
>> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config,
>> and my test fails as usual.
>> I applied your patch and rebuilt => still fails and no new messages in dmesg.
>>
>> Now that I don't have to go through the RPM repackaging, I can try out
>> things much quicker if you have any ideas.
>>
> Still an issue if you boot with transparent_hugepage=never?
>
> Also, to simplify the investigation, force write to 1 all the time no
> matter what.
>
> Cheers,
> Jérôme

With transparent_hugepage=never I can't see the bug anymore.

Nicolas
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
> > On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> >> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
> >>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >>>> [...]
> >> Hi,
> >>
> >> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
> >> definition from 4.5) and it's working!
> >> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
> >>
> >> I have a dumb question though: how can we end up in numa/misplaced memory
> >> code on a single socket system?
> >>
> > This patch is not a fix. Do you see a bug message in the kernel log?
> > Because if you do, that means we have a bigger issue.
> >
> > You did not answer one of my previous questions: do you call
> > get_user_pages with write = 1 as a parameter?
> >
> > Also it would be a lot easier if you were testing with the latest 4.6
> > or 4.5, not the RHEL kernel, as they are far apart and what might look
> > like the same issue on both might be totally different bugs.
> >
> > If you only really care about the RHEL kernel then open a bug with
> > Red Hat and you can add me in bug-cc
> >
> > Cheers,
> > Jérôme
>
> I finally managed to get a proper setup.
> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config,
> and my test fails as usual.
> I applied your patch and rebuilt => still fails and no new messages in dmesg.
>
> Now that I don't have to go through the RPM repackaging, I can try out things
> much quicker if you have any ideas.
>

Still an issue if you boot with transparent_hugepage=never?

Also, to simplify the investigation, force write to 1 all the time no matter what.

Cheers,
Jérôme
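[Editor's sketch, not part of the original message.] When rerunning the test with and without transparent_hugepage=never, it can help to confirm which THP mode is actually active. On upstream kernels the setting is exposed in /sys/kernel/mm/transparent_hugepage/enabled as a line such as "always madvise [never]", with the active mode in brackets. The helper names thp_active_mode and thp_read_mode below are hypothetical; this is only one possible way to read the value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed token of a line such as
 * "always [madvise] never" into `out`.
 * Returns 0 on success, -1 if no bracketed token is found or `out`
 * is too small to hold it.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;

    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical helper: read the system-wide THP mode from sysfs.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? thp_active_mode(line, out, outlen) : -1;
}
```

A test harness could call thp_read_mode() at startup and log the result next to each pass or fail, so that runs made under different boot parameters are not mixed up.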
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>>> [...]
>>> Hi,
>>>
>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
>>> definition from 4.5) and it's working!
>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
>>>
>>> I have a dumb question though: how can we end up in numa/misplaced memory
>>> code on a single socket system?
>> This patch is not a fix. Do you see a bug message in the kernel log? Because if
>> you do, it means we have a bigger issue.
>>
>> You did not answer one of my previous questions: do you call get_user_pages
>> with write = 1 as a parameter?
>>
>> Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
>> not a RHEL kernel, as they are far apart and what might look like the same
>> issue on both might be totally different bugs.
>>
>> If you only really care about the RHEL kernel then open a bug with Red Hat and
>> you can add me in bug-cc.
>>
>> Cheers,
>> Jérôme
> I finally managed to get a proper setup.
> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config; my
> test fails as usual.
> I applied your patch and rebuilt => still fails, and no new messages in dmesg.
>
> Now that I don't have to go through the RPM repackaging, I can try out things
> much quicker if you have any ideas.

Still an issue if you boot with transparent_hugepage=never?

Also, to simplify investigation, force write to 1 all the time no matter what.

Cheers,
Jérôme
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>> [...]
>> Hi,
>>
>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
>> definition from 4.5) and it's working!
>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
>>
>> I have a dumb question though: how can we end up in numa/misplaced memory
>> code on a single socket system?
> This patch is not a fix. Do you see a bug message in the kernel log? Because if
> you do, it means we have a bigger issue.
>
> You did not answer one of my previous questions: do you call get_user_pages
> with write = 1 as a parameter?
>
> Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
> not a RHEL kernel, as they are far apart and what might look like the same
> issue on both might be totally different bugs.
>
> If you only really care about the RHEL kernel then open a bug with Red Hat and
> you can add me in bug-cc.
>
> Cheers,
> Jérôme

I finally managed to get a proper setup.
I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config; my
test fails as usual.
I applied your patch and rebuilt => still fails, and no new messages in dmesg.

Now that I don't have to go through the RPM repackaging, I can try out things
much quicker if you have any ideas.

Nicolas
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On 05/10/2016 at 03:34 PM, Jerome Glisse wrote:
> On Tue, May 10, 2016 at 01:15:02PM +0200, Nicolas Morey Chaisemartin wrote:
>> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>>>> [...]
>>>> Hi,
>>>>
>>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
>>>> definition from 4.5) and it's working!
>>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
>>>>
>>>> I have a dumb question though: how can we end up in numa/misplaced memory
>>>> code on a single socket system?
>>> This patch is not a fix. Do you see a bug message in the kernel log? Because if
>>> you do, it means we have a bigger issue.
>> I don't see any on my 3.10. I have DMA_API_DEBUG enabled but I don't think
>> it has an impact.
> My patch can't be backported to 3.10 as is; you most likely need to replace
> pmd_protnone() by pmd_numa().
>
>>> You did not answer one of my previous questions: do you call get_user_pages
>>> with write = 1 as a parameter?
>> For the read from the device, yes:
>>     down_read(&current->mm->mmap_sem);
>>     res = get_user_pages(
>>             current,
>>             current->mm,
>>             (unsigned long) iov->host_addr,
>>             page_count,
>>             (write_mode == 0) ? 1 : 0, /* write */
>>             0, /* force */
>>             >pages[sg_o],
>>             NULL);
>>     up_read(&current->mm->mmap_sem);
> As I don't have the context to infer how write_mode is set above, do you mind
> retesting your driver and always asking for write no matter what?

write_mode is 0 for card2host transfers, so yes, the write argument is 1.
During debugging I tried with write=1 and force=1 in all cases and it failed too.

>>> Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
>>> not a RHEL kernel, as they are far apart and what might look like the same
>>> issue on both might be totally different bugs.
>> Is an RPM from elrepo ok?
>> http://elrepo.org/linux/kernel/el7/SRPMS/
> Yes, should be ok for testing.

I tried the elrepo 4.5.2 package without your patch and my test fails. Sadly,
the src rpm from elrepo does not contain the kernel sources and I haven't
looked into how to get the proper tarball.
I tried to rebuild a src rpm for Fedora 24 (kernel 4.5.3) and it works without
your patch.
I'm not sure what differs in their config. I'll keep digging.

Nicolas
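Since the exchange above mixes up the driver's write_mode with the write argument of get_user_pages(), here is a small C sketch of the mapping as I read it from the quoted snippet: write_mode == 0 is a card2host transfer, the device then DMA-writes into user memory, so the pages have to be pinned with write = 1 and marked dirty before release. A pin taken with write = 0 can refer to a page that is still shared copy-on-write, in which case a later write fault could hand the process a different page than the one the device fills. The enum and helper names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical direction enum. The driver in the thread uses an integer
 * write_mode where 0 means card2host (a userland read()).
 */
enum xfer_dir {
    XFER_CARD2HOST = 0,  /* device DMA-writes into the user pages */
    XFER_HOST2CARD = 1,  /* device DMA-reads from the user pages */
};

/* Value to pass as the 'write' argument of get_user_pages(). */
static inline int gup_write_flag(enum xfer_dir dir)
{
    assert(dir == XFER_CARD2HOST || dir == XFER_HOST2CARD);
    return dir == XFER_CARD2HOST ? 1 : 0;
}

/* Whether the pages must be marked dirty before the pin is dropped. */
static inline bool needs_set_dirty(enum xfer_dir dir)
{
    return gup_write_flag(dir) == 1;
}
```

Forcing write to 1 for both directions, as suggested in the thread, only changes the XFER_HOST2CARD case; the card2host path already asks for write access.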
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Tue, May 10, 2016 at 01:15:02PM +0200, Nicolas Morey Chaisemartin wrote:
> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>>> [...]
>>> Hi,
>>>
>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone
>>> definition from 4.5) and it's working!
>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
>>>
>>> I have a dumb question though: how can we end up in numa/misplaced memory
>>> code on a single socket system?
>> This patch is not a fix. Do you see a bug message in the kernel log? Because if
>> you do, it means we have a bigger issue.
> I don't see any on my 3.10. I have DMA_API_DEBUG enabled but I don't think it
> has an impact.

My patch can't be backported to 3.10 as is; you most likely need to replace
pmd_protnone() by pmd_numa().

>> You did not answer one of my previous questions: do you call get_user_pages
>> with write = 1 as a parameter?
> For the read from the device, yes:
>     down_read(&current->mm->mmap_sem);
>     res = get_user_pages(
>             current,
>             current->mm,
>             (unsigned long) iov->host_addr,
>             page_count,
>             (write_mode == 0) ? 1 : 0, /* write */
>             0, /* force */
>             >pages[sg_o],
>             NULL);
>     up_read(&current->mm->mmap_sem);

As I don't have the context to infer how write_mode is set above, do you mind
retesting your driver and always asking for write no matter what?

>> Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
>> not a RHEL kernel, as they are far apart and what might look like the same
>> issue on both might be totally different bugs.
> Is an RPM from elrepo ok?
> http://elrepo.org/linux/kernel/el7/SRPMS/

Yes, should be ok for testing.

Cheers,
Jérôme
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>>> Hi everyone,
>>>>>
>>>>> This is a repost from a different address as it seems the previous one
>>>>> ended in Gmail junk due to a domain error.
>>>> linux-kernel is a very high volume list which few are reading:
>>>> that also will account for your lack of response so far
>>>> (apart from the indefatigable Alan).
>>>>
>>>> I've added linux-mm, and some people from another thread regarding
>>>> THP and get_user_pages() pins which has been discussed in recent days.
>>>>
>>>> Make no mistake, the issue you're raising here is definitely not the
>>>> same as that one (which is specifically about the new THP refcounting
>>>> in v4.5+, whereas you're reporting a problem you've seen in both a
>>>> v3.10-based kernel and in v4.5). But I think their heads are in
>>>> gear, much more so than mine, and likely to spot something.
>>>>> I added more info found while blindly debugging the issue.
>>>>>
>>>>> Short version:
>>>>> I'm having an issue with direct DMA transfer from a device to host memory.
>>>>> It seems some of the data is not transferring to the appropriate page.
>>>>>
>>>>> Some more details:
>>>>> I'm debugging a home-made PCI driver for our board (Kalray), attached to
>>>>> an x86_64 host running CentOS 7 (3.10.0-327.el7.x86_64).
>>>>>
>>>>> In the current case, a userland application transfers data back and forth
>>>>> through read/write operations on a file.
>>>>> On the kernel side, it triggers DMA transfers through the PCI to/from our
>>>>> board memory.
>>>>>
>>>>> We followed what pretty much all docs said about direct I/O to user
>>>>> buffers:
>>>>>
>>>>> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
>>>>> 2) convert to a scatterlist
>>>>> 3) pci_map_sg
>>>>> 4) possibly coalesce sg (Intel IOMMU is enabled, so it's usually
>>>>>    possible)
>>>>> 5) a lot of DMA engine handling code, using the dmaengine layer and
>>>>>    virt-dma
>>>>> 6) wait for transfer complete; in the meantime, go back to (1) to
>>>>>    schedule more work, if any
>>>>> 7) pci_unmap_sg
>>>>> 8) for read (card2host) transfer, set_page_dirty_lock
>>>>> 9) page_cache_release
>>>>>
>>>>> In 99,% of cases it works perfectly.
>>>>> However, I have one userland application where a few pages are not
>>>>> written by a read (card2host) transfer.
>>>>> The buffer is memset to a different value so I can check that
>>>>> nothing has overwritten those pages.
>>>>>
>>>>> I know (PCI protocol analyser) that the data left our board for the
>>>>> "right" address (the one set in the sg by pci_map_sg).
>>>>> I tried reading the data between the pci_unmap_sg and the set_page_dirty,
>>>>> using
>>>>>     uint32_t *addr = page_address(trans->pages[0]);
>>>>>     dev_warn(>pdev->dev, "val = %x\n", *addr);
>>>>> and it has the expected value.
>>>>> But if I try to copy_from_user (using the address coming from userland,
>>>>> the one passed to get_user_pages), the data has not been written and I
>>>>> see the memset value.
>>>>>
>>>>> New info:
>>>>>
>>>>> The issue happens with IOMMU on or off.
>>>>> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or
>>>>> errors.
>>>>>
>>>>> I dug a little bit deeper with my very small understanding of Linux mm
>>>>> and I discovered that:
>>>>> * we are using transparent huge pages
>>>>> * the pages 'not transferred' are the last few of a huge page
>>>>> More precisely:
>>>>> - We have several transfers in flight from the same user buffer
>>>>> - Each transfer is 16 pages long
>>>>> - At one point in time, we start transferring from another huge page
>>>>>   (transfers are still in flight from the previous one)
>>>>> - When a transfer from the previous huge page completes, I dumped the
>>>>>   mapcount of the pages from the previous transfers;
>>>>>   they are all 0. The pages are still mapped to DMA at this point.
>>>>> - A get_user_pages on the address of the completed transfer returns
>>>>>   a different struct page * than the one I had.
>>>>>   But this is before I have unmapped/put_page them back. From my
>>>>>   understanding this should not have happened.
>>>>>
>>>>> I tried the same code with a kernel 4.5 and encountered the same issue.
>>>>>
>>>>> Disabling transparent huge pages makes the issue disappear.
>>>>>
>>>>> Thanks in advance
>>>> It does look to me as if pages are being migrated, despite being pinned
>>>> by get_user_pages(): and that would be wrong. Originally I intended
>>>> to suggest that THP is probably merely the cause of compaction, with
>>>> compaction causing the page migration. But you posted very interesting
>>>> details in an earlier mail on 27th April from:
>>>>
>>>>> I ran some more tests:
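An aside on the huge-page observations quoted above (missing pages at the tail of a huge page, 16-page transfers): when logging failing transfers it can help to print where each one sits inside its huge page. The C sketch below does that arithmetic. It assumes x86_64 with 4 KiB base pages and 2 MiB transparent huge pages, so 512 base pages per huge page and 32 aligned 16-page transfers per huge page. The helper names are invented for this example and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_PAGE_SIZE     4096UL                 /* x86_64 base page (assumed) */
#define HUGE_PAGE_SIZE     (2UL * 1024 * 1024)    /* x86_64 PMD-sized THP (assumed) */
#define SUBPAGES_PER_HUGE  (HUGE_PAGE_SIZE / BASE_PAGE_SIZE)  /* 512 */

/* Index (0..511) of the base page holding addr inside its 2 MiB region. */
static inline unsigned long subpage_index(uintptr_t addr)
{
    return (addr & (HUGE_PAGE_SIZE - 1)) / BASE_PAGE_SIZE;
}

/* True if npages starting at addr include the last base page of the region. */
static inline bool xfer_touches_huge_tail(uintptr_t addr, unsigned long npages)
{
    assert(npages > 0);
    return subpage_index(addr) + npages >= SUBPAGES_PER_HUGE;
}

/* True if npages starting at addr spill into the next 2 MiB region. */
static inline bool xfer_crosses_huge_boundary(uintptr_t addr, unsigned long npages)
{
    assert(npages > 0);
    return subpage_index(addr) + npages > SUBPAGES_PER_HUGE;
}
```

Logging subpage_index() of the user address next to each failing transfer would show whether the failures really cluster on the last transfer of a 2 MiB region, which is what the report suggests.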
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>>>> Hi everyone,
>>>>
>>>> This is a repost from a different address as it seems the previous one
>>>> ended in Gmail junk due to a domain error.
>>> linux-kernel is a very high volume list which few are reading:
>>> that also will account for your lack of response so far
>>> (apart from the indefatigable Alan).
>>>
>>> I've added linux-mm, and some people from another thread regarding
>>> THP and get_user_pages() pins which has been discussed in recent days.
>>>
>>> Make no mistake, the issue you're raising here is definitely not the
>>> same as that one (which is specifically about the new THP refcounting
>>> in v4.5+, whereas you're reporting a problem you've seen in both a
>>> v3.10-based kernel and in v4.5). But I think their heads are in
>>> gear, much more so than mine, and likely to spot something.
>>>> I added more info found while blindly debugging the issue.
>>>>
>>>> Short version:
>>>> I'm having an issue with direct DMA transfer from a device to host memory.
>>>> It seems some of the data is not transferring to the appropriate page.
>>>>
>>>> Some more details:
>>>> I'm debugging a home-made PCI driver for our board (Kalray), attached to
>>>> an x86_64 host running CentOS 7 (3.10.0-327.el7.x86_64).
>>>>
>>>> In the current case, a userland application transfers data back and forth
>>>> through read/write operations on a file.
>>>> On the kernel side, it triggers DMA transfers through the PCI to/from our
>>>> board memory.
>>>>
>>>> We followed what pretty much all docs said about direct I/O to user
>>>> buffers:
>>>>
>>>> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
>>>> 2) convert to a scatterlist
>>>> 3) pci_map_sg
>>>> 4) possibly coalesce sg (Intel IOMMU is enabled, so it's usually
>>>>    possible)
>>>> 5) a lot of DMA engine handling code, using the dmaengine layer and
>>>>    virt-dma
>>>> 6) wait for transfer complete; in the meantime, go back to (1) to
>>>>    schedule more work, if any
>>>> 7) pci_unmap_sg
>>>> 8) for read (card2host) transfer, set_page_dirty_lock
>>>> 9) page_cache_release
>>>>
>>>> In 99,% of cases it works perfectly.
>>>> However, I have one userland application where a few pages are not
>>>> written by a read (card2host) transfer.
>>>> The buffer is memset to a different value so I can check that
>>>> nothing has overwritten those pages.
>>>>
>>>> I know (PCI protocol analyser) that the data left our board for the
>>>> "right" address (the one set in the sg by pci_map_sg).
>>>> I tried reading the data between the pci_unmap_sg and the set_page_dirty,
>>>> using
>>>>     uint32_t *addr = page_address(trans->pages[0]);
>>>>     dev_warn(>pdev->dev, "val = %x\n", *addr);
>>>> and it has the expected value.
>>>> But if I try to copy_from_user (using the address coming from userland,
>>>> the one passed to get_user_pages), the data has not been written and I
>>>> see the memset value.
>>>>
>>>> New info:
>>>>
>>>> The issue happens with IOMMU on or off.
>>>> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or
>>>> errors.
>>>>
>>>> I dug a little bit deeper with my very small understanding of Linux mm
>>>> and I discovered that:
>>>> * we are using transparent huge pages
>>>> * the pages 'not transferred' are the last few of a huge page
>>>> More precisely:
>>>> - We have several transfers in flight from the same user buffer
>>>> - Each transfer is 16 pages long
>>>> - At one point in time, we start transferring from another huge page
>>>>   (transfers are still in flight from the previous one)
>>>> - When a transfer from the previous huge page completes, I dumped the
>>>>   mapcount of the pages from the previous transfers;
>>>>   they are all 0. The pages are still mapped to DMA at this point.
>>>> - A get_user_pages on the address of the completed transfer returns
>>>>   a different struct page * than the one I had.
>>>>   But this is before I have unmapped/put_page them back. From my
>>>>   understanding this should not have happened.
>>>>
>>>> I tried the same code with a kernel 4.5 and encountered the same issue.
>>>>
>>>> Disabling transparent huge pages makes the issue disappear.
>>>>
>>>> Thanks in advance
>>> It does look to me as if pages are being migrated, despite being pinned
>>> by get_user_pages(): and that would be wrong. Originally I intended
>>> to suggest that THP is probably merely the cause of compaction, with
>>> compaction causing the page migration. But you posted very interesting
>>> details in an earlier mail on 27th April from:
>>>
>>>> I ran some more tests:
>>>>
>>>> * Test is OK if transparent huge tlb are
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Tue, May 03, 2016 at 12:11:54PM +0200, Jerome Glisse wrote:
> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> > On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >
> > > Hi everyone,
> > >
> > > This is a repost from a different address as it seems the previous one
> > > ended in Gmail junk due to a domain error..
> > linux-kernel is a very high volume list which few are reading:
> > that also will account for your lack of response so far
> > (apart from the indefatigable Alan).
> >
> > I've added linux-mm, and some people from another thread regarding
> > THP and get_user_pages() pins which has been discussed in recent days.
> >
> > Make no mistake, the issue you're raising here is definitely not the
> > same as that one (which is specifically about the new THP refcounting
> > in v4.5+, whereas you're reporting a problem you've seen in both a
> > v3.10-based kernel and in v4.5). But I think their heads are in
> > gear, much more so than mine, and likely to spot something.
> >
> > > I added more info found while blindly debugging the issue.
> > >
> > > Short version:
> > > I'm having an issue with direct DMA transfer from a device to host memory.
> > > It seems some of the data is not transferring to the appropriate page.
> > >
> > > Some more details:
> > > I'm debugging a home made PCI driver for our board (Kalray), attached to
> > > a x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> > >
> > > In the current case, a userland application transfers back and forth data
> > > through read/write operations on a file.
> > > On the kernel side, it triggers DMA transfers through the PCI to/from our
> > > board memory.
> > >
> > > We followed what pretty much all docs said about direct I/O to user
> > > buffers:
> > >
> > > 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> > > 2) convert to a scatterlist
> > > 3) pci_map_sg
> > > 4) eventually coalesce sg (Intel IOMMU is enabled, so it's usually
> > > possible)
> > > 4) A lot of DMA engine handling code, using the dmaengine layer and
> > > virt-dma
> > > 5) wait for transfer complete, in the mean time, go back to (1) to
> > > schedule more work, if any
> > > 6) pci_unmap_sg
> > > 7) for read (card2host) transfer, set_page_dirty_lock
> > > 8) page_cache_release
> > >
> > > In 99,% it works perfectly.
> > > However, I have one userland application where a few pages are not
> > > written by a read (card2host) transfer.
> > > The buffer is memset them to a different value so I can check that
> > > nothing has overwritten them.
> > >
> > > I know (PCI protocol analyser) that the data left our board for the
> > > "right" address (the one set in the sg by pci_map_sg).
> > > I tried reading the data between the pci_unmap_sg and the set_page_dirty,
> > > using
> > >     uint32_t *addr = page_address(trans->pages[0]);
> > >     dev_warn(>pdev->dev, "val = %x\n", *addr);
> > > and it has the expected value.
> > > But if I try to copy_from_user (using the address coming from userland,
> > > the one passed to get_user_pages), the data has not been written and I
> > > see the memset value.
> > >
> > > New infos:
> > >
> > > The issue happens with IOMMU on or off.
> > > I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or
> > > errors.
> > >
> > > I digged a little bit deeper with my very small understanding of linux mm
> > > and I discovered that:
> > > * we are using transparent huge pages
> > > * the page 'not transferred' are the last few of a huge page
> > > More precisely:
> > > - We have several transfer in flight from the same user buffer
> > > - Each transfer is 16 pages long
> > > - At one point in time, we start transferring from another huge page
> > > (transfers are still in flight from the previous one)
> > > - When a transfer from the previous huge page completes, I dumped at the
> > > mapcount of the pages from the previous transfers,
> > > they are all to 0. The pages are still mapped to dma at this point.
> > > - A get_user_page to the address of the completed transfer returns return
> > > a different struct page * then the on I had.
> > > But this is before I have unmapped/put_page them back. From my
> > > understanding this should not have happened.
> > >
> > > I tried the same code with a kernel 4.5 and encountered the same issue
> > >
> > > Disabling transparent huge pages makes the issue disapear
> > >
> > > Thanks in advance
> >
> > It does look to me as if pages are being migrated, despite being pinned
> > by get_user_pages(): and that would be wrong. Originally I intended
> > to suggest that THP is probably merely the cause of compaction, with
> > compaction causing the page migration. But you posted very interesting
> > details in an earlier mail on 27th April from:
> >
> > > I ran some more tests:
> > >
> > > * Test is OK if transparent huge tlb are disabled
> > >
> > > * For all the page where
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>
> > Hi everyone,
> >
> > This is a repost from a different address as it seems the previous one
> > ended in Gmail junk due to a domain error..
> linux-kernel is a very high volume list which few are reading:
> that also will account for your lack of response so far
> (apart from the indefatigable Alan).
>
> I've added linux-mm, and some people from another thread regarding
> THP and get_user_pages() pins which has been discussed in recent days.
>
> Make no mistake, the issue you're raising here is definitely not the
> same as that one (which is specifically about the new THP refcounting
> in v4.5+, whereas you're reporting a problem you've seen in both a
> v3.10-based kernel and in v4.5). But I think their heads are in
> gear, much more so than mine, and likely to spot something.
>
> > I added more info found while blindly debugging the issue.
> >
> > Short version:
> > I'm having an issue with direct DMA transfer from a device to host memory.
> > It seems some of the data is not transferring to the appropriate page.
> >
> > Some more details:
> > I'm debugging a home made PCI driver for our board (Kalray), attached to a
> > x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> >
> > In the current case, a userland application transfers back and forth data
> > through read/write operations on a file.
> > On the kernel side, it triggers DMA transfers through the PCI to/from our
> > board memory.
> >
> > We followed what pretty much all docs said about direct I/O to user buffers:
> >
> > 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> > 2) convert to a scatterlist
> > 3) pci_map_sg
> > 4) eventually coalesce sg (Intel IOMMU is enabled, so it's usually possible)
> > 4) A lot of DMA engine handling code, using the dmaengine layer and virt-dma
> > 5) wait for transfer complete, in the mean time, go back to (1) to schedule
> > more work, if any
> > 6) pci_unmap_sg
> > 7) for read (card2host) transfer, set_page_dirty_lock
> > 8) page_cache_release
> >
> > In 99,% it works perfectly.
> > However, I have one userland application where a few pages are not written
> > by a read (card2host) transfer.
> > The buffer is memset them to a different value so I can check that nothing
> > has overwritten them.
> >
> > I know (PCI protocol analyser) that the data left our board for the "right"
> > address (the one set in the sg by pci_map_sg).
> > I tried reading the data between the pci_unmap_sg and the set_page_dirty,
> > using
> >     uint32_t *addr = page_address(trans->pages[0]);
> >     dev_warn(>pdev->dev, "val = %x\n", *addr);
> > and it has the expected value.
> > But if I try to copy_from_user (using the address coming from userland, the
> > one passed to get_user_pages), the data has not been written and I see the
> > memset value.
> >
> > New infos:
> >
> > The issue happens with IOMMU on or off.
> > I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or
> > errors.
> >
> > I digged a little bit deeper with my very small understanding of linux mm
> > and I discovered that:
> > * we are using transparent huge pages
> > * the page 'not transferred' are the last few of a huge page
> > More precisely:
> > - We have several transfer in flight from the same user buffer
> > - Each transfer is 16 pages long
> > - At one point in time, we start transferring from another huge page
> > (transfers are still in flight from the previous one)
> > - When a transfer from the previous huge page completes, I dumped at the
> > mapcount of the pages from the previous transfers,
> > they are all to 0. The pages are still mapped to dma at this point.
> > - A get_user_page to the address of the completed transfer returns return a
> > different struct page * then the on I had.
> > But this is before I have unmapped/put_page them back. From my
> > understanding this should not have happened.
> >
> > I tried the same code with a kernel 4.5 and encountered the same issue
> >
> > Disabling transparent huge pages makes the issue disapear
> >
> > Thanks in advance
>
> It does look to me as if pages are being migrated, despite being pinned
> by get_user_pages(): and that would be wrong. Originally I intended
> to suggest that THP is probably merely the cause of compaction, with
> compaction causing the page migration. But you posted very interesting
> details in an earlier mail on 27th April from:
>
> > I ran some more tests:
> >
> > * Test is OK if transparent huge tlb are disabled
> >
> > * For all the page where data are not transfered, and only those pages, a
> > call to get_user_page(user vaddr) just before dma_unmap_sg returns a
> > different page from the original one.
> > [436477.927279] mppa :03:00.0: org_page= ea0009f60080 cur page = ea00074e0080
> > [436477.927298]
Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?
On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:

> Hi everyone,
>
> This is a repost from a different address as it seems the previous one ended
> in Gmail junk due to a domain error..
linux-kernel is a very high volume list which few are reading:
that also will account for your lack of response so far
(apart from the indefatigable Alan).

I've added linux-mm, and some people from another thread regarding
THP and get_user_pages() pins which has been discussed in recent days.

Make no mistake, the issue you're raising here is definitely not the
same as that one (which is specifically about the new THP refcounting
in v4.5+, whereas you're reporting a problem you've seen in both a
v3.10-based kernel and in v4.5). But I think their heads are in
gear, much more so than mine, and likely to spot something.

> I added more info found while blindly debugging the issue.
>
> Short version:
> I'm having an issue with direct DMA transfer from a device to host memory.
> It seems some of the data is not transferring to the appropriate page.
>
> Some more details:
> I'm debugging a home made PCI driver for our board (Kalray), attached to a
> x86_64 host running centos7 (3.10.0-327.el7.x86_64)
>
> In the current case, a userland application transfers back and forth data
> through read/write operations on a file.
> On the kernel side, it triggers DMA transfers through the PCI to/from our
> board memory.
>
> We followed what pretty much all docs said about direct I/O to user buffers:
>
> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> 2) convert to a scatterlist
> 3) pci_map_sg
> 4) eventually coalesce sg (Intel IOMMU is enabled, so it's usually possible)
> 4) A lot of DMA engine handling code, using the dmaengine layer and virt-dma
> 5) wait for transfer complete, in the mean time, go back to (1) to schedule
> more work, if any
> 6) pci_unmap_sg
> 7) for read (card2host) transfer, set_page_dirty_lock
> 8) page_cache_release
>
> In 99,% it works perfectly.
> However, I have one userland application where a few pages are not written by
> a read (card2host) transfer.
> The buffer is memset them to a different value so I can check that nothing
> has overwritten them.
>
> I know (PCI protocol analyser) that the data left our board for the "right"
> address (the one set in the sg by pci_map_sg).
> I tried reading the data between the pci_unmap_sg and the set_page_dirty,
> using
>     uint32_t *addr = page_address(trans->pages[0]);
>     dev_warn(>pdev->dev, "val = %x\n", *addr);
> and it has the expected value.
> But if I try to copy_from_user (using the address coming from userland, the
> one passed to get_user_pages), the data has not been written and I see the
> memset value.
>
> New infos:
>
> The issue happens with IOMMU on or off.
> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or errors.
>
> I digged a little bit deeper with my very small understanding of linux mm and
> I discovered that:
> * we are using transparent huge pages
> * the page 'not transferred' are the last few of a huge page
> More precisely:
> - We have several transfer in flight from the same user buffer
> - Each transfer is 16 pages long
> - At one point in time, we start transferring from another huge page
> (transfers are still in flight from the previous one)
> - When a transfer from the previous huge page completes, I dumped at the
> mapcount of the pages from the previous transfers,
> they are all to 0. The pages are still mapped to dma at this point.
> - A get_user_page to the address of the completed transfer returns return a
> different struct page * then the on I had.
> But this is before I have unmapped/put_page them back. From my understanding
> this should not have happened.
>
> I tried the same code with a kernel 4.5 and encountered the same issue
>
> Disabling transparent huge pages makes the issue disapear
>
> Thanks in advance

It does look to me as if pages are being migrated, despite being pinned
by get_user_pages(): and that would be wrong. Originally I intended
to suggest that THP is probably merely the cause of compaction, with
compaction causing the page migration. But you posted very interesting
details in an earlier mail on 27th April from:

> I ran some more tests:
>
> * Test is OK if transparent huge tlb are disabled
>
> * For all the page where data are not transfered, and only those pages, a
> call to get_user_page(user vaddr) just before dma_unmap_sg returns a
> different page from the original one.
> [436477.927279] mppa :03:00.0: org_page= ea0009f60080 cur page = ea00074e0080
> [436477.927298] page:ea0009f60080 count:0 mapcount:1 mapping: (null) index:0x2
> [436477.927314] page flags: 0x2f8000(tail)
> [436477.927354] page dumped because: org_page
> [436477.927369] page:ea00074e0080 count:0 mapcount:1 mapping: (null) index:0x2
> [436477.927382]
[Question] Missing data after DMA read transfer - mm issue with transparent huge page?
Hi everyone,

This is a repost from a different address, as it seems the previous one ended up in Gmail's junk folder due to a domain error.
I have added more information found while blindly debugging the issue.

Short version:
I'm having an issue with direct DMA transfers from a device to host memory.
It seems some of the data does not end up in the appropriate page.

Some more details:
I'm debugging a home-made PCI driver for our board (Kalray), attached to an x86_64 host running CentOS 7 (3.10.0-327.el7.x86_64).

In the current case, a userland application transfers data back and forth through read/write operations on a file.
On the kernel side, this triggers DMA transfers over PCI to/from our board memory.

We followed what pretty much all the docs say about direct I/O to user buffers:

1) get_user_pages() (in the current case, at most 16 pages at once)
2) convert to a scatterlist
3) pci_map_sg
4) possibly coalesce the sg entries (the Intel IOMMU is enabled, so it's usually possible)
5) a lot of DMA engine handling code, using the dmaengine layer and virt-dma
6) wait for the transfer to complete; in the meantime, go back to (1) to schedule more work, if any
7) pci_unmap_sg
8) for read (card2host) transfers, set_page_dirty_lock
9) page_cache_release

In roughly 99% of cases it works perfectly.
However, I have one userland application where a few pages are not written by a read (card2host) transfer.
The buffer is memset to a different value beforehand, so I can check whether those pages have been overwritten.

I know (from a PCI protocol analyser) that the data left our board for the "right" address (the one set in the sg by pci_map_sg).
I tried reading the data between the pci_unmap_sg and the set_page_dirty, using
    uint32_t *addr = page_address(trans->pages[0]);
    dev_warn(>pdev->dev, "val = %x\n", *addr);
and it has the expected value.
But if I try to copy_from_user (using the address coming from userland, the one passed to get_user_pages), the data has not been written and I see the memset value.
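For readers who want to reproduce the userland side of this check, the memset-and-verify step described above can be sketched as follows. This is only a minimal illustration: the 4 KiB page size, the helper names `page_is_untouched` and `find_untouched_pages`, and any particular fill byte are assumptions for the example and are not taken from the driver or the test application discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB base pages, as on a typical x86_64 host. */
#define PAGE_SZ 4096u

/* Hypothetical helper: returns 1 if every byte of the page equals 'fill'. */
static int page_is_untouched(const uint8_t *page, uint8_t fill)
{
    for (size_t i = 0; i < PAGE_SZ; i++)
        if (page[i] != fill)
            return 0;
    return 1;
}

/*
 * Hypothetical helper: scan 'npages' pages starting at 'buf' and store the
 * index of each page that still holds only the fill byte into 'out'
 * (at most 'max_out' entries are stored).
 * Returns the total number of untouched pages, which may exceed max_out.
 */
static size_t find_untouched_pages(const uint8_t *buf, size_t npages,
                                   uint8_t fill, size_t *out, size_t max_out)
{
    size_t found = 0;

    assert(buf != NULL);
    for (size_t p = 0; p < npages; p++) {
        if (page_is_untouched(buf + p * PAGE_SZ, fill)) {
            if (found < max_out)
                out[found] = p;
            found++;
        }
    }
    return found;
}
```

The intended use is to memset the whole buffer to the fill byte before the read, then call `find_untouched_pages()` once the read returns. A page whose real payload happens to consist only of the fill byte would be reported as untouched, so the fill value should be one the device is not expected to send.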
New info:

The issue happens with the IOMMU on or off.
I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or errors.

I dug a little deeper with my very limited understanding of Linux mm and discovered that:
 * we are using transparent huge pages
 * the pages 'not transferred' are the last few of a huge page
More precisely:
 - We have several transfers in flight from the same user buffer
 - Each transfer is 16 pages long
 - At one point in time, we start transferring from another huge page (transfers are still in flight from the previous one)
 - When a transfer from the previous huge page completes, I dumped the mapcount of the pages from the previous transfers: they are all at 0. The pages are still mapped for DMA at this point.
 - A get_user_page on the address of the completed transfer returns a different struct page * than the one I had.
But this is before I have unmapped the pages or called put_page on them. From my understanding this should not have happened.

I tried the same code with a 4.5 kernel and encountered the same issue.

Disabling transparent huge pages makes the issue disappear.

Thanks in advance
Nicolas
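The observation that the affected pages are the last few of a huge page can be checked with simple address arithmetic on the user virtual address. The sketch below assumes 4 KiB base pages and 2 MiB transparent huge pages (the usual x86_64 sizes; neither value is stated in this thread), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB base pages and 2 MiB huge pages (typical for x86_64). */
#define BASE_PAGE_SHIFT 12u
#define HUGE_PAGE_SHIFT 21u
#define PAGES_PER_HUGE  (1u << (HUGE_PAGE_SHIFT - BASE_PAGE_SHIFT)) /* 512 */

/* Index (0..511) of the base page holding 'vaddr' within its 2 MiB-aligned region. */
static unsigned int subpage_index(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> BASE_PAGE_SHIFT) & (PAGES_PER_HUGE - 1));
}

/* Base pages from 'vaddr' to the end of its 2 MiB region, counting the page that holds vaddr. */
static unsigned int pages_left_in_huge(uint64_t vaddr)
{
    return PAGES_PER_HUGE - subpage_index(vaddr);
}

/* Returns 1 if a transfer of 'npages' base pages starting at 'vaddr' spills into the next 2 MiB region. */
static int transfer_crosses_huge_boundary(uint64_t vaddr, unsigned int npages)
{
    assert(npages > 0);
    return npages > pages_left_in_huge(vaddr);
}
```

Under these assumptions, with 16-page transfers issued back to back from a 2 MiB-aligned buffer, the last transfer inside each huge page covers sub-page indices 496 to 511. Logging `subpage_index()` for each page that keeps its memset value would show whether the failures really cluster at the end of the region.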