Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Nicolas Morey-Chaisemartin


Le 05/12/2016 à 03:52 PM, Jerome Glisse a écrit :
> On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
>> Le 05/12/2016 à 11:36 AM, Jerome Glisse a écrit :
>>> On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
[...]
 With transparent_hugepage=never I can't see the bug anymore.

>>> Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
>>> (does not apply to 3.10) and without transparent_hugepage=never
>>>
>>> Jérôme
>> Fails with 4.5 + this patch and with 4.5 + this patch + yours
>>
> There must be some bug in your code, we have upstream user that works
> fine with the above combination (see drivers/vfio/vfio_iommu_type1.c)
> i suspect you might be releasing the page pin too early (put_page()).
In my previous tests, I checked the page before calling put_page and it had 
already changed.
I also checked that there are no multiple transfers within a single page at 
once.
So I doubt that's the cause.
>
> If you really believe it is bug upstream we would need a dumb kernel
> module that does gup like you do and that shows the issue. Right now
> looking at code (assuming above patches applied) i can't see anything
> that can go wrong with THP.

The issue is that I doubt I'll be able to do that. We have had this code 
running in production for at least a year without the issue showing up, and 
now a single test exposes it.
Some tweaks to the test (i.e., to its user-space memory footprint) can make 
the problem disappear.

Is there a way to track what is happening to the THP? From the looks of it, the 
refcounts are being changed behind my back. Would kgdb with a watchpoint work 
on this?
Is there a less painful way?

Thanks

Nicolas



Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Andrea Arcangeli
Hello Nicolas,

On Thu, May 12, 2016 at 05:31:52PM +0200, Nicolas Morey-Chaisemartin wrote:
> 
> 
> Le 05/12/2016 à 03:52 PM, Jerome Glisse a écrit :
> > On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
> >> Le 05/12/2016 à 11:36 AM, Jerome Glisse a écrit :
> >>> On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin 
> >>> wrote:
> [...]
>  With transparent_hugepage=never I can't see the bug anymore.
> 
> >>> Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> >>> (does not apply to 3.10) and without transparent_hugepage=never
> >>>
> >>> Jérôme
> >> Fails with 4.5 + this patch and with 4.5 + this patch + yours
> >>
> > There must be some bug in your code, we have upstream user that works
> > fine with the above combination (see drivers/vfio/vfio_iommu_type1.c)
> > i suspect you might be releasing the page pin too early (put_page()).
> In my previous tests, I checked the page before calling put_page and it has 
> already changed.
> And I also checked that there is not multiple transfers in a single page at 
> once.
> So I doubt it's that.
> >
> > If you really believe it is bug upstream we would need a dumb kernel
> > module that does gup like you do and that shows the issue. Right now
> > looking at code (assuming above patches applied) i can't see anything
> > that can go wrong with THP.
> 
> The issue is that I doubt I'll be able to do that. We have had code running 
> in production for at least a year without the issue showing up and now a 
> single test shows this.
> And some tweak to the test (meaning memory footprint in the user space) can 
> make the problem disappear.
> 
> Is there a way to track what is happening to the THP? From the looks of it, 
> the refcount are changed behind my back? Would kgdb with watch point work on 
> this?
> Is there a less painful way?

Do you use fork()?

If you have threads and your DMA I/O granularity is smaller than
PAGE_SIZE, and a thread of the application in the parent or the child is
writing to another part of the page, the I/O can get lost (worse, it
doesn't really get lost: it goes to the child by mistake instead of
sticking to the "mm" where you executed get_user_pages). This is
effectively a known bug in fork(). It can affect any app that uses
get_user_pages/O_DIRECT together with fork() and threads while the I/O
granularity is smaller than PAGE_SIZE.

The same bug cannot happen with KSM or other mechanisms that can
write-protect a page out of the app's control, because everything out of
the app's control checks that there are no page pins before
write-protecting the page. So it's up to the app to control fork().

To fix it, you should do one of the following: 1) use MADV_DONTFORK on
the pinned region; 2) prevent fork() from running while you hold pins
taken with get_user_pages, or while get_user_pages may be running
concurrently; 3) use a PAGE_SIZE I/O granularity and/or prevent the
threads from writing to the other part of the page while DMA is running.

I'm not aware of other issues that could screw with page pins with THP
on kernels <= 4.4; if there were, everything would fall apart, including
O_DIRECT and qemu cache=none. The only issue I'm aware of that can cause
DMA to get lost with page pins is the aforementioned one.

To debug it further, I would suggest starting by searching for fork()
calls, and adding MADV_DONTFORK to the pinned region if there is any
fork() in your testcase.

Without being allowed to see the source, there's not much else we can
do, considering there's no sign of unknown bugs in this area in kernels
<= 4.4.

All there is, is the known bug above, but apps that could be affected by
it actively avoid it by using MADV_DONTFORK, as qemu cache=none does.

Thanks,
Andrea


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Jerome Glisse
On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
> Le 05/12/2016 à 11:36 AM, Jerome Glisse a écrit :
> > On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
> >>
> >> Le 05/11/2016 à 04:51 PM, Jerome Glisse a écrit :
> >>> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin 
> >>> wrote:
>  Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> > On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin 
> > wrote:
> >> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> >>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>  On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>  [...]
> >> Hi,
> >>
> >> I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
> >> definition from 4.5) and it's working!
> >> I'll open a ticket in Redhat tracker to try and get this fixed in 
> >> RHEL7.
> >>
> >> I have a dumb question though: how can we end up in numa/misplaced 
> >> memory code on a single socket system?
> >>
> > This patch is not a fix, do you see bug message in kernel log ? Because 
> > if
> > you do that it means we have a bigger issue.
> >
> > You did not answer one of my previous questions: do you call
> > get_user_pages
> > with write = 1 as a parameter?
> >
> > Also, it would be a lot easier if you were testing with the latest 4.6
> > or 4.5,
> > not the RHEL kernel, as they are far apart and what might look like the
> > same issue
> > on both might be totally different bugs.
> >
> > If you only really care about RHEL kernel then open a bug with Red Hat 
> > and
> > you can add me in bug-cc 
> >
> > Cheers,
> > Jérôme
>  I finally managed to get a proper setup.
>  I build a vanilla 4.5 kernel from git tree using the Centos7 config, my 
>  test fails as usual.
>  I applied your patch, rebuild => still fails and no new messages in 
>  dmesg.
> 
>  Now that I don't have to go through the RPM repackaging, I can try out 
>  things much quicker if you have any ideas.
> 
> >>> Still an issue if you boot with transparent_hugepage=never ?
> >>>
> >>> Also to simplify investigation force write to 1 all the time no matter 
> >>> what.
> >>>
> >>> Cheers,
> >>> Jérôme
> >> With transparent_hugepage=never I can't see the bug anymore.
> >>
> > Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> > (does not apply to 3.10) and without transparent_hugepage=never
> >
> > Jérôme
> 
> Fails with 4.5 + this patch and with 4.5 + this patch + yours
> 

There must be some bug in your code; we have an upstream user that works
fine with the above combination (see drivers/vfio/vfio_iommu_type1.c).
I suspect you might be releasing the page pin too early (put_page()).

If you really believe it is a bug upstream, we would need a dumb kernel
module that does gup like you do and that shows the issue. Right now,
looking at the code (assuming the above patches are applied), I can't see
anything that can go wrong with THP.

Cheers,
Jérôme


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Nicolas Morey-Chaisemartin


Le 05/12/2016 à 11:36 AM, Jerome Glisse a écrit :
> On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
>>
>> Le 05/11/2016 à 04:51 PM, Jerome Glisse a écrit :
>>> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
 Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin 
> wrote:
>> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
 On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
 [...]
>> Hi,
>>
>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
>> definition from 4.5) and it's working!
>> I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.
>>
>> I have a dumb question though: how can we end up in numa/misplaced 
>> memory code on a single socket system?
>>
> This patch is not a fix, do you see bug message in kernel log ? Because if
> you do that it means we have a bigger issue.
>
> You did not answer one of my previous questions: do you call get_user_pages
> with write = 1 as a parameter?
>
> Also, it would be a lot easier if you were testing with the latest 4.6 or 4.5,
> not the RHEL kernel, as they are far apart and what might look like the same 
> issue
> on both might be totally different bugs.
>
> If you only really care about RHEL kernel then open a bug with Red Hat and
> you can add me in bug-cc 
>
> Cheers,
> Jérôme
 I finally managed to get a proper setup.
 I build a vanilla 4.5 kernel from git tree using the Centos7 config, my 
 test fails as usual.
 I applied your patch, rebuild => still fails and no new messages in dmesg.

 Now that I don't have to go through the RPM repackaging, I can try out 
 things much quicker if you have any ideas.

>>> Still an issue if you boot with transparent_hugepage=never ?
>>>
>>> Also to simplify investigation force write to 1 all the time no matter what.
>>>
>>> Cheers,
>>> Jérôme
>> With transparent_hugepage=never I can't see the bug anymore.
>>
> Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> (does not apply to 3.10) and without transparent_hugepage=never
>
> Jérôme

Fails with 4.5 + this patch and with 4.5 + this patch + yours

Nicolas


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Jerome Glisse
On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
> 
> 
> Le 05/11/2016 à 04:51 PM, Jerome Glisse a écrit :
> > On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> >>
> >> Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> >>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin 
> >>> wrote:
>  Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> > On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >> [...]
>  Hi,
> 
>  I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
>  definition from 4.5) and it's working!
>  I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.
> 
>  I have a dumb question though: how can we end up in numa/misplaced 
>  memory code on a single socket system?
> 
> >>> This patch is not a fix, do you see bug message in kernel log ? Because if
> >>> you do that it means we have a bigger issue.
> >>>
> >>> You did not answer one of my previous questions: do you call
> >>> get_user_pages with write = 1 as a parameter?
> >>>
> >>> Also, it would be a lot easier if you were testing with the latest
> >>> 4.6 or 4.5, not the RHEL kernel, as they are far apart and what might
> >>> look like the same issue on both might be totally different bugs.
> >>>
> >>> If you only really care about RHEL kernel then open a bug with Red Hat and
> >>> you can add me in bug-cc 
> >>>
> >>> Cheers,
> >>> Jérôme
> >> I finally managed to get a proper setup.
> >> I build a vanilla 4.5 kernel from git tree using the Centos7 config, my 
> >> test fails as usual.
> >> I applied your patch, rebuild => still fails and no new messages in dmesg.
> >>
> >> Now that I don't have to go through the RPM repackaging, I can try out 
> >> things much quicker if you have any ideas.
> >>
> > Still an issue if you boot with transparent_hugepage=never ?
> >
> > Also to simplify investigation force write to 1 all the time no matter what.
> >
> > Cheers,
> > Jérôme
> 
> With transparent_hugepage=never I can't see the bug anymore.
> 

Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
(it does not apply to 3.10) and without transparent_hugepage=never?

Jérôme


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Nicolas Morey-Chaisemartin


Le 05/11/2016 à 04:51 PM, Jerome Glisse a écrit :
> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
>>
>> Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
 Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
>> [...]
 Hi,

 I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
 definition from 4.5) and it's working!
 I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.

 I have a dumb question though: how can we end up in numa/misplaced memory 
 code on a single socket system?

>>> This patch is not a fix, do you see bug message in kernel log ? Because if
>>> you do that it means we have a bigger issue.
>>>
>>> You did not answer one of my previous questions: do you call get_user_pages
>>> with write = 1 as a parameter?
>>>
>>> Also, it would be a lot easier if you were testing with the latest 4.6 or 4.5,
>>> not the RHEL kernel, as they are far apart and what might look like the same issue
>>> on both might be totally different bugs.
>>>
>>> If you only really care about RHEL kernel then open a bug with Red Hat and
>>> you can add me in bug-cc 
>>>
>>> Cheers,
>>> Jérôme
>> I finally managed to get a proper setup.
>> I build a vanilla 4.5 kernel from git tree using the Centos7 config, my test 
>> fails as usual.
>> I applied your patch, rebuild => still fails and no new messages in dmesg.
>>
>> Now that I don't have to go through the RPM repackaging, I can try out 
>> things much quicker if you have any ideas.
>>
> Still an issue if you boot with transparent_hugepage=never ?
>
> Also to simplify investigation force write to 1 all the time no matter what.
>
> Cheers,
> Jérôme

With transparent_hugepage=never I can't see the bug anymore.

Nicolas


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-11 Thread Jerome Glisse
On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> 
> 
> Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> > On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> >> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> >>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>  On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> [...]
> >> Hi,
> >>
> >> I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
> >> definition from 4.5) and it's working!
> >> I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.
> >>
> >> I have a dumb question though: how can we end up in numa/misplaced memory 
> >> code on a single socket system?
> >>
> > This patch is not a fix, do you see bug message in kernel log ? Because if
> > you do that it means we have a bigger issue.
> >
> > You did not answer one of my previous questions: do you set get_user_pages
> > with write = 1 as a parameter?
> >
> > Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
> > not the RHEL kernel, as they are far apart and what might look like the same
> > issue on both might be totally different bugs.
> >
> > If you only really care about RHEL kernel then open a bug with Red Hat and
> > you can add me in bug-cc 
> >
> > Cheers,
> > Jérôme
> 
> I finally managed to get a proper setup.
> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config; my test 
> fails as usual.
> I applied your patch and rebuilt => still fails, and no new messages in dmesg.
> 
> Now that I don't have to go through the RPM repackaging, I can try out things 
> much quicker if you have any ideas.
> 

Still an issue if you boot with transparent_hugepage=never ?

Also to simplify investigation force write to 1 all the time no matter what.

Cheers,
Jérôme


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-11 Thread Nicolas Morey Chaisemartin


Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
 On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
[...]
>> Hi,
>>
>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone definition 
>> from 4.5) and it's working!
>> I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.
>>
>> I have a dumb question though: how can we end up in numa/misplaced memory 
>> code on a single socket system?
>>
> This patch is not a fix, do you see bug message in kernel log ? Because if
> you do that it means we have a bigger issue.
>
> You did not answer one of my previous questions: do you set get_user_pages
> with write = 1 as a parameter?
>
> Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
> not the RHEL kernel, as they are far apart and what might look like the same
> issue on both might be totally different bugs.
>
> If you only really care about RHEL kernel then open a bug with Red Hat and
> you can add me in bug-cc 
>
> Cheers,
> Jérôme

I finally managed to get a proper setup.
I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config; my test 
fails as usual.
I applied your patch and rebuilt => still fails, and no new messages in dmesg.

Now that I don't have to go through the RPM repackaging, I can try out things 
much quicker if you have any ideas.

Nicolas


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-11 Thread Nicolas Morey Chaisemartin


Le 05/10/2016 à 03:34 PM, Jerome Glisse a écrit :
> On Tue, May 10, 2016 at 01:15:02PM +0200, Nicolas Morey Chaisemartin wrote:
>> Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
 Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> [...]
>
 Hi,

 I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
 definition from 4.5) and it's working!
 I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.

 I have a dumb question though: how can we end up in numa/misplaced memory 
 code on a single socket system?

>>> This patch is not a fix, do you see bug message in kernel log ? Because if
>>> you do that it means we have a bigger issue.
>> I don't see any on my 3.10. I have DMA_API_DEBUG enabled but I don't think 
>> it has an impact.
> My patch can't be backported to 3.10 as is, you most likely need to replace
> pmd_protnone() by pmd_numa()
>
>>> You did not answer one of my previous questions: do you set get_user_pages
>>> with write = 1 as a parameter?
>> For the read from the device, yes:
>> down_read(&current->mm->mmap_sem);
>> res = get_user_pages(
>> current,
>> current->mm,
>> (unsigned long) iov->host_addr,
>> page_count,
>> (write_mode == 0) ? 1 : 0,  /* write */
>> 0,  /* force */
>> &trans->pages[sg_o],
>> NULL);
>> up_read(&current->mm->mmap_sem);
> As I don't have context to infer how write_mode is set above, do you mind
> retesting your driver and always asking for write no matter what?
write_mode is 0 for card2host transfers so yes, the write parameter is 1.
During debug I tried with write_mode=1 and force=1 in all cases and it failed 
too.
>>> Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
>>> not the RHEL kernel, as they are far apart and what might look like the same
>>> issue on both might be totally different bugs.
>> Is a RPM from elrepo ok?
>> http://elrepo.org/linux/kernel/el7/SRPMS/
> Yes should be ok for testing.
>
I tried the elrepo 4.5.2 package without your patch and my test fails; sadly the 
src rpm from elrepo does not contain the kernel sources and I haven't looked 
into how to get the proper tarball.
I tried to rebuild a src rpm for Fedora 24 (kernel 4.5.3) and it works 
without your patch. I'm not sure what differs in their config. I'll keep 
digging.

Nicolas


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-10 Thread Jerome Glisse
On Tue, May 10, 2016 at 01:15:02PM +0200, Nicolas Morey Chaisemartin wrote:
> Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> > On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> >> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> >>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
>  On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:

[...]

> >> Hi,
> >>
> >> I backported the patch to 3.10 (had to copy-paste the pmd_protnone 
> >> definition from 4.5) and it's working!
> >> I'll open a ticket in Redhat tracker to try and get this fixed in RHEL7.
> >>
> >> I have a dumb question though: how can we end up in numa/misplaced memory 
> >> code on a single socket system?
> >>
> > This patch is not a fix, do you see bug message in kernel log ? Because if
> > you do that it means we have a bigger issue.
> I don't see any on my 3.10. I have DMA_API_DEBUG enabled but I don't think it 
> has an impact.

My patch can't be backported to 3.10 as is, you most likely need to replace
pmd_protnone() by pmd_numa()

> > You did not answer one of my previous questions: do you set get_user_pages
> > with write = 1 as a parameter?
> For the read from the device, yes:
> down_read(&current->mm->mmap_sem);
> res = get_user_pages(
> current,
> current->mm,
> (unsigned long) iov->host_addr,
> page_count,
> (write_mode == 0) ? 1 : 0,  /* write */
> 0,  /* force */
> &trans->pages[sg_o],
> NULL);
> up_read(&current->mm->mmap_sem);

As I don't have context to infer how write_mode is set above, do you mind
retesting your driver and always asking for write no matter what?

> > Also it would be a lot easier if you were testing with the latest 4.6 or 4.5,
> > not the RHEL kernel, as they are far apart and what might look like the same
> > issue on both might be totally different bugs.
> Is a RPM from elrepo ok?
> http://elrepo.org/linux/kernel/el7/SRPMS/

Yes should be ok for testing.

Cheers,
Jérôme


Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-10 Thread Nicolas Morey Chaisemartin


Le 05/10/2016 à 12:01 PM, Jerome Glisse a écrit :
> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
>> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
 On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:

> Hi everyone,
>
> This is a repost from a different address as it seems the previous one 
> ended in Gmail junk due to a domain error..
 linux-kernel is a very high volume list which few are reading:
 that also will account for your lack of response so far
 (apart from the indefatigable Alan).

 I've added linux-mm, and some people from another thread regarding
 THP and get_user_pages() pins which has been discussed in recent days.

 Make no mistake, the issue you're raising here is definitely not the
 same as that one (which is specifically about the new THP refcounting
 in v4.5+, whereas you're reporting a problem you've seen in both a
 v3.10-based kernel and in v4.5).  But I think their heads are in
 gear, much more so than mine, and likely to spot something.

> I added more info found while blindly debugging the issue.
>
> Short version:
> I'm having an issue with direct DMA transfer from a device to host memory.
> It seems some of the data is not transferring to the appropriate page.
>
> Some more details:
> I'm debugging a home made PCI driver for our board (Kalray), attached to 
> a x86_64 host running centos7 (3.10.0-327.el7.x86_64)
>
> In the current case, a userland application transfers back and forth data 
> through read/write operations on a file.
> On the kernel side, it triggers DMA transfers through the PCI to/from our 
> board memory.
>
> We followed what pretty much all docs said about direct I/O to user 
> buffers:
>
> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> 2) convert to a scatterlist
> 3) pci_map_sg
> 4) eventually coalesce sg (Intel IOMMU is enabled, so it's usually 
> possible)
> 5) A lot of DMA engine handling code, using the dmaengine layer and 
> virt-dma
> 6) wait for transfer complete; in the meantime, go back to (1) to 
> schedule more work, if any
> 7) pci_unmap_sg
> 8) for read (card2host) transfer, set_page_dirty_lock
> 9) page_cache_release
>
> In 99,% it works perfectly.
> However, I have one userland application where a few pages are not 
> written by a read (card2host) transfer.
> The buffer is memset to a different value so I can check that 
> nothing has overwritten them.
>
> I know (PCI protocol analyser) that the data left our board for the 
> "right" address (the one set in the sg by pci_map_sg).
> I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> using
> uint32_t *addr = page_address(trans->pages[0]);
> dev_warn(>pdev->dev, "val = %x\n", *addr);
> and it has the expected value.
> But if I try to copy_from_user (using the address coming from userland, 
> the one passed to get_user_pages), the data has not been written and I 
> see the memset value.
>
> New infos:
>
> The issue happens with IOMMU on or off.
> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or 
> errors.
>
> I dug a little bit deeper with my very small understanding of linux mm 
> and I discovered that:
>  * we are using transparent huge pages
>  * the page 'not transferred' are the last few of a huge page
> More precisely:
> - We have several transfers in flight from the same user buffer
> - Each transfer is 16 pages long
> - At one point in time, we start transferring from another huge page 
> (transfers are still in flight from the previous one)
> - When a transfer from the previous huge page completes, I dumped the 
> mapcount of the pages from the previous transfers,
>   they are all at 0. The pages are still mapped to dma at this point.
> - A get_user_page to the address of the completed transfer returns 
> a different struct page * than the one I had.
> But this is before I have unmapped/put_page them back. From my 
> understanding this should not have happened.
>
> I tried the same code with a kernel 4.5 and encountered the same issue
>
> Disabling transparent huge pages makes the issue disappear
>
> Thanks in advance
 It does look to me as if pages are being migrated, despite being pinned
 by get_user_pages(): and that would be wrong.  Originally I intended
 to suggest that THP is probably merely the cause of compaction, with
 compaction causing the page migration.  But you posted very interesting
 details in an earlier mail on 27th April from :

> I ran some more tests:

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-10 Thread Jerome Glisse
On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> > On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> This is a repost from a different address as it seems the previous one 
> >>> ended in Gmail junk due to a domain error..
> >> linux-kernel is a very high volume list which few are reading:
> >> that also will account for your lack of response so far
> >> (apart from the indefatigable Alan).
> >>
> >> I've added linux-mm, and some people from another thread regarding
> >> THP and get_user_pages() pins which has been discussed in recent days.
> >>
> >> Make no mistake, the issue you're raising here is definitely not the
> >> same as that one (which is specifically about the new THP refcounting
> >> in v4.5+, whereas you're reporting a problem you've seen in both a
> >> v3.10-based kernel and in v4.5).  But I think their heads are in
> >> gear, much more so than mine, and likely to spot something.
> >>
> >>> I added more info found while blindly debugging the issue.
> >>>
> >>> Short version:
> >>> I'm having an issue with direct DMA transfer from a device to host memory.
> >>> It seems some of the data is not transferring to the appropriate page.
> >>>
> >>> Some more details:
> >>> I'm debugging a home made PCI driver for our board (Kalray), attached to 
> >>> a x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> >>>
> >>> In the current case, a userland application transfers back and forth data 
> >>> through read/write operations on a file.
> >>> On the kernel side, it triggers DMA transfers through the PCI to/from our 
> >>> board memory.
> >>>
> >>> We followed what pretty much all docs said about direct I/O to user 
> >>> buffers:
> >>>
> >>> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> >>> 2) convert to a scatterlist
> >>> 3) pci_map_sg
> >>> 4) eventually coalesce sg (Intel IOMMU is enabled, so it's usually 
> >>> possible)
> >>> 5) A lot of DMA engine handling code, using the dmaengine layer and 
> >>> virt-dma
> >>> 6) wait for transfer complete; in the meantime, go back to (1) to 
> >>> schedule more work, if any
> >>> 7) pci_unmap_sg
> >>> 8) for read (card2host) transfer, set_page_dirty_lock
> >>> 9) page_cache_release
> >>>
> >>> In 99,% it works perfectly.
> >>> However, I have one userland application where a few pages are not 
> >>> written by a read (card2host) transfer.
> >>> The buffer is memset to a different value so I can check that 
> >>> nothing has overwritten them.
> >>>
> >>> I know (PCI protocol analyser) that the data left our board for the 
> >>> "right" address (the one set in the sg by pci_map_sg).
> >>> I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> >>> using
> >>> uint32_t *addr = page_address(trans->pages[0]);
> >>> dev_warn(>pdev->dev, "val = %x\n", *addr);
> >>> and it has the expected value.
> >>> But if I try to copy_from_user (using the address coming from userland, 
> >>> the one passed to get_user_pages), the data has not been written and I 
> >>> see the memset value.
> >>>
> >>> New infos:
> >>>
> >>> The issue happens with IOMMU on or off.
> >>> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or 
> >>> errors.
> >>>
> >>> I dug a little bit deeper with my very small understanding of linux mm 
> >>> and I discovered that:
> >>>  * we are using transparent huge pages
> >>>  * the page 'not transferred' are the last few of a huge page
> >>> More precisely:
> >>> - We have several transfers in flight from the same user buffer
> >>> - Each transfer is 16 pages long
> >>> - At one point in time, we start transferring from another huge page 
> >>> (transfers are still in flight from the previous one)
> >>> - When a transfer from the previous huge page completes, I dumped the 
> >>> mapcount of the pages from the previous transfers,
> >>>   they are all at 0. The pages are still mapped to dma at this point.
> >>> - A get_user_page to the address of the completed transfer returns 
> >>> a different struct page * than the one I had.
> >>> But this is before I have unmapped/put_page them back. From my 
> >>> understanding this should not have happened.
> >>>
> >>> I tried the same code with a kernel 4.5 and encountered the same issue
> >>>
> >>> Disabling transparent huge pages makes the issue disappear
> >>>
> >>> Thanks in advance
> >> It does look to me as if pages are being migrated, despite being pinned
> >> by get_user_pages(): and that would be wrong.  Originally I intended
> >> to suggest that THP is probably merely the cause of compaction, with
> >> compaction causing the page migration.  But you posted very interesting
> >> details in an earlier mail on 27th April from :
> >>
> >>> I ran some more tests:
> >>>
> >>> * Test is OK if transparent 

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-10 Thread Jerome Glisse
On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> Le 05/03/2016 à 12:11 PM, Jerome Glisse a écrit :
> > On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> This is a repost from a different address as it seems the previous one 
> >>> ended in Gmail junk due to a domain error..
> >> linux-kernel is a very high volume list which few are reading:
> >> that also will account for your lack of response so far
> >> (apart from the indefatigable Alan).
> >>
> >> I've added linux-mm, and some people from another thread regarding
> >> THP and get_user_pages() pins which has been discussed in recent days.
> >>
> >> Make no mistake, the issue you're raising here is definitely not the
> >> same as that one (which is specifically about the new THP refcounting
> >> in v4.5+, whereas you're reporting a problem you've seen in both a
> >> v3.10-based kernel and in v4.5).  But I think their heads are in
> >> gear, much more so than mine, and likely to spot something.
> >>
> >>> I added more info found while blindly debugging the issue.
> >>>
> >>> Short version:
> >>> I'm having an issue with direct DMA transfer from a device to host memory.
> >>> It seems some of the data is not transferring to the appropriate page.
> >>>
> >>> Some more details:
> >>> I'm debugging a home made PCI driver for our board (Kalray), attached to 
> >>> a x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> >>>
> >>> In the current case, a userland application transfers back and forth data 
> >>> through read/write operations on a file.
> >>> On the kernel side, it triggers DMA transfers through the PCI to/from our 
> >>> board memory.
> >>>
> >>> We followed what pretty much all docs said about direct I/O to user 
> >>> buffers:
> >>>
> >>> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> >>> 2) convert to a scatterlist
> >>> 3) pci_map_sg
> >>> 4) possibly coalesce sg entries (Intel IOMMU is enabled, so it's usually 
> >>> possible)
> >>> 5) a lot of DMA engine handling code, using the dmaengine layer and 
> >>> virt-dma
> >>> 6) wait for transfer completion; in the meantime, go back to (1) to 
> >>> schedule more work, if any
> >>> 7) pci_unmap_sg
> >>> 8) for read (card2host) transfers, set_page_dirty_lock
> >>> 9) page_cache_release
> >>>
> >>> In 99,% it works perfectly.
> >>> However, I have one userland application where a few pages are not 
> >>> written by a read (card2host) transfer.
> >>> The buffer is memset to a different value beforehand so I can check that 
> >>> nothing has overwritten those pages.
> >>>
> >>> I know (PCI protocol analyser) that the data left our board for the 
> >>> "right" address (the one set in the sg by pci_map_sg).
> >>> I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> >>> using
> >>> uint32_t *addr = page_address(trans->pages[0]);
> >>> dev_warn(&trans->pdev->dev, "val = %x\n", *addr);
> >>> and it has the expected value.
> >>> But if I try to copy_from_user (using the address coming from userland, 
> >>> the one passed to get_user_pages), the data has not been written and I 
> >>> see the memset value.
> >>>
> >>> New infos:
> >>>
> >>> The issue happens with IOMMU on or off.
> >>> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or 
> >>> errors.
> >>>
> >>> I dug a little deeper, with my very limited understanding of the Linux mm, 
> >>> and I discovered that:
> >>>  * we are using transparent huge pages
> >>>  * the pages 'not transferred' are the last few of a huge page
> >>> More precisely:
> >>> - We have several transfer in flight from the same user buffer
> >>> - Each transfer is 16 pages long
> >>> - At one point in time, we start transferring from another huge page 
> >>> (transfers are still in flight from the previous one)
> >>> - When a transfer from the previous huge page completes, I dumped the 
> >>> mapcounts of the pages from the previous transfers:
> >>>   they are all at 0. The pages are still DMA-mapped at this point.
> >>> - A get_user_pages() on the address of the completed transfer returns 
> >>> a different struct page * than the one I had.
> >>> But this is before I have unmapped or put_page'd them. From my 
> >>> understanding this should not have happened.
> >>>
> >>> I tried the same code with kernel 4.5 and encountered the same issue.
> >>>
> >>> Disabling transparent huge pages makes the issue disappear.
> >>>
> >>> Thanks in advance
> >> It does look to me as if pages are being migrated, despite being pinned
> >> by get_user_pages(): and that would be wrong.  Originally I intended
> >> to suggest that THP is probably merely the cause of compaction, with
> >> compaction causing the page migration.  But you posted very interesting
> >> details in an earlier mail on 27th April from :
> >>
> >>> I ran some more tests:
> >>>
> >>> * Test is OK if transparent huge tlb are 

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-03 Thread Kirill A. Shutemov
On Tue, May 03, 2016 at 12:11:54PM +0200, Jerome Glisse wrote:
> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> > On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> > 
> > > Hi everyone,
> > > 
> > > This is a repost from a different address as it seems the previous one 
> > > ended up in Gmail junk due to a domain error.
> > 
> > linux-kernel is a very high volume list which few are reading:
> > that also will account for your lack of response so far
> > (apart from the indefatigable Alan).
> > 
> > I've added linux-mm, and some people from another thread regarding
> > THP and get_user_pages() pins which has been discussed in recent days.
> > 
> > Make no mistake, the issue you're raising here is definitely not the
> > same as that one (which is specifically about the new THP refcounting
> > in v4.5+, whereas you're reporting a problem you've seen in both a
> > v3.10-based kernel and in v4.5).  But I think their heads are in
> > gear, much more so than mine, and likely to spot something.
> > 
> > > I added more info found while blindly debugging the issue.
> > > 
> > > Short version:
> > > I'm having an issue with direct DMA transfer from a device to host memory.
> > > It seems some of the data is not transferring to the appropriate page.
> > > 
> > > Some more details:
> > > I'm debugging a home made PCI driver for our board (Kalray), attached to 
> > > a x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> > > 
> > > In the current case, a userland application transfers back and forth data 
> > > through read/write operations on a file.
> > > On the kernel side, it triggers DMA transfers through the PCI to/from our 
> > > board memory.
> > > 
> > > We followed what pretty much all docs said about direct I/O to user 
> > > buffers:
> > > 
> > > 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> > > 2) convert to a scatterlist
> > > 3) pci_map_sg
> > > 4) possibly coalesce sg entries (Intel IOMMU is enabled, so it's usually 
> > > possible)
> > > 5) a lot of DMA engine handling code, using the dmaengine layer and 
> > > virt-dma
> > > 6) wait for transfer completion; in the meantime, go back to (1) to 
> > > schedule more work, if any
> > > 7) pci_unmap_sg
> > > 8) for read (card2host) transfers, set_page_dirty_lock
> > > 9) page_cache_release
> > > 
> > > In 99,% it works perfectly.
> > > However, I have one userland application where a few pages are not 
> > > written by a read (card2host) transfer.
> > > The buffer is memset to a different value beforehand so I can check that 
> > > nothing has overwritten those pages.
> > > 
> > > I know (PCI protocol analyser) that the data left our board for the 
> > > "right" address (the one set in the sg by pci_map_sg).
> > > I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> > > using
> > > uint32_t *addr = page_address(trans->pages[0]);
> > > dev_warn(&trans->pdev->dev, "val = %x\n", *addr);
> > > and it has the expected value.
> > > But if I try to copy_from_user (using the address coming from userland, 
> > > the one passed to get_user_pages), the data has not been written and I 
> > > see the memset value.
> > > 
> > > New infos:
> > > 
> > > The issue happens with IOMMU on or off.
> > > I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or 
> > > errors.
> > > 
> > > I dug a little deeper, with my very limited understanding of the Linux mm, 
> > > and I discovered that:
> > >  * we are using transparent huge pages
> > >  * the pages 'not transferred' are the last few of a huge page
> > > More precisely:
> > > - We have several transfer in flight from the same user buffer
> > > - Each transfer is 16 pages long
> > > - At one point in time, we start transferring from another huge page 
> > > (transfers are still in flight from the previous one)
> > > - When a transfer from the previous huge page completes, I dumped the 
> > > mapcounts of the pages from the previous transfers:
> > >   they are all at 0. The pages are still DMA-mapped at this point.
> > > - A get_user_pages() on the address of the completed transfer returns 
> > > a different struct page * than the one I had.
> > > But this is before I have unmapped or put_page'd them. From my 
> > > understanding this should not have happened.
> > > 
> > > I tried the same code with kernel 4.5 and encountered the same issue.
> > > 
> > > Disabling transparent huge pages makes the issue disappear.
> > > 
> > > Thanks in advance
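
For reference, the transparent_hugepage=never behaviour discussed in this thread can also be toggled at runtime through the stock upstream sysfs knob, without rebooting (requires root):

```shell
# Show the current THP policy ([always], madvise or never)
cat /sys/kernel/mm/transparent_hugepage/enabled
# Disable THP system-wide, equivalent to booting with transparent_hugepage=never
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```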
> > 
> > It does look to me as if pages are being migrated, despite being pinned
> > by get_user_pages(): and that would be wrong.  Originally I intended
> > to suggest that THP is probably merely the cause of compaction, with
> > compaction causing the page migration.  But you posted very interesting
> > details in an earlier mail on 27th April from :
> > 
> > > I ran some more tests:
> > > 
> > > * Test is OK if transparent huge tlb are disabled
> > > 
> > > * For all the page where 

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-03 Thread Jerome Glisse
On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> 
> > Hi everyone,
> > 
> > This is a repost from a different address as it seems the previous one 
> > ended up in Gmail junk due to a domain error.
> 
> linux-kernel is a very high volume list which few are reading:
> that also will account for your lack of response so far
> (apart from the indefatigable Alan).
> 
> I've added linux-mm, and some people from another thread regarding
> THP and get_user_pages() pins which has been discussed in recent days.
> 
> Make no mistake, the issue you're raising here is definitely not the
> same as that one (which is specifically about the new THP refcounting
> in v4.5+, whereas you're reporting a problem you've seen in both a
> v3.10-based kernel and in v4.5).  But I think their heads are in
> gear, much more so than mine, and likely to spot something.
> 
> > I added more info found while blindly debugging the issue.
> > 
> > Short version:
> > I'm having an issue with direct DMA transfer from a device to host memory.
> > It seems some of the data is not transferring to the appropriate page.
> > 
> > Some more details:
> > I'm debugging a home made PCI driver for our board (Kalray), attached to a 
> > x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> > 
> > In the current case, a userland application transfers back and forth data 
> > through read/write operations on a file.
> > On the kernel side, it triggers DMA transfers through the PCI to/from our 
> > board memory.
> > 
> > We followed what pretty much all docs said about direct I/O to user buffers:
> > 
> > 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> > 2) convert to a scatterlist
> > 3) pci_map_sg
> > 4) possibly coalesce sg entries (Intel IOMMU is enabled, so it's usually possible)
> > 5) a lot of DMA engine handling code, using the dmaengine layer and virt-dma
> > 6) wait for transfer completion; in the meantime, go back to (1) to schedule 
> > more work, if any
> > 7) pci_unmap_sg
> > 8) for read (card2host) transfers, set_page_dirty_lock
> > 9) page_cache_release
> > 
> > In 99,% it works perfectly.
> > However, I have one userland application where a few pages are not written 
> > by a read (card2host) transfer.
> > The buffer is memset to a different value beforehand so I can check that 
> > nothing has overwritten those pages.
> > 
> > I know (PCI protocol analyser) that the data left our board for the "right" 
> > address (the one set in the sg by pci_map_sg).
> > I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> > using
> > uint32_t *addr = page_address(trans->pages[0]);
> > dev_warn(&trans->pdev->dev, "val = %x\n", *addr);
> > and it has the expected value.
> > But if I try to copy_from_user (using the address coming from userland, the 
> > one passed to get_user_pages), the data has not been written and I see the 
> > memset value.
> > 
> > New infos:
> > 
> > The issue happens with IOMMU on or off.
> > I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or 
> > errors.
> > 
> > I dug a little deeper, with my very limited understanding of the Linux mm, 
> > and I discovered that:
> >  * we are using transparent huge pages
> >  * the pages 'not transferred' are the last few of a huge page
> > More precisely:
> > - We have several transfer in flight from the same user buffer
> > - Each transfer is 16 pages long
> > - At one point in time, we start transferring from another huge page 
> > (transfers are still in flight from the previous one)
> > - When a transfer from the previous huge page completes, I dumped the 
> > mapcounts of the pages from the previous transfers:
> >   they are all at 0. The pages are still DMA-mapped at this point.
> > - A get_user_pages() on the address of the completed transfer returns a 
> > different struct page * than the one I had.
> > But this is before I have unmapped or put_page'd them. From my 
> > understanding this should not have happened.
> > 
> > I tried the same code with kernel 4.5 and encountered the same issue.
> > 
> > Disabling transparent huge pages makes the issue disappear.
> > 
> > Thanks in advance
> 
> It does look to me as if pages are being migrated, despite being pinned
> by get_user_pages(): and that would be wrong.  Originally I intended
> to suggest that THP is probably merely the cause of compaction, with
> compaction causing the page migration.  But you posted very interesting
> details in an earlier mail on 27th April from :
> 
> > I ran some more tests:
> > 
> > * Test is OK if transparent huge tlb are disabled
> > 
> > * For all the pages where data are not transferred, and only those pages, a 
> > call to get_user_page(user vaddr) just before dma_unmap_sg returns a 
> > different page from the original one.
> > [436477.927279] mppa :03:00.0: org_page= ea0009f60080 cur page = 
> > ea00074e0080
> > [436477.927298] 

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-02 Thread Hugh Dickins
On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:

> Hi everyone,
> 
> This is a repost from a different address as it seems the previous one ended 
> up in Gmail junk due to a domain error.

linux-kernel is a very high volume list which few are reading:
that also will account for your lack of response so far
(apart from the indefatigable Alan).

I've added linux-mm, and some people from another thread regarding
THP and get_user_pages() pins which has been discussed in recent days.

Make no mistake, the issue you're raising here is definitely not the
same as that one (which is specifically about the new THP refcounting
in v4.5+, whereas you're reporting a problem you've seen in both a
v3.10-based kernel and in v4.5).  But I think their heads are in
gear, much more so than mine, and likely to spot something.

> I added more info found while blindly debugging the issue.
> 
> Short version:
> I'm having an issue with direct DMA transfer from a device to host memory.
> It seems some of the data is not transferring to the appropriate page.
> 
> Some more details:
> I'm debugging a home made PCI driver for our board (Kalray), attached to a 
> x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> 
> In the current case, a userland application transfers back and forth data 
> through read/write operations on a file.
> On the kernel side, it triggers DMA transfers through the PCI to/from our 
> board memory.
> 
> We followed what pretty much all docs said about direct I/O to user buffers:
> 
> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> 2) convert to a scatterlist
> 3) pci_map_sg
> 4) possibly coalesce sg entries (Intel IOMMU is enabled, so it's usually possible)
> 5) a lot of DMA engine handling code, using the dmaengine layer and virt-dma
> 6) wait for transfer completion; in the meantime, go back to (1) to schedule 
> more work, if any
> 7) pci_unmap_sg
> 8) for read (card2host) transfers, set_page_dirty_lock
> 9) page_cache_release
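
[Editorial aside: the sequence quoted above condenses to roughly the following kernel-side sketch for one card2host batch. This is a hedged reconstruction, not the actual Kalray driver: `card2host_xfer`, `pdev` and `uaddr` are placeholder names, get_user_pages_fast() stands in for whichever GUP variant the driver uses, and error handling plus the dmaengine submission are elided. The API shown is the 3.10/4.5-era one, where page_cache_release() is still the name for put_page().]

```c
/* Hedged sketch of the quoted DMA pipeline; not the driver in question. */
#include <linux/mm.h>
#include <linux/pagemap.h>      /* page_cache_release() (pre-4.6) */
#include <linux/pci.h>
#include <linux/scatterlist.h>

static void card2host_xfer(struct pci_dev *pdev, unsigned long uaddr)
{
	struct page *pages[16];
	struct scatterlist sgl[16];
	int i, n, nents;

	/* Pin the user pages; write = 1 because the device fills them. */
	n = get_user_pages_fast(uaddr, 16, 1, pages);

	/* Build the scatterlist from the pinned pages. */
	sg_init_table(sgl, n);
	for (i = 0; i < n; i++)
		sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

	/* Map for DMA; with the IOMMU enabled, entries may be coalesced. */
	nents = pci_map_sg(pdev, sgl, n, PCI_DMA_FROMDEVICE);

	/* ... submit the nents entries of 'sgl' via dmaengine/virt-dma
	 *     and wait for the completion callback ... */

	pci_unmap_sg(pdev, sgl, n, PCI_DMA_FROMDEVICE);
	for (i = 0; i < n; i++) {
		set_page_dirty_lock(pages[i]);	/* the device wrote the pages */
		page_cache_release(pages[i]);	/* drop the GUP pin */
	}
}
```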
> 
> In 99,% it works perfectly.
> However, I have one userland application where a few pages are not written by 
> a read (card2host) transfer.
> The buffer is memset to a different value beforehand so I can check that 
> nothing has overwritten those pages.
> 
> I know (PCI protocol analyser) that the data left our board for the "right" 
> address (the one set in the sg by pci_map_sg).
> I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> using
> uint32_t *addr = page_address(trans->pages[0]);
> dev_warn(&trans->pdev->dev, "val = %x\n", *addr);
> and it has the expected value.
> But if I try to copy_from_user (using the address coming from userland, the 
> one passed to get_user_pages), the data has not been written and I see the 
> memset value.
> 
> New infos:
> 
> The issue happens with IOMMU on or off.
> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or errors.
> 
> I dug a little deeper, with my very limited understanding of the Linux mm, and 
> I discovered that:
>  * we are using transparent huge pages
>  * the pages 'not transferred' are the last few of a huge page
> More precisely:
> - We have several transfer in flight from the same user buffer
> - Each transfer is 16 pages long
> - At one point in time, we start transferring from another huge page 
> (transfers are still in flight from the previous one)
> - When a transfer from the previous huge page completes, I dumped the 
> mapcounts of the pages from the previous transfers:
>   they are all at 0. The pages are still DMA-mapped at this point.
> - A get_user_pages() on the address of the completed transfer returns a 
> different struct page * than the one I had.
> But this is before I have unmapped or put_page'd them. From my understanding 
> this should not have happened.
> 
> I tried the same code with kernel 4.5 and encountered the same issue.
> 
> Disabling transparent huge pages makes the issue disappear.
> 
> Thanks in advance

It does look to me as if pages are being migrated, despite being pinned
by get_user_pages(): and that would be wrong.  Originally I intended
to suggest that THP is probably merely the cause of compaction, with
compaction causing the page migration.  But you posted very interesting
details in an earlier mail on 27th April from :

> I ran some more tests:
> 
> * Test is OK if transparent huge tlb are disabled
> 
> * For all the pages where data are not transferred, and only those pages, a 
> call to get_user_page(user vaddr) just before dma_unmap_sg returns a 
> different page from the original one.
> [436477.927279] mppa :03:00.0: org_page= ea0009f60080 cur page = 
> ea00074e0080
> [436477.927298] page:ea0009f60080 count:0 mapcount:1 mapping:  
> (null) index:0x2
> [436477.927314] page flags: 0x2f8000(tail)
> [436477.927354] page dumped because: org_page
> [436477.927369] page:ea00074e0080 count:0 mapcount:1 mapping:  
> (null) index:0x2
> [436477.927382] 

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-02 Thread Hugh Dickins
On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:

> Hi everyone,
> 
> This is a repost from a different address as it seems the previous one ended 
> in Gmail junk due to a domain error..

linux-kernel is a very high volume list which few are reading:
that also will account for your lack of response so far
(apart from the indefatigable Alan).

I've added linux-mm, and some people from another thread regarding
THP and get_user_pages() pins which has been discussed in recent days.

Make no mistake, the issue you're raising here is definitely not the
same as that one (which is specifically about the new THP refcounting
in v4.5+, whereas you're reporting a problem you've seen in both a
v3.10-based kernel and in v4.5).  But I think their heads are in
gear, much more so than mine, and likely to spot something.

> I added more info found while blindly debugging the issue.
> 
> Short version:
> I'm having an issue with direct DMA transfer from a device to host memory.
> It seems some of the data is not transferring to the appropriate page.
> 
> Some more details:
> I'm debugging a home made PCI driver for our board (Kalray), attached to a 
> x86_64 host running centos7 (3.10.0-327.el7.x86_64)
> 
> In the current case, a userland application transfers back and forth data 
> through read/write operations on a file.
> On the kernel side, it triggers DMA transfers through the PCI to/from our 
> board memory.
> 
> We followed what pretty much all docs said about direct I/O to user buffers:
> 
> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
> 2) convert to a scatterlist
> 3) pci_map_sg
> 4) eventually coalesce sg (Intel IOMMU is enabled, so it's usually possible)
> 4) A lot of DMA engine handling code, using the dmaengine layer and virt-dma
> 5) wait for transfer complete, in the mean time, go back to (1) to schedule 
> more work, if any
> 6) pci_unmap_sg
> 7) for read (card2host) transfer, set_page_dirty_lock
> 8) page_cache_release
> 
> In 99,% it works perfectly.
> However, I have one userland application where a few pages are not written by 
> a read (card2host) transfer.
> The buffer is memset them to a different value so I can check that nothing 
> has overwritten them.
> 
> I know (PCI protocol analyser) that the data left our board for the "right" 
> address (the one set in the sg by pci_map_sg).
> I tried reading the data between the pci_unmap_sg and the set_page_dirty, 
> using
> uint32_t *addr = page_address(trans->pages[0]);
> dev_warn(>pdev->dev, "val = %x\n", *addr);
> and it has the expected value.
> But if I try to copy_from_user (using the address coming from userland, the 
> one passed to get_user_pages), the data has not been written and I see the 
> memset value.
> 
> New infos:
> 
> The issue happens with IOMMU on or off.
> I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or errors.
> 
> I dug a little deeper, with my very limited understanding of Linux mm, and 
> discovered that:
>  * we are using transparent huge pages
>  * the pages 'not transferred' are the last few of a huge page
> More precisely:
> - We have several transfers in flight from the same user buffer
> - Each transfer is 16 pages long
> - At one point in time, we start transferring from another huge page 
> (transfers are still in flight from the previous one)
> - When a transfer from the previous huge page completes, I dumped the 
> mapcount of the pages from the previous transfers:
>   they are all 0. The pages are still mapped for DMA at this point.
> - A get_user_pages() on the address of the completed transfer returns a 
> different struct page * than the one I had.
> But this is before I have unmapped/put_page'd them. From my understanding 
> this should not have happened.
> 
> I tried the same code with kernel 4.5 and encountered the same issue.
> 
> Disabling transparent huge pages makes the issue disappear.
> 
> Thanks in advance

It does look to me as if pages are being migrated, despite being pinned
by get_user_pages(): and that would be wrong.  Originally I intended
to suggest that THP is probably merely the cause of compaction, with
compaction causing the page migration.  But you posted very interesting
details in an earlier mail on 27th April:

> I ran some more tests:
> 
> * Test is OK if transparent huge pages are disabled
> 
> * For all the page where data are not transfered, and only those pages, a 
> call to get_user_page(user vaddr) just before dma_unmap_sg returns a 
> different page from the original one.
> [436477.927279] mppa :03:00.0: org_page= ea0009f60080 cur page = ea00074e0080
> [436477.927298] page:ea0009f60080 count:0 mapcount:1 mapping: (null) index:0x2
> [436477.927314] page flags: 0x2f8000(tail)
> [436477.927354] page dumped because: org_page
> [436477.927369] page:ea00074e0080 count:0 mapcount:1 mapping: (null) index:0x2
> [436477.927382] page flags:

[Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-04-29 Thread Nicolas Morey Chaisemartin
Hi everyone,

This is a repost from a different address, as it seems the previous one ended up 
in Gmail junk due to a domain error.
I added more info found while blindly debugging the issue.

Short version:
I'm having an issue with direct DMA transfer from a device to host memory.
It seems some of the data is not transferring to the appropriate page.

Some more details:
I'm debugging a home-made PCI driver for our board (Kalray), attached to an 
x86_64 host running CentOS 7 (3.10.0-327.el7.x86_64).

In the current case, a userland application transfers data back and forth 
through read/write operations on a file.
On the kernel side, it triggers DMA transfers through the PCI to/from our board 
memory.

We followed pretty much what all the docs say about direct I/O to user buffers:

1) get_user_pages() (in the current case, at most 16 pages at once)
2) convert to a scatterlist
3) pci_map_sg
4) possibly coalesce the sg (Intel IOMMU is enabled, so it's usually possible)
5) a lot of DMA engine handling code, using the dmaengine layer and virt-dma
6) wait for transfer complete; in the meantime, go back to (1) to schedule 
more work, if any
7) pci_unmap_sg
8) for read (card2host) transfers, set_page_dirty_lock
9) page_cache_release
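
The pipeline above can be sketched roughly as below for a card2host transfer. This is a hedged illustration against the 3.10-era kernel API, not the actual driver code: the function name card_dma_read, the pre-allocated pages and sgl arrays, and the abbreviated error handling are all assumptions.

```c
/* Illustrative sketch only: names and error handling are simplified. */
static int card_dma_read(struct pci_dev *pdev, unsigned long uaddr,
                         int npages, struct page **pages,
                         struct scatterlist *sgl)
{
	int got, i, nents;

	/* 1) pin the user pages; write=1 because the device writes into them */
	got = get_user_pages(current, current->mm, uaddr, npages,
			     1 /* write */, 0 /* force */, pages, NULL);
	if (got < 0)
		return got;

	/* 2) build a scatterlist over the pinned pages */
	sg_init_table(sgl, got);
	for (i = 0; i < got; i++)
		sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

	/* 3-4) map for DMA; the IOMMU may coalesce entries */
	nents = pci_map_sg(pdev, sgl, got, PCI_DMA_FROMDEVICE);

	/* 5-6) ... program the DMA engine, wait for completion ... */

	/* 7) unmap */
	pci_unmap_sg(pdev, sgl, got, PCI_DMA_FROMDEVICE);

	/* 8-9) mark dirty (the device wrote them) and drop the pins */
	for (i = 0; i < got; i++) {
		set_page_dirty_lock(pages[i]);
		page_cache_release(pages[i]);
	}
	return 0;
}
```

The ordering matters: the pages must stay pinned (no page_cache_release) until after pci_unmap_sg, otherwise the device could write into memory that mm has already reclaimed or migrated.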

In over 99% of cases it works perfectly.
However, I have one userland application where a few pages are not written by a 
read (card2host) transfer.
The buffer is memset to a different value beforehand so I can check that 
nothing has overwritten it.

I know (PCI protocol analyser) that the data left our board for the "right" 
address (the one set in the sg by pci_map_sg).
I tried reading the data between the pci_unmap_sg and the set_page_dirty, using
uint32_t *addr = page_address(trans->pages[0]);
dev_warn(&trans->pdev->dev, "val = %x\n", *addr);
and it has the expected value.
But if I try to copy_from_user (using the address coming from userland, the one 
passed to get_user_pages), the data has not been written and I see the memset 
value.

New info:

The issue happens with IOMMU on or off.
I compiled a kernel with DMA_API_DEBUG enabled and got no warnings or errors.

I dug a little deeper, with my very limited understanding of Linux mm, and 
discovered that:
 * we are using transparent huge pages
 * the pages 'not transferred' are the last few of a huge page
More precisely:
- We have several transfers in flight from the same user buffer
- Each transfer is 16 pages long
- At one point in time, we start transferring from another huge page (transfers 
are still in flight from the previous one)
- When a transfer from the previous huge page completes, I dumped the mapcount 
of the pages from the previous transfers:
  they are all 0. The pages are still mapped for DMA at this point.
- A get_user_pages() on the address of the completed transfer returns a 
different struct page * than the one I had.
But this is before I have unmapped/put_page'd them. From my understanding 
this should not have happened.

I tried the same code with kernel 4.5 and encountered the same issue.

Disabling transparent huge pages makes the issue disappear.

Thanks in advance

Nicolas


