date:20121121

Re: [PATCH RESEND] acpi: Fix logging when no pci_irq is allocated

2012-11-21 Thread Rafael J. Wysocki

On Wednesday, November 21, 2012 12:53:55 PM Joe Perches wrote:
> On Wed, 2012-11-21 at 21:50 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 21, 2012 05:46:04 AM Joe Perches wrote:
> > > On Wed, 2012-11-21 at 16:43 +0800, Daniel J Blueman wrote:
> > > > Previously a new line is implicitly added in the no GSI case:
> > > > 
> > > > [7.185182] pci 0001:00:12.0: can't derive routing for PCI INT A
> > > > [7.191352] pci 0001:00:12.0: PCI INT A: no GSI
> > > > [7.195956]  - using ISA IRQ 10
> > > > 
> > > > The code thus prints a blank line where no legacy IRQ is available:
> > > > 
> > > > [1.650124] pci :00:14.0: can't derive routing for PCI INT A
> > > > [1.650126] pci :00:14.0: PCI INT A: no GSI
> > > > [1.650126] 
> > > > [1.650180] pci :00:14.0: can't derive routing for PCI INT A
> > > > 
> > > > Fix this by making the newline explicit and removing the superfluous
> > > > one.
> > > 
> > > This breaks the logging code below it when there is an ISA irq.
> > > 
> > > The below works, but is a workaround for a defect in the printk
> > > subsystem introduced by a logging change that will be fixed in
> > > a near future release.
> > 
> > What exactly do you mean by "near future"?
> 
> I mean Jan Schönherr's patches that should fix this are
> likely to be picked up one day.
> 
> https://lkml.org/lkml/2012/11/13/678

Till then, we need the patch you sent, right?  And it won't hurt to apply it
anyway?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] tty: Add driver unthrottle in ioctl(...,TCFLSH,..).

2012-11-21 Thread Ilya Zykov

Revert 'tty: fix "IRQ45: nobody cared"'

This reverts commit 7b292b4bf9a9d6098440d85616d6ca4c608b8304.

reset_buffer_flags() is also invoked during ioctl(...,TCFLSH,..). At the
time of the request we can have full buffers and a throttled driver. If we
don't unthrottle the driver, it can stay throttled forever: after the
request the buffers are empty, the driver is still throttled, and there is
no other place that unthrottles it.
This is simple to reproduce with a "pty" pair when one side sleeps on
tty->write_wait and the other side does ioctl(...,TCFLSH,..): nothing is
left to wake up the writers.

About 'tty: fix "IRQ45: nobody cared"':
We don't call tty_unthrottle() when the last filp is released
('tty->count == 0'). In the other cases it should be safe. Maybe the real
bug is somewhere else.

Signed-off-by: Ilya Zykov 
---

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 26f0d0e..a783d0e 100644

--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -184,6 +184,7 @@ static void reset_buffer_flags(struct tty_struct *tty)
tty->canon_head = tty->canon_data = tty->erasing = 0;
memset(&tty->read_flags, 0, sizeof tty->read_flags);
n_tty_set_room(tty);
+   check_unthrottle(tty);
 }
 
 /**
@@ -1585,7 +1586,6 @@ static int n_tty_open(struct tty_struct *tty)
return -ENOMEM;
}
reset_buffer_flags(tty);
-   tty_unthrottle(tty);
tty->column = 0;
n_tty_set_termios(tty, NULL);
tty->minimum_to_wake = 1;

[PATCH] TPM: Work around buggy TPMs that block during continue self test

2012-11-21 Thread Jason Gunthorpe

We've been testing an alternative TPM for our embedded products and
found random kernel boot failures due to time outs after the continue
self test command.

This was happening randomly, and has been *very* hard to track down, but it
looks like with this chip there is some kind of race with the tpm_tis_status()
check of TPM_STS_COMMAND_READY. If things get there 'too fast' then
it sees the chip is ready, or tpm_tis_ready() works. Otherwise it takes
somewhere over 400ms before the chip will return TPM_STS_COMMAND_READY.

Adding some delay after tpm_continue_selftest() makes things reliably
hit the failure path, otherwise it is a crapshoot.

The spec says it should be returning TPM_WARN_DOING_SELFTEST, not holding
off on ready.

Boot log during this event looks like this:

tpm_tis 7003.tpm_tis: 1.2 TPM (device-id 0x3204, rev-id 64)
tpm_tis 7003.tpm_tis: Issuing TPM_STARTUP
tpm_tis 7003.tpm_tis: tpm_transmit: tpm_send: error -62
tpm_tis 7003.tpm_tis: [Hardware Error]: TPM command timed out during continue self test
tpm_tis 7003.tpm_tis: tpm_transmit: tpm_send: error -62
tpm_tis 7003.tpm_tis: [Hardware Error]: TPM command timed out during continue self test
tpm_tis 7003.tpm_tis: tpm_transmit: tpm_send: error -62
tpm_tis 7003.tpm_tis: [Hardware Error]: TPM command timed out during continue self test
tpm_tis 7003.tpm_tis: tpm_transmit: tpm_send: error -62
tpm_tis 7003.tpm_tis: [Hardware Error]: TPM command timed out during continue self test

The other TPM vendor we use doesn't show this wonky behaviour:
tpm_tis 7003.tpm_tis: 1.2 TPM (device-id 0xFE, rev-id 70)

Signed-off-by: Jason Gunthorpe 
---
 drivers/char/tpm/tpm.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/char/tpm/tpm.c b/drivers/char/tpm/tpm.c
index 454e032..61d62c2 100644
--- a/drivers/char/tpm/tpm.c
+++ b/drivers/char/tpm/tpm.c
@@ -857,7 +857,7 @@ int tpm_do_selftest(struct tpm_chip *chip)
 {
int rc;
unsigned int loops;
-   unsigned int delay_msec = 1000;
+   unsigned int delay_msec = 100;
unsigned long duration;
struct tpm_cmd_t cmd;
 
@@ -878,6 +878,14 @@ int tpm_do_selftest(struct tpm_chip *chip)
cmd.header.in = pcrread_header;
cmd.params.pcrread_in.pcr_idx = cpu_to_be32(0);
rc = tpm_transmit(chip, (u8 *) &cmd, READ_PCR_RESULT_SIZE);
+   /* Some buggy TPMs will not respond to tpm_tis_ready() for
+    * around 300ms while the self test is ongoing, keep trying
+    * until the self test duration expires. */
+   if (rc == -ETIME) {
+   dev_info(chip->dev, HW_ERR "TPM command timed out during continue self test");
+   msleep(delay_msec);
+   continue;
+   }
 
if (rc < TPM_HEADER_SIZE)
return -EFAULT;
-- 
1.7.5.4


Re: [PATCH 4/4] ext3: Warn if mounting rw on a disk requiring stable page writes

2012-11-21 Thread Darrick J. Wong

On Wed, Nov 21, 2012 at 03:15:43AM +0100, Jan Kara wrote:
> On Tue 20-11-12 18:00:56, Darrick J. Wong wrote:
> > ext3 doesn't properly isolate pages from changes during writeback.  Since the
> > recommended fix is to use ext4, for now we'll just print a warning if the user
> > tries to mount in write mode.
> > 
> > Signed-off-by: Darrick J. Wong 
> > ---
> >  fs/ext3/super.c |8 
> >  1 file changed, 8 insertions(+)
> > 
> > 
> > diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> > index 5366393..5b3725d 100644
> > --- a/fs/ext3/super.c
> > +++ b/fs/ext3/super.c
> > @@ -1325,6 +1325,14 @@ static int ext3_setup_super(struct super_block *sb, struct ext3_super_block *es,
> > "forcing read-only mode");
> > res = MS_RDONLY;
> > }
> > +   if (!read_only &&
> > +   queue_requires_stable_pages(bdev_get_queue(sb->s_bdev))) {
> > +   ext3_msg(sb, KERN_ERR,
> > +   "error: ext3 cannot safely write data to a disk "
> > +   "requiring stable pages writes; forcing read-only "
> > +   "mode.  Upgrading to ext4 is recommended.");
> > +   res = MS_RDONLY;
> > +   }
> > if (read_only)
> > return res;
> > if (!(sbi->s_mount_state & EXT3_VALID_FS))
>   Why this? ext3 should be fixed by your change to
> filemap_page_mkwrite()... Or does testing show otherwise?

Yes, it's still broken even with this new set of changes.  Now that I think
about it a little more, I recall that writeback mode was actually fine, so this
is a little harsh.

Hm... looking at the ordered code a little more, it looks like
ext3_ordered_write_end is calling journal_dirty_data_fn, which (I guess?) tries
to write mapped buffers back through the journal?  Taking it out seems to fix
ordered mode, though I have a suspicion that it might very well break ordered
mode too.

--D
> 
>   Honza
> -- 
> Jan Kara 
> SUSE Labs, CR

Re: [Pv-drivers] [PATCH 01/12] VMCI: context implementation.

2012-11-21 Thread Andy King

Hi Joe,

> Just some trivial notes.

Thanks for taking a look!

> > +   pr_warn("Failed to allocate memory for VMCI context.\n");
> 
> OOM logging messages aren't necessary as alloc failures
> are already logged with a stack trace.

Noted, we'll remove all such occurrences.

> Maybe just use
>   struct vmci_event_msg e_msg;
>   struct vmci_event_payld_ctx ev_payload;
> and change the addressing or use a cast as appropriate?

It does seem inelegant, we'll take a look.

> You also have some inconsistency in whether or not your
> logging messages use a terminating period.  I suggest
> you just delete all the periods.
>   s/\.\\n"/\\n"/g

Gah, that's ugly.  We'll remove all of them as you suggest.

Thanks!
- Andy

Re: [PATCH] mm: vmscan: Check for fatal signals iff the process was throttled

2012-11-21 Thread Mel Gorman

On Wed, Nov 21, 2012 at 12:15:59PM -0800, Andrew Morton wrote:
> On Wed, 21 Nov 2012 15:38:24 +
> Mel Gorman  wrote:
> 
> > commit 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves
> > are low and swap is backed by network storage") introduced a check for
> > fatal signals after a process gets throttled for network storage. The
> > intention was that if a process was throttled and got killed that it
> > should not trigger the OOM killer. As pointed out by Minchan Kim and
> > David Rientjes, this check is in the wrong place and too broad. If a
> > system is in an OOM situation and a process is exiting, it can loop in
> > __alloc_pages_slowpath(), calling direct reclaim over and over. As the
> > fatal signal is pending it returns 1 as if it is making forward progress
> > and can effectively deadlock.
> > 
> > This patch moves the fatal_signal_pending() check after throttling to
> > throttle_direct_reclaim() where it belongs. If the process is killed
> > while throttled, it will return immediately without direct reclaim
> > except now it will have TIF_MEMDIE set and will use the PFMEMALLOC
> > reserves.
> > 
> > Minchan pointed out that it may be better to direct reclaim before returning
> > to avoid using the reserves because there may be pages that can easily be
> > reclaimed, which would avoid using the reserves. However, we do no such
> > targeted reclaim and there is no guarantee that suitable pages are available.
> > As it is expected that this throttling happens when swap-over-NFS is used
> > there is a possibility that the process will instead swap which may allocate
> > network buffers from the PFMEMALLOC reserves. Hence, in the swap-over-nfs
> > case where a process can be throttled and be killed it can use the reserves
> > to exit or it can potentially use reserves to swap a few pages and then
> > exit. This patch takes the option of using the reserves if necessary to
> > allow the process to exit quickly.
> > 
> > If this patch passes review it should be considered a -stable candidate
> > for 3.6.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2207,9 +2207,12 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
> >   * Throttle direct reclaimers if backing storage is backed by the network
> >   * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> >   * depleted. kswapd will continue to make progress and wake the processes
> > - * when the low watermark is reached
> > + * when the low watermark is reached.
> > + *
> > + * Returns true if a fatal signal was delivered during throttling. If this
> 
> s/delivered/received/imo
> 

Ok.

> > + * happens, the page allocator should not consider triggering the OOM killer.
> >   */
> > -static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > +static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > nodemask_t *nodemask)
> >  {
> > struct zone *zone;
> > @@ -2224,13 +2227,20 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> >  * processes to block on log_wait_commit().
> >  */
> > if (current->flags & PF_KTHREAD)
> > -   return;
> > +   goto out;
> 
> hm, well, back in the old days some kernel threads were killable via
> signals.  They had to opt-in to it by diddling their signal masks and a
> few other things.  Too lazy to check if there are still any such sites.
> 

That check is against throttling rather than signal handling though. It
could have been just left as "return".

> 
> > +   /*
> > +* If a fatal signal is pending, this process should not throttle.
> > +* It should return quickly so it can exit and free its memory
> > +*/
> > +   if (fatal_signal_pending(current))
> > +   goto out;
> 
> theresabug.  It should return "true" here.
> 

The intention here is that a process would

1. allocate, fail, enter direct reclaim
2. no signal pending, gets throttled because of low pfmemalloc reserves
3. a user sends kill -9 to the throttled process. It returns true and goes
   back to the page allocator
4. If that allocation fails again, it re-enters direct reclaim and tries
   to throttle. This time the fatal signal is pending but we know
   we must have already failed to make the allocation so this time false
   is returned by throttle_direct_reclaim and it tries direct reclaim.
5. direct reclaim frees something -- probably clean file-backed pages
   if the last allocation attempt had failed.

so the fatal signal check should only prevent entering direct reclaim
once. Maybe the comment sucks

/*
 * If a fatal signal is pending, this process should not throttle.
 * It should return quickly so it can exit and free its memory. Note
 * that returning false here allows a process to enter direct reclaim.
 * Otherwise there is a risk that the process loops in the page
 * allocator, checking signals and never making

Re: [PATCH 01/12] VMCI: context implementation.

2012-11-21 Thread Joe Perches

On Wed, 2012-11-21 at 12:31 -0800, George Zhang wrote:
> VMCI Context code maintains state for vmci and allows the driver to communicate
> with multiple VMs

Just some trivial notes.

> diff --git a/drivers/misc/vmw_vmci/vmci_context.c b/drivers/misc/vmw_vmci/vmci_context.c
[]

It'd be nicer if you added this #define before any #include
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
so that pr_ messages are prefixed.
(never mind, found a similar macro in patch 12/12)

> +#include 
> +#include 
[]
> + context = kzalloc(sizeof(*context), GFP_KERNEL);
> + if (!context) {
> + pr_warn("Failed to allocate memory for VMCI context.\n");

OOM logging messages aren't necessary as alloc failures
are already logged with a stack trace.

That goes for the entire patch series.

> + /* Fire event to all subscribers. */
> + array_size = vmci_handle_arr_get_size(subscriber_array);
> + for (i = 0; i < array_size; i++) {
> + int result;
> + struct vmci_event_msg *e_msg;
> + struct vmci_event_payld_ctx *ev_payload;
> + char buf[sizeof(*e_msg) + sizeof(*ev_payload)];

Maybe just use
struct vmci_event_msg e_msg;
struct vmci_event_payld_ctx ev_payload;
and change the addressing or use a cast as appropriate?
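One way to apply this suggestion, sketched under the assumption that the payload must immediately follow the message header, is a single struct holding both. The two struct definitions below are hypothetical stand-ins, not the real VMCI headers, and this is only one reading of the suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the VMCI headers. */
struct vmci_event_msg {
	uint32_t dst;     /* hypothetical header field */
	uint32_t event;   /* hypothetical event id */
};

struct vmci_event_payld_ctx {
	uint32_t context_id;
};

/* Message header and payload kept together in one typed object instead of
 * char buf[sizeof(*e_msg) + sizeof(*ev_payload)] plus pointer casts. */
struct vmci_ctx_event {
	struct vmci_event_msg msg;
	struct vmci_event_payld_ctx payload;
};

static void fill_ctx_event(struct vmci_ctx_event *ev, uint32_t dst,
			   uint32_t event, uint32_t cid)
{
	assert(ev != NULL);
	ev->msg.dst = dst;
	ev->msg.event = event;
	ev->payload.context_id = cid;
}
```

An offsetof() check (payload offset equal to the header size) is worth keeping with the real types: it fails at once if padding ever separates header and payload.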

> + /* Allocate guest call entry and add it to the target VM's queue. */
> + dq_entry = kmalloc(sizeof(*dq_entry), GFP_KERNEL);
> + if (dq_entry == NULL) {
> + pr_warn("Failed to allocate memory for datagram.\n");

Another unnecessary OOM message.

You also have some inconsistency in whether or not your
logging messages use a terminating period.  I suggest
you just delete all the periods.
s/\.\\n"/\\n"/g


[PATCH] of: When constructing the bus id consider assigned-addresses as well

2012-11-21 Thread Jason Gunthorpe

'assigned-addresses' is used instead of 'reg' for certain PCI device type
nodes. Since of/address.c already handles this, have
of_device_make_bus_id() look there as well.

Signed-off-by: Jason Gunthorpe 
---
 drivers/of/platform.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

of_can_translate_address and of_translate_address already support
using assigned-addresses.
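The lookup order the patch adds can be modelled in userspace as below. model_node, model_prop and model_get_property() are hypothetical stand-ins for struct device_node and of_get_property(); only the "reg" then "assigned-addresses" fallback mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened node: a list of (name, value) properties. */
struct model_prop {
	const char *name;
	const void *value;
};

struct model_node {
	const struct model_prop *props;
	size_t nprops;
};

static const void *model_get_property(const struct model_node *np,
				      const char *name)
{
	assert(np != NULL && name != NULL);
	for (size_t i = 0; i < np->nprops; i++)
		if (strcmp(np->props[i].name, name) == 0)
			return np->props[i].value;
	return NULL;
}

/* Prefer "reg"; fall back to "assigned-addresses" for PCI-style nodes
 * that only carry the latter. */
static const void *model_bus_id_reg(const struct model_node *np)
{
	const void *reg = model_get_property(np, "reg");

	if (!reg)
		reg = model_get_property(np, "assigned-addresses");
	return reg;
}
```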

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index b80891b..4f0f701 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -105,6 +105,8 @@ void of_device_make_bus_id(struct device *dev)
 * For MMIO, get the physical address
 */
reg = of_get_property(node, "reg", NULL);
+   if (!reg)
+   reg = of_get_property(node, "assigned-addresses", NULL);
if (reg) {
if (of_can_translate_address(node)) {
addr = of_translate_address(node, reg);
-- 
1.7.5.4


Re: [PATCH v2 00/12] Media Controller capture driver for DM365

2012-11-21 Thread Sakari Ailus

Hi Prabhakar,

On Fri, Nov 16, 2012 at 08:15:02PM +0530, Prabhakar Lad wrote:
> From: Manjunath Hadli 
> 
> This patch set adds media controller based capture driver for
> DM365.
> 
> This driver bases its design on Laurent Pinchart's Media Controller Design
> whose patches for Media Controller and subdev enhancements form the base.
> The driver also takes copious elements from Laurent Pinchart's and
> others' OMAP ISP driver based on Media Controller. So thanks to all the
> people who are responsible for the Media Controller and the OMAP ISP driver.
> 
> Also, the core functionality of the driver comes from the arago vpfe capture
> driver of which the isif capture was based on V4L2, with other drivers like
> ipipe, ipipeif and Resizer.
> 
> Changes for v2:
> 1: Migrated the driver to videobuf2 as pointed out by Hans.
> 2: Changed the design as pointed out by Laurent: exposed one more subdev,
>    ipipeif, and split the resizer subdev into three subdevs.
> 3: Rearranged the patch sequence and changed the commit messages.
> 4: Changed the file architecture as pointed out by Laurent.
> 
> Manjunath Hadli (12):
>   davinci: vpfe: add v4l2 capture driver with media interface
>   davinci: vpfe: add v4l2 video driver support
>   davinci: vpfe: dm365: add IPIPEIF driver based on media framework
>   davinci: vpfe: dm365: add ISIF driver based on media framework
>   davinci: vpfe: dm365: add IPIPE support for media controller driver
>   davinci: vpfe: dm365: add IPIPE hardware layer support
>   davinci: vpfe: dm365: resizer driver based on media framework
>   davinci: vpss: dm365: enable ISP registers
>   davinci: vpss: dm365: set vpss clk ctrl
>   davinci: vpss: dm365: add vpss helper functions to be used in the
> main driver for setting hardware parameters
>   davinci: vpfe: dm365: add build infrastructure for capture driver
>   davinci: vpfe: Add documentation

Many thanks for taking the driver this far!

However, I feel that there's still some work to do, especially in the user
space API. Some things could be implemented using the generic API but
currently use a davinci-specific API; a private IOCTL is being used where
controls would do, and resizing is enabled or disabled explicitly in the
ipipeif configuration. Also, things such as internal clock frequencies are
visible in the API.

I can go into more detail soon after taking a closer look at the patches.

If you wish to get this into the mainline kernel quickly, a viable option IMO
would be the staging tree.

What do you think?

Cc Hans and Laurent.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ai...@iki.fi XMPP: sai...@retiisi.org.uk

[PATCH] TPM: Switch to __packed instead of __attribute__((packed))

2012-11-21 Thread Jason Gunthorpe

The __packed shorthand seems to be preferred these days.

Signed-off-by: Jason Gunthorpe 
---
 drivers/char/tpm/tpm.h |   34 +-
 1 files changed, 17 insertions(+), 17 deletions(-)

As discussed with Peter.
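For reference, __packed is the kernel's shorthand for __attribute__((packed)), so the change is purely cosmetic. The userspace check below assumes GCC or Clang attribute syntax and uses plain uint16_t/uint32_t in place of __be16/__be32; it shows that the packed header is 10 bytes, matching TPM_HEADER_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition the kernel uses; guarded in case the libc headers
 * already provide one (e.g. BSD <sys/cdefs.h>). */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Field layout of struct tpm_input_header, with uint16_t/uint32_t in
 * place of __be16/__be32 (those differ only in sparse annotations). */
struct hdr_packed {
	uint16_t tag;
	uint32_t length;
	uint32_t ordinal;
} __packed;

/* The same fields without packing, for comparison. */
struct hdr_natural {
	uint16_t tag;
	uint32_t length;
	uint32_t ordinal;
};
```

On common ABIs the unpacked variant is larger because length is aligned to 4 bytes, which would no longer match the on-the-wire header.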

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index c20fa8d..7d05ced 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -157,13 +157,13 @@ struct tpm_input_header {
__be16  tag;
__be32  length;
__be32  ordinal;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_output_header {
__be16  tag;
__be32  length;
__be32  return_code;
-}__attribute__((packed));
+} __packed;
 
 struct stclear_flags_t {
__be16  tag;
@@ -172,14 +172,14 @@ struct stclear_flags_t {
u8  physicalPresence;
u8  physicalPresenceLock;
u8  bGlobalLock;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_version_t {
u8  Major;
u8  Minor;
u8  revMajor;
u8  revMinor;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_version_1_2_t {
__be16  tag;
@@ -187,20 +187,20 @@ struct tpm_version_1_2_t {
u8  Minor;
u8  revMajor;
u8  revMinor;
-}__attribute__((packed));
+} __packed;
 
 struct timeout_t {
__be32  a;
__be32  b;
__be32  c;
__be32  d;
-}__attribute__((packed));
+} __packed;
 
 struct duration_t {
__be32  tpm_short;
__be32  tpm_medium;
__be32  tpm_long;
-}__attribute__((packed));
+} __packed;
 
 struct permanent_flags_t {
__be16  tag;
@@ -224,7 +224,7 @@ struct permanent_flags_t {
u8  tpmEstablished;
u8  maintenanceDone;
u8  disableFullDALogicInfo;
-}__attribute__((packed));
+} __packed;
 
 typedef union {
struct  permanent_flags_t perm_flags;
@@ -242,12 +242,12 @@ struct tpm_getcap_params_in {
__be32  cap;
__be32  subcap_size;
__be32  subcap;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_getcap_params_out {
__be32  cap_size;
cap_t   cap;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_readpubek_params_out {
u8  algorithm[4];
@@ -258,7 +258,7 @@ struct  tpm_readpubek_params_out {
__be32  keysize;
u8  modulus[256];
u8  checksum[20];
-}__attribute__((packed));
+} __packed;
 
 typedef union {
struct  tpm_input_header in;
@@ -268,16 +268,16 @@ typedef union {
 #define TPM_DIGEST_SIZE 20
 struct tpm_pcrread_out {
u8  pcr_result[TPM_DIGEST_SIZE];
-}__attribute__((packed));
+} __packed;
 
 struct tpm_pcrread_in {
__be32  pcr_idx;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_pcrextend_in {
__be32  pcr_idx;
u8  hash[TPM_DIGEST_SIZE];
-}__attribute__((packed));
+} __packed;
 
 /* 128 bytes is an arbitrary cap. This could be as large as TPM_BUFSIZE - 18
  * bytes, but 128 is still a relatively large number of random bytes and
@@ -288,11 +288,11 @@ struct tpm_pcrextend_in {
 struct tpm_getrandom_out {
__be32 rng_data_len;
u8 rng_data[TPM_MAX_RNG_DATA];
-}__attribute__((packed));
+} __packed;
 
 struct tpm_getrandom_in {
__be32 num_bytes;
-}__attribute__((packed));
+} __packed;
 
 struct tpm_startup_in {
__be16  startup_type;
@@ -314,7 +314,7 @@ typedef union {
 struct tpm_cmd_t {
tpm_cmd_header  header;
tpm_cmd_params  params;
-}__attribute__((packed));
+} __packed;
 
 ssize_t tpm_getcap(struct device *, __be32, cap_t *, const char *);
 
-- 
1.7.5.4


[PATCH v5] TPM: Issue TPM_STARTUP at driver load if the TPM has not been started

2012-11-21 Thread Jason Gunthorpe

The TPM will respond to TPM_GET_CAP with TPM_ERR_INVALID_POSTINIT if
TPM_STARTUP has not been issued. Detect this and automatically
issue TPM_STARTUP.

This is for embedded applications where the kernel is the first thing
to touch the TPM.

Signed-off-by: Jason Gunthorpe 
Tested-by: Peter Huewe 
Reviewed-by: Peter Huewe 
---
 drivers/char/tpm/tpm.c |   44 
 drivers/char/tpm/tpm.h |6 ++
 2 files changed, 46 insertions(+), 4 deletions(-)

v5 changes:
 - Use %zd for printing ssize_t
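On the wire, the command built by tpm_startup() is 12 bytes: a 10-byte header plus the 2-byte startup type, all big-endian. The serializer below is a userspace sketch; the value 0x00C1 for TPM_TAG_RQU_COMMAND is taken from the TPM 1.2 specification and is an assumption here, since the patch only refers to the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Fill buf with a TPM_Startup request and return its length in bytes. */
static size_t build_tpm_startup(uint8_t buf[12], uint16_t startup_type)
{
	assert(buf != NULL);
	put_be16(buf, 0x00C1);             /* TPM_TAG_RQU_COMMAND (assumed value) */
	put_be32(buf + 2, 12);             /* length: whole command, header included */
	put_be32(buf + 6, 153);            /* TPM_ORD_STARTUP, as in the patch */
	put_be16(buf + 10, startup_type);  /* TPM_ST_CLEAR is 1 in the patch */
	return 12;
}
```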

diff --git a/drivers/char/tpm/tpm.c b/drivers/char/tpm/tpm.c
index 93211df..433ad6b 100644
--- a/drivers/char/tpm/tpm.c
+++ b/drivers/char/tpm/tpm.c
@@ -468,7 +468,7 @@ static ssize_t transmit_cmd(struct tpm_chip *chip, struct tpm_cmd_t *cmd,
return -EFAULT;
 
err = be32_to_cpu(cmd->header.out.return_code);
-   if (err != 0)
+   if (err != 0 && desc)
dev_err(chip->dev, "A TPM error (%d) occurred %s\n", err, desc);
 
return err;
@@ -528,6 +528,25 @@ void tpm_gen_interrupt(struct tpm_chip *chip)
 }
 EXPORT_SYMBOL_GPL(tpm_gen_interrupt);
 
+#define TPM_ORD_STARTUP cpu_to_be32(153)
+#define TPM_ST_CLEAR cpu_to_be16(1)
+#define TPM_ST_STATE cpu_to_be16(2)
+#define TPM_ST_DEACTIVATED cpu_to_be16(3)
+static const struct tpm_input_header tpm_startup_header = {
+   .tag = TPM_TAG_RQU_COMMAND,
+   .length = cpu_to_be32(12),
+   .ordinal = TPM_ORD_STARTUP
+};
+
+static int tpm_startup(struct tpm_chip *chip, __be16 startup_type)
+{
+   struct tpm_cmd_t start_cmd;
+   start_cmd.header.in = tpm_startup_header;
+   start_cmd.params.startup_in.startup_type = startup_type;
+   return transmit_cmd(chip, &start_cmd, TPM_INTERNAL_RESULT_SIZE,
+   "attempting to start the TPM");
+}
+
 int tpm_get_timeouts(struct tpm_chip *chip)
 {
struct tpm_cmd_t tpm_cmd;
@@ -541,11 +560,28 @@ int tpm_get_timeouts(struct tpm_chip *chip)
tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP;
tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
+   rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL);
 
-   rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
-   "attempting to determine the timeouts");
-   if (rc)
+   if (rc == TPM_ERR_INVALID_POSTINIT) {
+   /* The TPM is not started, we are the first to talk to it.
+  Execute a startup command. */
+   dev_info(chip->dev, "Issuing TPM_STARTUP");
+   if (tpm_startup(chip, TPM_ST_CLEAR))
+   return rc;
+
+   tpm_cmd.header.in = tpm_getcap_header;
+   tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP;
+   tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
+   tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
+   rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
+ NULL);
+   }
+   if (rc) {
+   dev_err(chip->dev,
+   "A TPM error (%zd) occurred attempting to determine the timeouts\n",
+   rc);
goto duration;
+   }
 
if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 ||
be32_to_cpu(tpm_cmd.header.out.length)
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 8ef7649..8971b12 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -47,6 +47,7 @@ enum tpm_addr {
 #define TPM_WARN_DOING_SELFTEST 0x802
 #define TPM_ERR_DEACTIVATED 0x6
 #define TPM_ERR_DISABLED 0x7
+#define TPM_ERR_INVALID_POSTINIT 38
 
 #define TPM_HEADER_SIZE 10
 extern ssize_t tpm_show_pubek(struct device *, struct device_attribute *attr,
@@ -291,6 +292,10 @@ struct tpm_getrandom_in {
__be32 num_bytes;
 }__attribute__((packed));
 
+struct tpm_startup_in {
+   __be16  startup_type;
+} __packed;
+
 typedef union {
struct  tpm_getcap_params_out getcap_out;
struct  tpm_readpubek_params_out readpubek_out;
@@ -301,6 +306,7 @@ typedef union {
struct  tpm_pcrextend_in pcrextend_in;
struct  tpm_getrandom_in getrandom_in;
struct  tpm_getrandom_out getrandom_out;
+   struct tpm_startup_in startup_in;
 } tpm_cmd_params;
 
 struct tpm_cmd_t {
-- 
1.7.5.4


Re: [PATCH 1/3] CLK: uninline clk_prepare() and clk_unprepare()

2012-11-21 Thread Dmitry Torokhov

On Wed, Nov 21, 2012 at 12:43:24PM -0800, Mike Turquette wrote:
> Quoting Viresh Kumar (2012-11-20 02:13:55)
> > On 20 November 2012 14:52, Dmitry Torokhov  wrote:
> > > We'll need to invoke clk_unprepare() via a pointer in our devm_*
> > > conversion so let's uninline the pair.
> > 
> > Sorry, but you aren't doing this :(
> > This routine is already uninlined as it is in clk.c
> > 
> > Instead you are just moving clk_prepare(), etc calls within
> > #ifdef CONFIG_HAVE_CLK
> > #else
> > #endif
> > 
> > I wonder why they were added under #ifdef CONFIG_HAVE_CLK_PREPARE
> > earlier. Can they exist without CONFIG_HAVE_CLK?
> > 
> > @Mike: ?
> > 
> 
> HAVE_CLK logically wraps HAVE_CLK_PREPARE.  There is no point in
> selecting HAVE_CLK_PREPARE without HAVE_CLK.
> 
> Looking through the code I see that this used to be the case.  Commit
> 93abe8e "clk: add non CONFIG_HAVE_CLK routines" moved the
> clk_(un)prepare declarations outside of #ifdef CONFIG_HAVE_CLK.  That
> commit was authored by you.  Can you elaborate on why that aspect of the
> patch was needed?
> 

BTW, it looks like the only place where we select HAVE_CLK_PREPARE is
IMX platform and it also selects COMMON_CLK so I think HAVE_CLK_PREPARE
can be removed now.
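For readers following the Kconfig argument, the header layout Viresh describes (real declarations under CONFIG_HAVE_CLK, no-op inline stubs otherwise) looks roughly like the simplified sketch below. It is not the actual <linux/clk.h>, which has many more entry points; built without CONFIG_HAVE_CLK it compiles down to the stubs.

```c
#include <assert.h>
#include <stddef.h>

struct clk;  /* opaque handle, as in the real API */

/* Simplified sketch: real declarations only when the platform provides a
 * clock framework, no-op stubs otherwise. Building with CONFIG_HAVE_CLK
 * defined would need real implementations at link time. */
#ifdef CONFIG_HAVE_CLK
int clk_prepare(struct clk *clk);
void clk_unprepare(struct clk *clk);
#else
static inline int clk_prepare(struct clk *clk)
{
	(void)clk;
	return 0;   /* nothing to prepare without a clock framework */
}

static inline void clk_unprepare(struct clk *clk)
{
	(void)clk;
}
#endif
```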

Thanks.

-- 
Dmitry

Re: [PATCH RESEND] acpi: Fix logging when no pci_irq is allocated

2012-11-21 Thread Joe Perches

On Wed, 2012-11-21 at 21:50 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 21, 2012 05:46:04 AM Joe Perches wrote:
> > On Wed, 2012-11-21 at 16:43 +0800, Daniel J Blueman wrote:
> > > Previously a new line is implicitly added in the no GSI case:
> > > 
> > > [7.185182] pci 0001:00:12.0: can't derive routing for PCI INT A
> > > [7.191352] pci 0001:00:12.0: PCI INT A: no GSI
> > > [7.195956]  - using ISA IRQ 10
> > > 
> > > The code thus prints a blank line where no legacy IRQ is available:
> > > 
> > > [1.650124] pci :00:14.0: can't derive routing for PCI INT A
> > > [1.650126] pci :00:14.0: PCI INT A: no GSI
> > > [1.650126] 
> > > [1.650180] pci :00:14.0: can't derive routing for PCI INT A
> > > 
> > > Fix this by making the newline explicit and removing the superfluous
> > > one.
> > 
> > This breaks the logging code below it when there is an ISA irq.
> > 
> > The below works, but is a workaround for a defect in the printk
> > subsystem introduced by a logging change that will be fixed in
> > a near future release.
> 
> What exactly do you mean by "near future"?

I mean Jan Schönherr's patches that should fix this are
likely to be picked up one day.

https://lkml.org/lkml/2012/11/13/678



Re: Device tree node to major/minor?

2012-11-21 Thread Simon Glass

Hi Grant,

On Wed, Nov 21, 2012 at 7:47 AM, Grant Likely  wrote:
> On Tue, 20 Nov 2012 15:48:24 -0800, Simon Glass  wrote:
>> Hi Grant,
>>
>> On Tue, Nov 20, 2012 at 2:32 PM, Grant Likely  wrote:
>> > On Tue, Nov 20, 2012 at 10:23 PM, Simon Glass  wrote:
>> >> Hi,
>> >>
>> >> I hope this is a stupid question with an easy answer, but I cannot find it.
>> >>
>> >> I have a device tree node for an mmc block device and I want to use
>> >> that block device from another driver. I have a phandle which lets me
>> >> get the node of the mmc device, but I am not sure how to convert that
>> >> into a block_device. In order to do so, I think I need a major/minor
>> >> number. Of course the phandle might in fact point to a SCSI driver and
>> >> I want that to work correctly also.
>> >>
>> >> I imagine I might be able to search through the wonders of sysfs in
>> >> user space, but is there a better way?
>> >
>> > Do you /want/ to do it from userspace? What is your use case? Mounting
>> > the rootfs?
>>
>> The use case is storing some raw data on a block device from within a
>> driver in the kernel. It is used to keep track of the verified boot
>> state.
>>
>> >
>> > Regardless, userspace can monitor the uevents when devices are added
>> > (that's what udev does) and watch for the full path of the node you
>> > want in the uevent attribute. Then you can look for the child device
>> > with the block major/minor numbers in it.
>>
>> So is there a way to do this entirely in the kernel ex post? It might
>> need to happen during kernel boot, before user space.
>
> Yes, it is certainly doable within the kernel. First, you'll need to use
> a notifier to get called back whenever a new device is created. Then
> you'll need to look at the dev->of_node(->full_name) to see if it is the
> node you actually want. You might need/want to resolve it from an alias
> or something, but I presume you already have a way to find the
> device_node before searching for a struct device.

OK thank you. Was hoping to find a simple way to find a block device
from a device tree node (yes I know the right one) but I suppose in
general this is impossible, since nodes may create more than one
device, and each has its own data structures leading to the block
device.

So it seems like a notifier is the best way. Thanks for looking at this Grant.

Regards,
Simon

>
> g.
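Grant's suggestion can be pictured as follows. In the kernel, the callback would presumably be registered with something like bus_register_notifier() (reacting to BUS_NOTIFY_ADD_DEVICE) and would compare dev->of_node for each newly added device against the node already resolved from the phandle. The sketch below models only that matching step in plain userspace C. struct model_node, struct model_device, match_of_node() and add_devices() are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct device_node and struct device. */
struct model_node {
	const char *full_name;
};

struct model_device {
	const char *name;
	const struct model_node *of_node;	/* may be NULL */
};

/* Callback invoked for every newly added device. */
typedef int (*add_device_cb)(const struct model_device *dev, void *ctx);

struct match_ctx {
	const struct model_node *wanted;	/* node found via the phandle */
	const struct model_device *found;	/* first matching device */
};

/*
 * Record the first device whose of_node is the wanted node.  Pointer
 * equality suffices when the node is already resolved; the full_name
 * comparison is a fallback for illustration.
 */
static int match_of_node(const struct model_device *dev, void *ctx_ptr)
{
	struct match_ctx *ctx = ctx_ptr;

	if (!dev->of_node || ctx->found)
		return 0;
	if (dev->of_node == ctx->wanted ||
	    strcmp(dev->of_node->full_name, ctx->wanted->full_name) == 0) {
		ctx->found = dev;
		return 1;
	}
	return 0;
}

/* Simulate devices being registered one by one, notifying the callback. */
static void add_devices(const struct model_device *devs, size_t n,
			add_device_cb cb, void *ctx)
{
	for (size_t i = 0; i < n; i++)
		cb(&devs[i], ctx);
}
```

As Simon notes, one node can give rise to several devices, so a real implementation would still need to walk from the matched device to the child that carries the block major/minor numbers.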

Re: [PATCH RESEND] acpi: Fix logging when no pci_irq is allocated

2012-11-21 Thread Rafael J. Wysocki

On Wednesday, November 21, 2012 05:46:04 AM Joe Perches wrote:
> On Wed, 2012-11-21 at 16:43 +0800, Daniel J Blueman wrote:
> > Previously a new line is implicitly added in the no GSI case:
> > 
> > [7.185182] pci 0001:00:12.0: can't derive routing for PCI INT A
> > [7.191352] pci 0001:00:12.0: PCI INT A: no GSI
> > [7.195956]  - using ISA IRQ 10
> > 
> > The code thus prints a blank line where no legacy IRQ is available:
> > 
> > [1.650124] pci :00:14.0: can't derive routing for PCI INT A
> > [1.650126] pci :00:14.0: PCI INT A: no GSI
> > [1.650126] 
> > [1.650180] pci :00:14.0: can't derive routing for PCI INT A
> > 
> > Fix this by making the newline explicit and removing the superfluous
> > one.
> 
> This breaks the logging code below it when there is an ISA irq.
> 
> The below works, but is a workaround for a defect in the printk
> subsystem introduced by a logging change that will be fixed in
> a near future release.

What exactly do you mean by "near future"?

Rafael


> Signed-off-by: Joe Perches 
> ---
>  drivers/acpi/pci_irq.c |   10 +-
>  1 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index f288e00..68a921d 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -458,19 +458,19 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>*/
>   if (gsi < 0) {
>   u32 dev_gsi;
> - dev_warn(&dev->dev, "PCI INT %c: no GSI", pin_name(pin));
>   /* Interrupt Line values above 0xF are forbidden */
>   if (dev->irq > 0 && (dev->irq <= 0xF) &&
>   (acpi_isa_irq_to_gsi(dev->irq, &dev_gsi) == 0)) {
> - printk(" - using ISA IRQ %d\n", dev->irq);
> + dev_warn(&dev->dev, "PCI INT %c: no GSI - using ISA IRQ %d\n",
> +  pin_name(pin), dev->irq);
>   acpi_register_gsi(&dev->dev, dev_gsi,
> ACPI_LEVEL_SENSITIVE,
> ACPI_ACTIVE_LOW);
> - return 0;
>   } else {
> - printk("\n");
> - return 0;
> + dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
> +  pin_name(pin));
>   }
> + return 0;
>   }
>  
>   rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
> 
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [RFC PATCH v2 00/15] NFSd state containerization

2012-11-21 Thread J. Bruce Fields

On Thu, Nov 15, 2012 at 01:34:08PM -0500, Jeff Layton wrote:
> On Wed, 14 Nov 2012 17:00:36 -0500
> "J. Bruce Fields"  wrote:
> 
> > On Wed, Nov 14, 2012 at 06:20:59PM +0300, Stanislav Kinsbursky wrote:
> > > This patch set is my first attempt to containerize NFSv4 state, i.e. make it
> > > work in a network namespace context.
> > > I admit that some of this new code could be partially rewritten during
> > > future NFSd containerization.
> > > But the overall idea looks more or less correct to me.
> > > So, the main things here are:
> > > 1) Making nfs4_client network namespace aware.
> > > 2) Allocating all hashes (except file_hashtbl and reclaim_str_hashtbl) per
> > > network namespace context on NFSd start (not init) and destroying them on
> > > NFSd state shutdown.
> > > 3) Allocating reclaim_str_hashtbl on legacy tracker start and destroying it
> > > on legacy tracker stop.
> > > 4) Moving the client_lru and close_lru lists to per-net data.
> > > 5) Making the laundromat network namespace aware.
> > 
> > These look OK and pass my tests.  Jeff, do the revised recovery bits
> > look OK?
> > 
> > Have you done any testing?
> > 
> > It'd be interesting, for example, to know if there are any pynfs tests
> > that fail against the server in a non-init network namespace, but pass
> > normally.
> > 
> > --b.
> > 
> 
> I looked over the patches and they look sane to me. I move that they go
> into your -next branch to soak for a bit.

Stanislav, actually, I'm unclear, since you labeled these "RFC": do you
consider these patches ready?

--b.
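The design in the quoted cover letter allocates hash tables per network namespace when NFSd starts and tears them down at state shutdown. This follows a common pattern in which each namespace owns its own copy of the state, so lookups in one namespace cannot see clients registered in another. The minimal userspace model below uses hypothetical names (struct model_net_state, model_state_start(), model_state_shutdown()) rather than the actual nfsd structures.

```c
#include <stdlib.h>

#define MODEL_CLIENT_HASH_SIZE 16	/* arbitrary size for the sketch */

/* Hypothetical client record chained into a per-namespace hash. */
struct model_client {
	unsigned int id;
	struct model_client *next;
};

/* Per-namespace state: allocated on start, freed on shutdown. */
struct model_net_state {
	struct model_client **client_hash;	/* NULL while stopped */
};

static int model_state_start(struct model_net_state *nn)
{
	nn->client_hash = calloc(MODEL_CLIENT_HASH_SIZE,
				 sizeof(*nn->client_hash));
	return nn->client_hash ? 0 : -1;
}

static void model_state_shutdown(struct model_net_state *nn)
{
	for (size_t i = 0; nn->client_hash && i < MODEL_CLIENT_HASH_SIZE; i++) {
		struct model_client *c = nn->client_hash[i];

		while (c) {
			struct model_client *next = c->next;

			free(c);
			c = next;
		}
	}
	free(nn->client_hash);
	nn->client_hash = NULL;
}

static int model_add_client(struct model_net_state *nn, unsigned int id)
{
	struct model_client *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->id = id;
	c->next = nn->client_hash[id % MODEL_CLIENT_HASH_SIZE];
	nn->client_hash[id % MODEL_CLIENT_HASH_SIZE] = c;
	return 0;
}

/* Returns 1 if the client is known in this namespace only. */
static int model_find_client(const struct model_net_state *nn, unsigned int id)
{
	for (const struct model_client *c =
		     nn->client_hash[id % MODEL_CLIENT_HASH_SIZE]; c; c = c->next)
		if (c->id == id)
			return 1;
	return 0;
}
```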

Re: [PATCH 1/3] CLK: uninline clk_prepare() and clk_unprepare()

2012-11-21 Thread Mike Turquette

Quoting Viresh Kumar (2012-11-20 02:13:55)
> On 20 November 2012 14:52, Dmitry Torokhov  wrote:
> > We'll need to invoke clk_unprepare() via a pointer in our devm_*
> > conversion so let's uninline the pair.
> 
> Sorry, but you aren't doing this :(
> This routine is already uninlined as it is in clk.c
> 
> Instead you are just moving clk_prepare(), etc calls within
> #ifdef CONFIG_HAVE_CLK
> #else
> #endif
> 
> I wonder why they were added under #ifdef CONFIG_HAVE_CLK_PREPARE
> earlier. Can they exist without CONFIG_HAVE_CLK?
> 
> @Mike: ?
> 

HAVE_CLK logically wraps HAVE_CLK_PREPARE.  There is no point in
selecting HAVE_CLK_PREPARE without HAVE_CLK.

Looking through the code I see that this used to be the case.  Commit
93abe8e "clk: add non CONFIG_HAVE_CLK routines" moved the
clk_(un)prepare declarations outside of #ifdef CONFIG_HAVE_CLK.  That
commit was authored by you.  Can you elaborate on why that aspect of the
patch was needed?

Thanks,
Mike

> > Signed-off-by: Dmitry Torokhov 
> > ---
> >  drivers/clk/clk.c   |  4 
> >  include/linux/clk.h | 68 
> > +
> >  2 files changed, 36 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > index 56e4495e..1b642f2 100644
> > --- a/drivers/clk/clk.c
> > +++ b/drivers/clk/clk.c
> > @@ -374,6 +374,7 @@ struct clk *__clk_lookup(const char *name)
> >
> >  void __clk_unprepare(struct clk *clk)
> >  {
> > +#ifdef CONFIG_HAVE_CLK_PREPARE
> 
> clk.c is compiled if COMMON_CLK is selected. And COMMON_CLK has the following:
> select HAVE_CLK_PREPARE
> 
> So, these checks you added don't have a meaning.
> 
> > if (!clk)
> > return;
> >
> > @@ -389,6 +390,7 @@ void __clk_unprepare(struct clk *clk)
> > clk->ops->unprepare(clk->hw);
> >
> > __clk_unprepare(clk->parent);
> > +#endif
> >  }
> >
> >  /**
> > @@ -412,6 +414,7 @@ EXPORT_SYMBOL_GPL(clk_unprepare);
> >
> >  int __clk_prepare(struct clk *clk)
> >  {
> > +#ifdef CONFIG_HAVE_CLK_PREPARE
> 
> ditto.
> 
> > int ret = 0;
> >
> > if (!clk)
> > @@ -432,6 +435,7 @@ int __clk_prepare(struct clk *clk)
> > }
> >
> > clk->prepare_count++;
> > +#endif
> >
> > return 0;
> >  }
> > diff --git a/include/linux/clk.h b/include/linux/clk.h
> > index b3ac22d..f8204c3 100644
> > --- a/include/linux/clk.h
> > +++ b/include/linux/clk.h
> > @@ -84,42 +84,6 @@ int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb);
> >
> >  #endif
> >
> > -/**
> > - * clk_prepare - prepare a clock source
> > - * @clk: clock source
> > - *
> > - * This prepares the clock source for use.
> > - *
> > - * Must not be called from within atomic context.
> > - */
> > -#ifdef CONFIG_HAVE_CLK_PREPARE
> > -int clk_prepare(struct clk *clk);
> > -#else
> > -static inline int clk_prepare(struct clk *clk)
> > -{
> > -   might_sleep();
> > -   return 0;
> > -}
> > -#endif
> > -
> > -/**
> > - * clk_unprepare - undo preparation of a clock source
> > - * @clk: clock source
> > - *
> > - * This undoes a previously prepared clock.  The caller must balance
> > - * the number of prepare and unprepare calls.
> > - *
> > - * Must not be called from within atomic context.
> > - */
> > -#ifdef CONFIG_HAVE_CLK_PREPARE
> > -void clk_unprepare(struct clk *clk);
> > -#else
> > -static inline void clk_unprepare(struct clk *clk)
> > -{
> > -   might_sleep();
> > -}
> > -#endif
> > -
> >  #ifdef CONFIG_HAVE_CLK
> >  /**
> >   * clk_get - lookup and obtain a reference to a clock producer.
> > @@ -159,6 +123,27 @@ struct clk *clk_get(struct device *dev, const char *id);
> >  struct clk *devm_clk_get(struct device *dev, const char *id);
> >
> >  /**
> > + * clk_prepare - prepare a clock source
> > + * @clk: clock source
> > + *
> > + * This prepares the clock source for use.
> > + *
> > + * Must not be called from within atomic context.
> > + */
> > +int clk_prepare(struct clk *clk);
> > +
> > +/**
> > + * clk_unprepare - undo preparation of a clock source
> > + * @clk: clock source
> > + *
> > + * This undoes a previously prepared clock.  The caller must balance
> > + * the number of prepare and unprepare calls.
> > + *
> > + * Must not be called from within atomic context.
> > + */
> > +void clk_unprepare(struct clk *clk);
> > +
> > +/**
> >   * clk_enable - inform the system when the clock source should be running.
> >   * @clk: clock source
> >   *
> > @@ -292,6 +277,17 @@ static inline void clk_put(struct clk *clk) {}
> >
> >  static inline void devm_clk_put(struct device *dev, struct clk *clk) {}
> >
> > +static inline int clk_prepare(struct clk *clk)
> > +{
> > +   might_sleep();
> > +   return 0;
> > +}
> > +
> > +static inline void clk_unprepare(struct clk *clk)
> > +{
> > +   might_sleep();
> > +}
> > +
> >  static inline int clk_enable(struct clk *clk)
> >  {
> > return 0;
> > --
> > 1.7.11.7
> >
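The hunks above show only fragments of __clk_prepare() and __clk_unprepare(): the NULL check, the recursion into clk->parent and the prepare_count increment. The userspace model below assumes the usual reference-counting scheme around those fragments. The parent is prepared before the child, the hardware hook runs only on the 0 to 1 transition, and the parent is released only when the child's count returns to 0. struct model_clk and its hw_prepared/fail_prepare fields are hypothetical, and unbalanced unprepare calls are simply ignored.

```c
#include <stddef.h>

/* Hypothetical model of a clock in a parent/child tree. */
struct model_clk {
	struct model_clk *parent;
	unsigned int prepare_count;
	int hw_prepared;	/* stands in for ops->prepare()/unprepare() */
	int fail_prepare;	/* force the "hardware" prepare to fail */
};

static void model_clk_unprepare(struct model_clk *clk)
{
	if (!clk || clk->prepare_count == 0)
		return;			/* unbalanced call: ignored here */
	if (--clk->prepare_count > 0)
		return;			/* still used by another consumer */
	clk->hw_prepared = 0;
	model_clk_unprepare(clk->parent);	/* drop our hold on the parent */
}

static int model_clk_prepare(struct model_clk *clk)
{
	if (!clk)
		return 0;		/* NULL clock: nothing to do */
	if (clk->prepare_count == 0) {
		int ret = model_clk_prepare(clk->parent);	/* parent first */

		if (ret)
			return ret;
		if (clk->fail_prepare) {
			model_clk_unprepare(clk->parent);	/* roll back */
			return -1;
		}
		clk->hw_prepared = 1;
	}
	clk->prepare_count++;
	return 0;
}
```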

[PATCH 6/6] VSOCK: header and config files.

2012-11-21 Thread George Zhang

VSOCK header files, Makefiles and Kconfig systems for Linux VSocket module.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 include/linux/socket.h  |4 
 net/Kconfig |1 
 net/Makefile|1 
 net/vmw_vsock/Kconfig   |   14 +
 net/vmw_vsock/Makefile  |4 
 net/vmw_vsock/notify_qstate.c   |  625 +++
 net/vmw_vsock/vmci_sockets.h|  517 +
 net/vmw_vsock/vmci_sockets_packet.h |   90 +
 net/vmw_vsock/vsock_common.h|  127 +++
 net/vmw_vsock/vsock_packet.h|  124 +++
 net/vmw_vsock/vsock_version.h   |   28 ++
 11 files changed, 1534 insertions(+), 1 deletions(-)
 create mode 100644 net/vmw_vsock/Kconfig
 create mode 100644 net/vmw_vsock/Makefile
 create mode 100644 net/vmw_vsock/notify_qstate.c
 create mode 100644 net/vmw_vsock/vmci_sockets.h
 create mode 100644 net/vmw_vsock/vmci_sockets_packet.h
 create mode 100644 net/vmw_vsock/vsock_common.h
 create mode 100644 net/vmw_vsock/vsock_packet.h
 create mode 100644 net/vmw_vsock/vsock_version.h

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 25d6322..57bc85e 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -195,7 +195,8 @@ struct ucred {
 #define AF_CAIF 37  /* CAIF sockets */
 #define AF_ALG 38  /* Algorithm sockets*/
 #define AF_NFC 39  /* NFC sockets  */
-#define AF_MAX 40  /* For now.. */
+#define AF_VSOCK   40  /* VMCI sockets */
+#define AF_MAX 41  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -238,6 +239,7 @@ struct ucred {
 #define PF_CAIF AF_CAIF
 #define PF_ALG AF_ALG
 #define PF_NFC AF_NFC
+#define PF_VSOCK   AF_VSOCK
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/net/Kconfig b/net/Kconfig
index 245831b..75b8d5e 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -216,6 +216,7 @@ source "net/dcb/Kconfig"
 source "net/dns_resolver/Kconfig"
 source "net/batman-adv/Kconfig"
 source "net/openvswitch/Kconfig"
+source "net/vmw_vsock/Kconfig"
 
 config RPS
boolean
diff --git a/net/Makefile b/net/Makefile
index 4f4ee08..cae59f4 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -70,3 +70,4 @@ obj-$(CONFIG_CEPH_LIB) += ceph/
 obj-$(CONFIG_BATMAN_ADV)   += batman-adv/
 obj-$(CONFIG_NFC)  += nfc/
 obj-$(CONFIG_OPENVSWITCH)  += openvswitch/
+obj-$(CONFIG_VMWARE_VSOCK) += vmw_vsock/
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
new file mode 100644
index 000..95e2568
--- /dev/null
+++ b/net/vmw_vsock/Kconfig
@@ -0,0 +1,14 @@
+#
+# Vsock protocol
+#
+
+config VMWARE_VSOCK
+   tristate "Virtual Socket protocol"
+   depends on VMWARE_VMCI
+   help
+ Virtual Socket Protocol is a socket protocol similar to TCP/IP
+ allowing communication between Virtual Machines and the VMware
+ hypervisor.
+
+ To compile this driver as a module, choose M here: the module
+ will be called vmw_vsock. If unsure, say N.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
new file mode 100644
index 000..4e940fe
--- /dev/null
+++ b/net/vmw_vsock/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_VMWARE_VSOCK) += vmw_vsock.o
+
+vmw_vsock-y += af_vsock.o notify.o notify_qstate.o stats.o util.o \
+   vsock_addr.o
diff --git a/net/vmw_vsock/notify_qstate.c b/net/vmw_vsock/notify_qstate.c
new file mode 100644
index 000..5a2f066
--- /dev/null
+++ b/net/vmw_vsock/notify_qstate.c
@@ -0,0 +1,625 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * notify_qstate.c --
+ *
+ * Linux control notifications based on Queuepair state for the VMCI Stream
+ * Sockets protocol.
+ */
+
+#include 
+
+#include 
+
+#include   /* for NULL */
+#include 
+
+#include "notify.h"
+#include "af_vsock.h"
+
+#define PKT_FIELD(vsk, field_name) ((vsk)->notify.pkt_q_state.field_name)
+
+/*
+ *
+ * vsock_vmci_notify_waiting_write --
+ *
+ * Determines if the conditions have been met to notify a waiting writer.
+ *
+ * Results: true if a notification should be sent, false otherwise.
+ *
+ * Side effects: None.
+ */
+
+static bool

[PATCH 5/6] VSOCK: utility functions.

2012-11-21 Thread George Zhang

VSOCK utility functions for Linux VSocket module.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 net/vmw_vsock/util.c |  620 ++
 net/vmw_vsock/util.h |  314 +
 2 files changed, 934 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/util.c
 create mode 100644 net/vmw_vsock/util.h

diff --git a/net/vmw_vsock/util.c b/net/vmw_vsock/util.c
new file mode 100644
index 000..cd86482
--- /dev/null
+++ b/net/vmw_vsock/util.c
@@ -0,0 +1,620 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * util.c --
+ *
+ * Utility functions for Linux VSocket module.
+ */
+
+#include 
+#include 
+#include 
+#include   /* for NULL */
+#include 
+
+#include "af_vsock.h"
+#include "util.h"
+
+struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+
+DEFINE_SPINLOCK(vsock_table_lock);
+
+/*
+ *
+ * vsock_vmci_log_pkt --
+ *
+ * Logs the provided packet.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_vmci_log_pkt(char const *function, u32 line,
+   struct vsock_packet *pkt)
+{
+   char buf[256];
+   char *cur = buf;
+   int left = sizeof buf;
+   int written = 0;
+   char *type_strings[] = {
+   [VSOCK_PACKET_TYPE_INVALID] = "INVALID",
+   [VSOCK_PACKET_TYPE_REQUEST] = "REQUEST",
+   [VSOCK_PACKET_TYPE_NEGOTIATE] = "NEGOTIATE",
+   [VSOCK_PACKET_TYPE_OFFER] = "OFFER",
+   [VSOCK_PACKET_TYPE_ATTACH] = "ATTACH",
+   [VSOCK_PACKET_TYPE_WROTE] = "WROTE",
+   [VSOCK_PACKET_TYPE_READ] = "READ",
+   [VSOCK_PACKET_TYPE_RST] = "RST",
+   [VSOCK_PACKET_TYPE_SHUTDOWN] = "SHUTDOWN",
+   [VSOCK_PACKET_TYPE_WAITING_WRITE] = "WAITING_WRITE",
+   [VSOCK_PACKET_TYPE_WAITING_READ] = "WAITING_READ",
+   [VSOCK_PACKET_TYPE_REQUEST2] = "REQUEST2",
+   [VSOCK_PACKET_TYPE_NEGOTIATE2] = "NEGOTIATE2",
+   };
+
+   written = snprintf(cur, left, "PKT: %u:%u -> %u:%u",
+  VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.src),
+  pkt->src_port,
+  VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.dst),
+  pkt->dst_port);
+   if (written >= left)
+   goto error;
+
+   left -= written;
+   cur += written;
+
+   switch (pkt->type) {
+   case VSOCK_PACKET_TYPE_REQUEST:
+   case VSOCK_PACKET_TYPE_NEGOTIATE:
+   written = snprintf(cur, left, ", %s, size = %" FMT64 "u",
+  type_strings[pkt->type], pkt->u.size);
+   break;
+
+   case VSOCK_PACKET_TYPE_OFFER:
+   case VSOCK_PACKET_TYPE_ATTACH:
+   written = snprintf(cur, left, ", %s, handle = %u:%u",
+  type_strings[pkt->type],
+  VMCI_HANDLE_TO_CONTEXT_ID(pkt->u.handle),
+  VMCI_HANDLE_TO_RESOURCE_ID(pkt->u.handle));
+   break;
+
+   case VSOCK_PACKET_TYPE_WROTE:
+   case VSOCK_PACKET_TYPE_READ:
+   case VSOCK_PACKET_TYPE_RST:
+   written = snprintf(cur, left, ", %s", type_strings[pkt->type]);
+   break;
+   case VSOCK_PACKET_TYPE_SHUTDOWN: {
+   bool recv;
+   bool send;
+
+   recv = pkt->u.mode & RCV_SHUTDOWN;
+   send = pkt->u.mode & SEND_SHUTDOWN;
+   written = snprintf(cur, left, ", %s, mode = %c%c",
+  type_strings[pkt->type],
+  recv ? 'R' : ' ', send ? 'S' : ' ');
+   }
+   break;
+
+   case VSOCK_PACKET_TYPE_WAITING_WRITE:
+   case VSOCK_PACKET_TYPE_WAITING_READ:
+   written = snprintf(cur, left,
+   ", %s, generation = %" FMT64 "u, offset = %" FMT64 "u",
+   type_strings[pkt->type],
+   pkt->u.wait.generation, pkt->u.wait.offset);
+
+   break;
+
+   case VSOCK_PACKET_TYPE_REQUEST2:
+   case VSOCK_PACKET_TYPE_NEGOTIATE2:
+   written = snprintf(cur, left,
+  ", %s, size = %" FMT64 "u, proto = %u",
+  type_strings[pkt->type], pkt->u.size,
+
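vsock_vmci_log_pkt() above builds one log line with a chain of snprintf() calls, advancing cur and shrinking left after each one, and bails out when a call reports that it would have overflowed. That bookkeeping can be factored into a small helper. The sketch below is a hypothetical userspace version (buf_appendf() is not part of the patch); it also treats a negative return from vsnprintf() as failure, which userspace C permits.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at *cur and track the space
 * left in the buffer.  Returns 0 on success, -1 if vsnprintf() failed or
 * the output would have been truncated (the caller's "goto error" case).
 */
static int buf_appendf(char **cur, size_t *left, const char *fmt, ...)
{
	va_list ap;
	int written;

	va_start(ap, fmt);
	written = vsnprintf(*cur, *left, fmt, ap);
	va_end(ap);

	if (written < 0 || (size_t)written >= *left)
		return -1;
	*cur += written;
	*left -= (size_t)written;
	return 0;
}
```

With this helper, each case in the switch becomes a single call whose failure jumps to the error label, instead of repeating the written/left/cur update after every snprintf().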

[PATCH 4/6] VSOCK: statistics implementation.

2012-11-21 Thread George Zhang

VSOCK stats for VMCI Stream Sockets protocol.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 net/vmw_vsock/stats.c |   37 
 net/vmw_vsock/stats.h |  217 +
 2 files changed, 254 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/stats.c
 create mode 100644 net/vmw_vsock/stats.h

diff --git a/net/vmw_vsock/stats.c b/net/vmw_vsock/stats.c
new file mode 100644
index 000..2d172d5
--- /dev/null
+++ b/net/vmw_vsock/stats.c
@@ -0,0 +1,37 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * stats.c --
+ *
+ * Linux stats for the VMCI Stream Sockets protocol.
+ */
+
+#include 
+
+#include 
+#include   /* for NULL */
+#include 
+
+#include "af_vsock.h"
+#include "stats.h"
+
+#ifdef VSOCK_GATHER_STATISTICS
+u64 vsock_stats_ctl_pkt_count[VSOCK_PACKET_TYPE_MAX];
+u64 vsock_stats_consume_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+u64 vsock_stats_produce_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+atomic64_t vsock_stats_consume_total;
+atomic64_t vsock_stats_produce_total;
+#endif
diff --git a/net/vmw_vsock/stats.h b/net/vmw_vsock/stats.h
new file mode 100644
index 000..9949b22
--- /dev/null
+++ b/net/vmw_vsock/stats.h
@@ -0,0 +1,217 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * stats.h --
+ *
+ * Stats functions for Linux vsock module.
+ */
+
+#ifndef __STATS_H__
+#define __STATS_H__
+
+#include 
+
+#include "vsock_common.h"
+#include "vsock_packet.h"
+
+/*
+ * Define VSOCK_GATHER_STATISTICS to turn on statistics gathering. Currently
+ * this consists of 3 types of stats: 1. The number of control datagram
+ * messages sent. 2. The level of queuepair fullness (in 10% buckets) whenever
+ * data is about to be enqueued or dequeued from the queuepair. 3. The total
+ * number of bytes enqueued/dequeued.
+ */
+
+#ifdef VSOCK_GATHER_STATISTICS
+
+#define VSOCK_NUM_QUEUE_LEVEL_BUCKETS 10
+extern u64 vsock_stats_ctl_pkt_count[VSOCK_PACKET_TYPE_MAX];
+extern u64 vsock_stats_consume_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+extern u64 vsock_stats_produce_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+extern atomic64_t vsock_stats_consume_total;
+extern atomic64_t vsock_stats_produce_total;
+
+#define VSOCK_STATS_STREAM_CONSUME_HIST(vsk)   \
+   vsock_vmci_stats_update_queue_bucket_count((vsk)->qpair,\
+   (vsk)->consume_size,\
+   vmci_qpair_consume_buf_ready((vsk)->qpair), \
+   vsock_stats_consume_queue_hist)
+#define VSOCK_STATS_STREAM_PRODUCE_HIST(vsk)   \
+   vsock_vmci_stats_update_queue_bucket_count((vsk)->qpair,\
+   (vsk)->produce_size,\
+   vmci_qpair_produce_buf_ready((vsk)->qpair), \
+   vsock_stats_produce_queue_hist)
+#define VSOCK_STATS_CTLPKT_LOG(pkt_type)   \
+   do {\
+   ++vsock_stats_ctl_pkt_count[pkt_type];  \
+   } while (0)
+#define VSOCK_STATS_STREAM_CONSUME(bytes)  \
+   atomic64_add(bytes, &vsock_stats_consume_total)
+#define VSOCK_STATS_STREAM_PRODUCE(bytes)  \
+   atomic64_add(bytes, &vsock_stats_produce_total)
+#define VSOCK_STATS_CTLPKT_DUMP_ALL() vsock_vmci_stats_ctl_pkt_dump_all()
+#define VSOCK_STATS_HIST_DUMP_ALL()   vsock_vmci_stats_hist_dump_all()
+#define VSOCK_STATS_TOTALS_DUMP_ALL() vsock_vmci_stats_totals_dump_all()
+#define VSOCK_STATS_RESET()   vsock_vmci_stats_reset()
+
+/*
+ *
+ * vsock_vmci_stats_update_queue_bucket_count --
+ *
+ * Given a queue, determine how much data is enqueued and add that to the
+ * specified queue level statistic bucket.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline
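The stats.h comment describes a histogram of queue pair fullness in 10% buckets, but the body of vsock_vmci_stats_update_queue_bucket_count() is cut off in this archive. One plausible way to compute the bucket from a queue size and the number of bytes ready is sketched below. model_update_queue_bucket() is hypothetical, assumes the byte count is non-negative, and folds a completely full queue into the last bucket.

```c
#include <stdint.h>

#define MODEL_NUM_BUCKETS 10	/* mirrors VSOCK_NUM_QUEUE_LEVEL_BUCKETS */

/*
 * Hypothetical sketch: map "bytes ready" in a queue of queue_size bytes
 * to a 10%-wide bucket and increment that bucket's counter.
 */
static void model_update_queue_bucket(uint64_t queue_size, uint64_t ready,
				      uint64_t hist[MODEL_NUM_BUCKETS])
{
	uint64_t bucket;

	if (queue_size == 0)
		return;			/* nothing sensible to record */
	if (ready > queue_size)
		ready = queue_size;
	bucket = ready * MODEL_NUM_BUCKETS / queue_size;
	if (bucket >= MODEL_NUM_BUCKETS)
		bucket = MODEL_NUM_BUCKETS - 1;	/* 100% full joins 90-100% */
	hist[bucket]++;
}
```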

[PATCH 3/6] VSOCK: notification implementation.

2012-11-21 Thread George Zhang

VSOCK control notifications for VMCI Stream Sockets protocol.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 net/vmw_vsock/notify.c |  983 
 net/vmw_vsock/notify.h |  130 ++
 2 files changed, 1113 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/notify.c
 create mode 100644 net/vmw_vsock/notify.h

diff --git a/net/vmw_vsock/notify.c b/net/vmw_vsock/notify.c
new file mode 100644
index 000..8504e28
--- /dev/null
+++ b/net/vmw_vsock/notify.c
@@ -0,0 +1,983 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * notify.c --
+ *
+ * Linux control notifications for the VMCI Stream Sockets protocol.
+ */
+
+#include 
+
+#include 
+#include   /* for NULL */
+#include 
+
+#include "notify.h"
+#include "af_vsock.h"
+
+#define PKT_FIELD(vsk, field_name) ((vsk)->notify.pkt.field_name)
+
+#define VSOCK_MAX_DGRAM_RESENDS   10
+
+/*
+ *
+ * vsock_vmci_notify_waiting_write --
+ *
+ * Determines if the conditions have been met to notify a waiting writer.
+ *
+ * Results: true if a notification should be sent, false otherwise.
+ *
+ * Side effects: None.
+ */
+
+static bool vsock_vmci_notify_waiting_write(struct vsock_vmci_sock *vsk)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+   bool retval;
+   u64 notify_limit;
+
+   if (!PKT_FIELD(vsk, peer_waiting_write))
+   return false;
+
+#ifdef VSOCK_OPTIMIZATION_FLOW_CONTROL
+   /*
+* When the sender blocks, we take that as a sign that the sender is
+* faster than the receiver. To reduce the transmit rate of the sender,
+* we delay the sending of the read notification by decreasing the
+* write_notify_window. The notification is delayed until the number of
+* bytes used in the queue drops below the write_notify_window.
+*/
+
+   if (!PKT_FIELD(vsk, peer_waiting_write_detected)) {
+   PKT_FIELD(vsk, peer_waiting_write_detected) = true;
+   if (PKT_FIELD(vsk, write_notify_window) < PAGE_SIZE) {
+   PKT_FIELD(vsk, write_notify_window) =
+   PKT_FIELD(vsk, write_notify_min_window);
+   } else {
+   PKT_FIELD(vsk, write_notify_window) -= PAGE_SIZE;
+   if (PKT_FIELD(vsk, write_notify_window) <
+   PKT_FIELD(vsk, write_notify_min_window))
+   PKT_FIELD(vsk, write_notify_window) =
+   PKT_FIELD(vsk, write_notify_min_window);
+
+   }
+   }
+   notify_limit = vsk->consume_size - PKT_FIELD(vsk, write_notify_window);
+#else
+   notify_limit = 0;
+#endif
+
+   /*
+* For now we ignore the wait information and just see if the free
+* space exceeds the notify limit.  Note that improving this function
+* to be more intelligent will not require a protocol change and will
+* retain compatibility between endpoints with mixed versions of this
+* function.
+*
+* The notify_limit is used to delay notifications in the case where
+* flow control is enabled. Below the test is expressed in terms of
+* free space in the queue: if free_space > ConsumeSize -
+* write_notify_window, then notify. An alternate way of expressing this
+* is to rewrite the expression to use the data ready in the receive
+* queue: if write_notify_window > bufferReady, then notify, as
+* free_space == ConsumeSize - bufferReady.
+*/
+   retval = vmci_qpair_consume_free_space(vsk->qpair) > notify_limit;
+#ifdef VSOCK_OPTIMIZATION_FLOW_CONTROL
+   if (retval) {
+   /*
+* Once we notify the peer, we reset the detected flag so the
+* next wait will again cause a decrease in the window size.
+*/
+
+   PKT_FIELD(vsk, peer_waiting_write_detected) = false;
+   }
+#endif
+   return retval;
+#else
+   return true;
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_notify_waiting_read --
+ *
+ * Determines if the conditions have been met to notify a waiting reader.
+ *
+ * Results: true if a notification should be sent, false otherwise.
+ *
+ * Side effects: None.
+ */
+
+static bool vsock_vmci_notify_waiting_read(struct vsock_vmci_sock *vsk)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+   if
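The flow-control comment in vsock_vmci_notify_waiting_write() describes two steps. The first time a waiting writer is detected, write_notify_window shrinks by one page, clamped at write_notify_min_window. The writer is then notified only once free space exceeds consume_size - write_notify_window. The sketch below models that logic in userspace. struct model_notify is a hypothetical subset of the socket state, the 4 KiB page size is an assumption, and write_notify_window is assumed never to exceed consume_size.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical subset of the per-socket notify state. */
struct model_notify {
	bool peer_waiting_write;
	bool peer_waiting_write_detected;
	uint64_t write_notify_window;
	uint64_t write_notify_min_window;
	uint64_t consume_size;
};

/* Decide whether to tell a blocked writer that space is available. */
static bool model_notify_waiting_write(struct model_notify *n,
				       uint64_t free_space)
{
	uint64_t notify_limit;
	bool notify;

	if (!n->peer_waiting_write)
		return false;

	/* First wait since the last notification: shrink the window by a
	 * page, but never below the minimum. */
	if (!n->peer_waiting_write_detected) {
		n->peer_waiting_write_detected = true;
		if (n->write_notify_window < MODEL_PAGE_SIZE) {
			n->write_notify_window = n->write_notify_min_window;
		} else {
			n->write_notify_window -= MODEL_PAGE_SIZE;
			if (n->write_notify_window < n->write_notify_min_window)
				n->write_notify_window = n->write_notify_min_window;
		}
	}
	notify_limit = n->consume_size - n->write_notify_window;

	/* Equivalently: notify once the data still queued drops below
	 * write_notify_window. */
	notify = free_space > notify_limit;
	if (notify)
		n->peer_waiting_write_detected = false;	/* next wait shrinks again */
	return notify;
}
```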

[PATCH 2/6] VSOCK: vsock address implementation.

2012-11-21 Thread George Zhang

VSOCK linux address code implementation.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 net/vmw_vsock/vsock_addr.c |  246 
 net/vmw_vsock/vsock_addr.h |   40 +++
 2 files changed, 286 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/vsock_addr.c
 create mode 100644 net/vmw_vsock/vsock_addr.h
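The address helpers added in vsock_addr.c reduce to a few comparisons on (cid, port) pairs. There is an exact match, a match in which VMADDR_CID_ANY on either side acts as a wildcard for the CID, and a "bound" test that checks the port against VMADDR_PORT_ANY. A userspace sketch of those rules follows. struct model_addr is hypothetical, and the wildcard value UINT32_MAX is an assumption made for the sketch, since the VMADDR_* definitions are not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed wildcard values; the patch uses VMADDR_CID_ANY/VMADDR_PORT_ANY. */
#define MODEL_CID_ANY  UINT32_MAX
#define MODEL_PORT_ANY UINT32_MAX

struct model_addr {
	uint32_t cid;
	uint32_t port;
};

/* Exact match on both fields (cf. vsock_addr_equals_addr). */
static bool model_addr_equal(const struct model_addr *a,
			     const struct model_addr *b)
{
	return a->cid == b->cid && a->port == b->port;
}

/* Ports must match; CID_ANY on either side matches any CID
 * (cf. vsock_addr_equals_addr_any). */
static bool model_addr_equal_any(const struct model_addr *a,
				 const struct model_addr *b)
{
	return (a->cid == MODEL_CID_ANY || b->cid == MODEL_CID_ANY ||
		a->cid == b->cid) &&
	       a->port == b->port;
}

/* A port of PORT_ANY means "not bound" (cf. vsock_addr_bound). */
static bool model_addr_bound(const struct model_addr *a)
{
	return a->port != MODEL_PORT_ANY;
}
```

A listener bound to (CID_ANY, 1024) therefore matches a peer at (3, 1024) under the wildcard rule but not under exact comparison.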

diff --git a/net/vmw_vsock/vsock_addr.c b/net/vmw_vsock/vsock_addr.c
new file mode 100644
index 000..35eeb14
--- /dev/null
+++ b/net/vmw_vsock/vsock_addr.c
@@ -0,0 +1,246 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vsock_addr.c --
+ *
+ * VSockets address implementation.
+ */
+
+#include 
+#include 
+#include   /* for NULL */
+#include 
+
+#include "vsock_common.h"
+
+/*
+ *
+ * vsock_addr_init --
+ *
+ * Initialize the given address with the given context id and port. This will
+ * clear the address, set the correct family, and add the given values.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_addr_init(struct sockaddr_vm *addr, u32 cid, u32 port)
+{
+   memset(addr, 0, sizeof *addr);
+
+   addr->svm_family = AF_VSOCK;
+   addr->svm_cid = cid;
+   addr->svm_port = port;
+}
+
+/*
+ *
+ * vsock_addr_validate --
+ *
+ * Try to validate the given address.  The address must not be null and must
+ * have the correct address family.  Any reserved fields must be zero.
+ *
+ * Results: 0 on success, -EFAULT if the address is NULL, -EAFNOSUPPORT if
+ * the address is of the wrong family, and -EINVAL if the reserved fields are
+ * not zero.
+ *
+ * Side effects: None.
+ */
+
+int vsock_addr_validate(const struct sockaddr_vm *addr)
+{
+	if (!addr)
+		return -EFAULT;
+
+	if (addr->svm_family != AF_VSOCK)
+		return -EAFNOSUPPORT;
+
+	if (addr->svm_zero[0] != 0)
+		return -EINVAL;
+
+   return 0;
+}
+
+/*
+ *
+ * vsock_addr_bound --
+ *
+ * Determines whether the provided address is bound.
+ *
+ * Results: TRUE if the address structure is bound, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_bound(const struct sockaddr_vm *addr)
+{
+   return addr->svm_port != VMADDR_PORT_ANY;
+}
+
+/*
+ *
+ * vsock_addr_unbind --
+ *
+ * Unbind the given address.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_addr_unbind(struct sockaddr_vm *addr)
+{
+   vsock_addr_init(addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+}
+
+/*
+ *
+ * vsock_addr_equals_addr --
+ *
+ * Determine if the given addresses are equal.
+ *
+ * Results: TRUE if the addresses are equal, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_equals_addr(const struct sockaddr_vm *addr,
+			    const struct sockaddr_vm *other)
+{
+	return addr->svm_cid == other->svm_cid &&
+	       addr->svm_port == other->svm_port;
+}
+
+/*
+ *
+ * vsock_addr_equals_addr_any --
+ *
+ * Determine if the given addresses are equal. Accepts either an exact match
+ * or one where the ports match and either CID is VMADDR_CID_ANY.
+ *
+ * Results: TRUE if the addresses are equal, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_equals_addr_any(const struct sockaddr_vm *addr,
+				const struct sockaddr_vm *other)
+{
+	return (addr->svm_cid == VMADDR_CID_ANY ||
+		other->svm_cid == VMADDR_CID_ANY ||
+		addr->svm_cid == other->svm_cid) &&
+	       addr->svm_port == other->svm_port;
+}
+
+/*
+ *
+ * vsock_addr_equals_handle_port --
+ *
+ * Determines if the given address matches the given handle and port.
+ *
+ * Results: TRUE if the address matches the handle and port, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_equals_handle_port(const struct sockaddr_vm *addr,
+				   struct vmci_handle handle, u32 port)
+{
+	return addr->svm_cid == VMCI_HANDLE_TO_CONTEXT_ID(handle) &&
+	       addr->svm_port == port;
+}
+
+/*
+ *
+ * vsock_addr_cast --
+ *
+ * Try to cast the given generic address to a VM address.  The given length
+ * must match that of a VM address and the address must be valid. The
+ * "out_addr" parameter contains the address if successful.
+ *
+ * Results: 0 on success, -EFAULT if the length is too small.  See
+ * vsock_addr_validate() for other possible return codes.
+ *
+ * Side

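A minimal user-space sketch of the address semantics described in the comments above: initialize, validate, check whether bound, and compare with a wildcard CID. The struct layout, the AF_VSOCK value and the *_ANY wildcard values are stand-ins chosen for illustration, not the definitions from vsock_common.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in values (assumed for this sketch, not taken from vsock_common.h). */
#define SKETCH_AF_VSOCK	40
#define SKETCH_CID_ANY	UINT32_MAX
#define SKETCH_PORT_ANY	UINT32_MAX

/* Hypothetical simplified layout of a VM socket address. */
struct sketch_sockaddr_vm {
	uint16_t svm_family;
	uint32_t svm_port;
	uint32_t svm_cid;
	uint8_t svm_zero[4];	/* reserved, must be zero */
};

/* Clear the address, then set family, CID and port. */
static void sketch_addr_init(struct sketch_sockaddr_vm *addr,
			     uint32_t cid, uint32_t port)
{
	assert(addr != NULL);
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = SKETCH_AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/* Returns 0 or a negative errno value, mirroring vsock_addr_validate(). */
static int sketch_addr_validate(const struct sketch_sockaddr_vm *addr)
{
	if (!addr)
		return -EFAULT;
	if (addr->svm_family != SKETCH_AF_VSOCK)
		return -EAFNOSUPPORT;
	if (addr->svm_zero[0] != 0)
		return -EINVAL;
	return 0;
}

/* An address counts as bound once its port is no longer the wildcard. */
static bool sketch_addr_bound(const struct sketch_sockaddr_vm *addr)
{
	return addr->svm_port != SKETCH_PORT_ANY;
}

/* Ports must match exactly; either CID may be the wildcard. */
static bool sketch_addr_equals_any(const struct sketch_sockaddr_vm *a,
				   const struct sketch_sockaddr_vm *b)
{
	return (a->svm_cid == SKETCH_CID_ANY ||
		b->svm_cid == SKETCH_CID_ANY ||
		a->svm_cid == b->svm_cid) &&
	       a->svm_port == b->svm_port;
}
```

For example, an address initialized with SKETCH_CID_ANY and port 1024 matches a peer with CID 3 and port 1024, but not one with port 1025.
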
[PATCH 0/6] VSOCK for Linux upstreaming

2012-11-21 Thread George Zhang


* * *
This series of VSOCK Linux upstreaming patches includes the latest updates from
VMware.

Summary of changes:
- Sparse clean.
- Checkpatch clean with one exception, a "complex macro" in
  which we can't add parentheses.
- Remove all runtime assertions.
- Fix device name, so that existing user clients work.
- Fix VMCI handle lookup.

* * *

In an effort to improve the out-of-the-box experience with Linux
kernels for VMware users, VMware is working on readying the Virtual
Machine Communication Interface (vmw_vmci) and VMCI Sockets (VSOCK)
(vmw_vsock) kernel modules for inclusion in the Linux kernel. The
purpose of this post is to solicit feedback on the vmw_vsock kernel
module. The vmw_vmci kernel module was presented in an earlier post.


* * *

VMCI Sockets allows virtual machines to communicate with host kernel
modules and the VMware hypervisors. The VMCI Sockets kernel module
depends on the VMCI kernel module. User-level applications, both in
a virtual machine and on the host, can use vmw_vmci through the VMCI
Sockets API, which facilitates fast and efficient communication
between guest virtual machines and their host. VMCI Sockets is a socket
address family designed to be compatible with UDP and TCP at the
interface level. Today, VMCI and VMCI Sockets are used by the VMware
shared folders (HGFS) and various VMware Tools components inside the
guest for zero-config, network-less access to VMware host services. In
addition, VMware users rely on VMCI Sockets for various applications
where network access of the virtual machine is restricted or
non-existent. Examples include VMs communicating with device proxies
for proprietary hardware running as host applications, and automated
testing of applications running within virtual machines.

VMware VMCI Sockets behave like other socket types and follow the
Berkeley UNIX socket interface. The VMCI Sockets module supports both
connection-oriented stream sockets (like TCP) and connectionless
datagram sockets (like UDP). The VSOCK protocol family is defined as
"AF_VSOCK", and the socket operations are split between SOCK_DGRAM and
SOCK_STREAM.

For additional information about the use of VMCI and in particular
VMCI Sockets, please refer to the VMCI Socket Programming Guide
available at https://www.vmware.com/support/developer/vmci-sdk/.

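For readers who want to see the user-level side, here is a hedged sketch of a stream client. It is written against the interface that was later merged into mainline Linux (<linux/vm_sockets.h>, struct sockaddr_vm, VMADDR_CID_HOST), which may differ in detail from the vmci_sockets.h header in this series. The port number in the usage example is arbitrary and assumes some host service is listening on it.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Fill in a vsock address for the given context ID and port. */
static struct sockaddr_vm make_vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Open a SOCK_STREAM connection to (cid, port). Returns the socket
 * descriptor, or -1 if the socket cannot be created or connected
 * (for example, when no vsock transport is loaded).
 */
static int vsock_connect_stream(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;
	int fd;

	assert(cid != VMADDR_CID_ANY);	/* a client needs a concrete peer */
	addr = make_vsock_addr(cid, port);

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A guest would call vsock_connect_stream(VMADDR_CID_HOST, 1234) to reach a host service. A datagram client would instead pass SOCK_DGRAM to socket() and send with sendto().
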


---

George Zhang (6):
  VSOCK: vsock protocol implementation.
  VSOCK: vsock address implementation.
  VSOCK: notification implementation.
  VSOCK: statistics implementation.
  VSOCK: utility functions.
  VSOCK: header and config files.


 include/linux/socket.h              |    4 
 net/Kconfig                         |    1 
 net/Makefile                        |    1 
 net/vmw_vsock/Kconfig               |   14 
 net/vmw_vsock/Makefile              |    4 
 net/vmw_vsock/af_vsock.c            | 4054 +++
 net/vmw_vsock/af_vsock.h            |  180 ++
 net/vmw_vsock/notify.c              |  983 
 net/vmw_vsock/notify.h              |  130 +
 net/vmw_vsock/notify_qstate.c       |  625 +
 net/vmw_vsock/stats.c               |   37 
 net/vmw_vsock/stats.h               |  217 ++
 net/vmw_vsock/util.c                |  620 +
 net/vmw_vsock/util.h                |  314 +++
 net/vmw_vsock/vmci_sockets.h        |  517 
 net/vmw_vsock/vmci_sockets_packet.h |   90 +
 net/vmw_vsock/vsock_addr.c          |  246 ++
 net/vmw_vsock/vsock_addr.h          |   40 
 net/vmw_vsock/vsock_common.h        |  127 +
 net/vmw_vsock/vsock_packet.h        |  124 +
 net/vmw_vsock/vsock_version.h       |   28 
 21 files changed, 8355 insertions(+), 1 deletions(-)
 create mode 100644 net/vmw_vsock/Kconfig
 create mode 100644 net/vmw_vsock/Makefile
 create mode 100644 net/vmw_vsock/af_vsock.c
 create mode 100644 net/vmw_vsock/af_vsock.h
 create mode 100644 net/vmw_vsock/notify.c
 create mode 100644 net/vmw_vsock/notify.h
 create mode 100644 net/vmw_vsock/notify_qstate.c
 create mode 100644 net/vmw_vsock/stats.c
 create mode 100644 net/vmw_vsock/stats.h
 create mode 100644 net/vmw_vsock/util.c
 create mode 100644 net/vmw_vsock/util.h
 create mode 100644 net/vmw_vsock/vmci_sockets.h
 create mode 100644 net/vmw_vsock/vmci_sockets_packet.h
 create mode 100644 net/vmw_vsock/vsock_addr.c
 create mode 100644 net/vmw_vsock/vsock_addr.h
 create mode 100644 net/vmw_vsock/vsock_common.h
 create mode 100644 net/vmw_vsock/vsock_packet.h
 create mode 100644 net/vmw_vsock/vsock_version.h


[PATCH 12/12] VMCI: Some header and config files.

2012-11-21 Thread George Zhang

This patch adds all the files needed to build the VMCI module with the Linux
Makefile and Kconfig systems. It also adds the header files used for building
modules against the driver.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/Kconfig                    |    1 
 drivers/misc/Makefile                   |    2 
 drivers/misc/vmw_vmci/Kconfig           |   16 +
 drivers/misc/vmw_vmci/Makefile          |    4 
 drivers/misc/vmw_vmci/vmci_common_int.h |   32 +
 include/linux/vmw_vmci_api.h            |   82 +++
 include/linux/vmw_vmci_defs.h           |  973 +++
 7 files changed, 1110 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/Kconfig
 create mode 100644 drivers/misc/vmw_vmci/Makefile
 create mode 100644 drivers/misc/vmw_vmci/vmci_common_int.h
 create mode 100644 include/linux/vmw_vmci_api.h
 create mode 100644 include/linux/vmw_vmci_defs.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 2661f6e..fe38c7a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -517,4 +517,5 @@ source "drivers/misc/lis3lv02d/Kconfig"
 source "drivers/misc/carma/Kconfig"
 source "drivers/misc/altera-stapl/Kconfig"
 source "drivers/misc/mei/Kconfig"
+source "drivers/misc/vmw_vmci/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 456972f..21ed953 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -51,3 +51,5 @@ obj-y += carma/
 obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
 obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
 obj-$(CONFIG_INTEL_MEI)+= mei/
+obj-$(CONFIG_MAX8997_MUIC) += max8997-muic.o
+obj-$(CONFIG_VMWARE_VMCI)  += vmw_vmci/
diff --git a/drivers/misc/vmw_vmci/Kconfig b/drivers/misc/vmw_vmci/Kconfig
new file mode 100644
index 000..55015e7
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Kconfig
@@ -0,0 +1,16 @@
+#
+# VMware VMCI device
+#
+
+config VMWARE_VMCI
+	tristate "VMware VMCI Driver"
+	depends on X86
+	help
+	  This is VMware's Virtual Machine Communication Interface.  It enables
+	  high-speed communication between host and guest in a virtual
+	  environment via the VMCI virtual device.
+
+	  If unsure, say N.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called vmw_vmci.
diff --git a/drivers/misc/vmw_vmci/Makefile b/drivers/misc/vmw_vmci/Makefile
new file mode 100644
index 000..4da9893
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o
+vmw_vmci-y += vmci_context.o vmci_datagram.o vmci_doorbell.o \
+   vmci_driver.o vmci_event.o vmci_guest.o vmci_handle_array.o \
+   vmci_host.o vmci_queue_pair.o vmci_resource.o vmci_route.o
diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h b/drivers/misc/vmw_vmci/vmci_common_int.h
new file mode 100644
index 000..81a268a
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_common_int.h
@@ -0,0 +1,32 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_COMMONINT_H_
+#define _VMCI_COMMONINT_H_
+
+#include 
+
+#define PCI_VENDOR_ID_VMWARE   0x15AD
+#define PCI_DEVICE_ID_VMWARE_VMCI  0x0740
+#define VMCI_DRIVER_VERSION_STRING "1.0.0.0-k"
+#define MODULE_NAME "vmw_vmci"
+
+/* Print magic... whee! */
+#ifdef pr_fmt
+#undef pr_fmt
+#define pr_fmt(fmt) MODULE_NAME ": " fmt
+#endif
+
+#endif /* _VMCI_COMMONINT_H_ */
diff --git a/include/linux/vmw_vmci_api.h b/include/linux/vmw_vmci_api.h
new file mode 100644
index 000..193129d
--- /dev/null
+++ b/include/linux/vmw_vmci_api.h
@@ -0,0 +1,82 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef __VMW_VMCI_API_H__
+#define __VMW_VMCI_API_H__
+
+#include 
+#include 
+
+#undef  VMCI_KERNEL_API_VERSION
+#define VMCI_KERNEL_API_VERSION_1 1
+#define VMCI_KERNEL_API_VERSION_2 2
+#define

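The pr_fmt() override in vmci_common_int.h relies on compile-time string-literal concatenation, so every pr_*() format string in the driver gets the "vmw_vmci: " prefix. Below is a user-space sketch of the same mechanism. sketch_format() and sketch_pr() are hypothetical stand-ins for the kernel's printk helpers, and the ##__VA_ARGS__ comma elision is a GNU extension, as in the kernel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MODULE_NAME "vmw_vmci"

/* Same shape as the driver's override: prefix the literal at compile time. */
#define pr_fmt(fmt) MODULE_NAME ": " fmt

/* Hypothetical helper standing in for printk(): formats into buf. */
static int sketch_format(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Hypothetical pr_*()-style wrapper; fmt must be a string literal. */
#define sketch_pr(buf, len, fmt, ...) \
	sketch_format(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is pasted into the literal, no runtime string building is needed, but a format passed as a variable (rather than a literal) would not compile with this macro.
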
[PATCH 11/12] VMCI: host side driver implementation.

2012-11-21 Thread George Zhang

VMCI host side driver code implementation.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_host.c | 1036 +
 1 files changed, 1036 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_host.c

diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
new file mode 100644
index 000..4639e91
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -0,0 +1,1036 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_handle_array.h"
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+enum {
+   VMCI_NOTIFY_RESOURCE_QUEUE_PAIR = 0,
+   VMCI_NOTIFY_RESOURCE_DOOR_BELL = 1,
+};
+
+enum {
+   VMCI_NOTIFY_RESOURCE_ACTION_NOTIFY = 0,
+   VMCI_NOTIFY_RESOURCE_ACTION_CREATE = 1,
+   VMCI_NOTIFY_RESOURCE_ACTION_DESTROY = 2,
+};
+
+/*
+ * VMCI driver initialization. This block can also be used to
+ * pass initial group membership etc.
+ */
+struct vmci_init_blk {
+   u32 cid;
+   u32 flags;
+};
+
+/* VMCIqueue_pairAllocInfo_VMToVM */
+struct vmci_qp_alloc_info_vmvm {
+   struct vmci_handle handle;
+   u32 peer;
+   u32 flags;
+   u64 produce_size;
+   u64 consume_size;
+   u64 produce_page_file;/* User VA. */
+   u64 consume_page_file;/* User VA. */
+   u64 produce_page_file_size;  /* Size of the file name array. */
+   u64 consume_page_file_size;  /* Size of the file name array. */
+   s32 result;
+   u32 _pad;
+};
+
+/* VMCISetNotifyInfo: Used to pass notify flag's address to the host driver. */
+struct vmci_set_notify_info {
+   u64 notify_uva;
+   s32 result;
+   u32 _pad;
+};
+
+/*
+ * Per-instance host state
+ */
+struct vmci_host_dev {
+   struct vmci_ctx *context;
+   int user_version;
+   enum vmci_obj_type ct_type;
+   struct mutex lock;  /* Mutex lock for vmci context access */
+};
+
+static struct vmci_ctx *host_context;
+static bool vmci_host_device_initialized;
+static atomic_t vmci_host_active_users = ATOMIC_INIT(0);
+
+/*
+ * Determines whether the VMCI host personality is
+ * available. Since the core functionality of the host driver is
+ * always present, all guests could possibly use the host
+ * personality. However, to minimize the deviation from the
+ * pre-unified driver state of affairs, we only consider the host
+ * device active if there is no active guest device or if there
+ * are VMX'en with active VMCI contexts using the host device.
+ */
+bool vmci_host_code_active(void)
+{
+	return vmci_host_device_initialized &&
+	       (!vmci_guest_code_active() ||
+		atomic_read(&vmci_host_active_users) > 0);
+}
+
+/*
+ * Called on open of /dev/vmci.
+ */
+static int vmci_host_open(struct inode *inode, struct file *filp)
+{
+   struct vmci_host_dev *vmci_host_dev;
+
+   vmci_host_dev = kzalloc(sizeof(struct vmci_host_dev), GFP_KERNEL);
+	if (vmci_host_dev == NULL)
+		return -ENOMEM;
+
+	vmci_host_dev->ct_type = VMCIOBJ_NOT_SET;
+	mutex_init(&vmci_host_dev->lock);
+   filp->private_data = vmci_host_dev;
+
+   return 0;
+}
+
+/*
+ * Called on close of /dev/vmci, most often when the process
+ * exits.
+ */
+static int vmci_host_close(struct inode *inode, struct file *filp)
+{
+   struct vmci_host_dev *vmci_host_dev = filp->private_data;
+
+	if (vmci_host_dev->ct_type == VMCIOBJ_CONTEXT) {
+		vmci_ctx_destroy(vmci_host_dev->context);
+		vmci_host_dev->context = NULL;
+
+		/*
+		 * The number of active contexts is used to track whether any
+		 * VMX'en are using the host personality. It is incremented when
+		 * a context is created through the IOCTL_VMCI_INIT_CONTEXT
+		 * ioctl.
+		 */
+		atomic_dec(&vmci_host_active_users);
+	}
+	vmci_host_dev->ct_type = VMCIOBJ_NOT_SET;
+
+	kfree(vmci_host_dev);
+   filp->private_data =

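The rule in vmci_host_code_active(), together with the counter that the IOCTL_VMCI_INIT_CONTEXT path increments and vmci_host_close() decrements, can be modeled in user space with C11 atomics. This is a sketch under that assumption. The sketch_* names are hypothetical, and guest activity is passed in as a flag instead of calling vmci_guest_code_active().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's globals. */
static atomic_int sketch_active_users = 0;
static bool sketch_host_device_initialized = true;

/*
 * The host personality counts as active if the host device is initialized
 * and either no guest device is active or some VMX holds a host context.
 */
static bool sketch_host_code_active(bool guest_code_active)
{
	return sketch_host_device_initialized &&
	       (!guest_code_active || atomic_load(&sketch_active_users) > 0);
}

/* Corresponds to creating a context via IOCTL_VMCI_INIT_CONTEXT. */
static void sketch_init_context(void)
{
	atomic_fetch_add(&sketch_active_users, 1);
}

/* Corresponds to vmci_host_close() on a file that owns a context. */
static void sketch_close_context(void)
{
	assert(atomic_load(&sketch_active_users) > 0);
	atomic_fetch_sub(&sketch_active_users, 1);
}
```

Passing the guest state as a parameter keeps the sketch testable. In the driver it is a global query.
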
[PATCH 10/12] VMCI: guest side driver implementation.

2012-11-21 Thread George Zhang

VMCI guest side driver code implementation.

Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 
Signed-off-by: George Zhang 

---
 drivers/misc/vmw_vmci/vmci_guest.c |  757 
 1 files changed, 757 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_guest.c

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
new file mode 100644
index 000..bcbe8ab
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -0,0 +1,757 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+static bool vmci_disable_msi;
+module_param_named(disable_msi, vmci_disable_msi, bool, 0);
+MODULE_PARM_DESC(disable_msi, "Disable MSI use in driver - (default=0)");
+
+static bool vmci_disable_msix;
+module_param_named(disable_msix, vmci_disable_msix, bool, 0);
+MODULE_PARM_DESC(disable_msix, "Disable MSI-X use in driver - (default=0)");
+
+static u32 ctx_update_sub_id = VMCI_INVALID_ID;
+static u32 vm_context_id = VMCI_INVALID_ID;
+
+struct vmci_guest_device {
+   struct device *dev; /* PCI device we are attached to */
+   void __iomem *iobase;
+
+   unsigned int irq;
+   unsigned int intr_type;
+   bool exclusive_vectors;
+   struct msix_entry msix_entries[VMCI_MAX_INTRS];
+
+   struct tasklet_struct datagram_tasklet;
+   struct tasklet_struct bm_tasklet;
+
+   void *data_buffer;
+   void *notification_bitmap;
+};
+
+/* vmci_dev singleton device and supporting data*/
+static struct vmci_guest_device *vmci_dev_g;
+static DEFINE_SPINLOCK(vmci_dev_spinlock);
+
+static atomic_t vmci_num_guest_devices = ATOMIC_INIT(0);
+
+bool vmci_guest_code_active(void)
+{
+	return atomic_read(&vmci_num_guest_devices) != 0;
+}
+
+u32 vmci_get_vm_context_id(void)
+{
+	if (vm_context_id == VMCI_INVALID_ID) {
+		int result;	/* signed, so the error check below is meaningful */
+		struct vmci_datagram get_cid_msg;
+		get_cid_msg.dst =
+		    vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+				     VMCI_GET_CONTEXT_ID);
+		get_cid_msg.src = VMCI_ANON_SRC_HANDLE;
+		get_cid_msg.payload_size = 0;
+		result = vmci_send_datagram(&get_cid_msg);
+		if (result >= 0)
+			vm_context_id = result;
+	}
+   return vm_context_id;
+}
+
+/*
+ * VM to hypervisor call mechanism. We use the standard VMware naming
+ * convention since shared code is calling this function as well.
+ */
+int vmci_send_datagram(struct vmci_datagram *dg)
+{
+   unsigned long flags;
+   int result;
+
+	/* Check args. */
+	if (dg == NULL)
+		return VMCI_ERROR_INVALID_ARGS;
+
+	/*
+	 * Need to acquire spinlock on the device because the datagram
+	 * data may be spread over multiple pages and the monitor may
+	 * interleave device user rpc calls from multiple
+	 * VCPUs. Acquiring the spinlock precludes that
+	 * possibility. Interrupts are disabled to avoid incoming
+	 * datagrams during a "rep out" and possibly ending up in
+	 * this function.
+	 */
+	spin_lock_irqsave(&vmci_dev_spinlock, flags);
+
+	if (vmci_dev_g) {
+		iowrite8_rep(vmci_dev_g->iobase + VMCI_DATA_OUT_ADDR,
+			     dg, VMCI_DG_SIZE(dg));
+		result = ioread32(vmci_dev_g->iobase + VMCI_RESULT_LOW_ADDR);
+	} else {
+		result = VMCI_ERROR_UNAVAILABLE;
+	}
+
+	spin_unlock_irqrestore(&vmci_dev_spinlock, flags);
+
+   return result;
+}
+EXPORT_SYMBOL_GPL(vmci_send_datagram);
+
+/*
+ * Gets called with the new context ID on update or resume.
+ */
+static void vmci_guest_cid_update(u32 sub_id,
+				  const struct vmci_event_data *event_data,
+				  void *client_data)
+{
+	const struct vmci_event_payld_ctx *ev_payload =
+			vmci_event_data_const_payload(event_data);
+
+	if (sub_id != ctx_update_sub_id) {
+		pr_devel("Invalid subscriber (ID=0x%x).\n", sub_id);
+

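vmci_get_vm_context_id() asks the hypervisor for the VM's context ID once and caches it. Below is a sketch of that lazy-caching pattern. The hypervisor call is replaced by a hypothetical callback, and a fake hypervisor drives the usage example. The reply is kept in a signed type so that a negative error code is never cached as a context ID. Like the original, it does no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_ID UINT32_MAX	/* assumed stand-in for VMCI_INVALID_ID */

/* Hypothetical query: returns the context ID (>= 0) or a negative error. */
typedef int64_t (*sketch_query_fn)(void *arg);

struct sketch_cid_cache {
	uint32_t cid;		/* SKETCH_INVALID_ID until a query succeeds */
};

static uint32_t sketch_get_cid(struct sketch_cid_cache *cache,
			       sketch_query_fn query, void *arg)
{
	assert(cache != NULL && query != NULL);
	if (cache->cid == SKETCH_INVALID_ID) {
		int64_t result = query(arg);

		if (result >= 0)	/* only cache a successful reply */
			cache->cid = (uint32_t)result;
	}
	return cache->cid;
}

/* Hypothetical fake hypervisor for exercising the cache. */
struct sketch_fake_hv {
	int calls;
	int64_t reply;
};

static int64_t sketch_fake_query(void *arg)
{
	struct sketch_fake_hv *hv = arg;

	hv->calls++;
	return hv->reply;
}
```

A failed query leaves the cache empty, so the next call retries. Once a reply succeeds, later calls return the cached value without querying again.
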
[PATCH 09/12] VMCI: routing implementation.

2012-11-21 Thread George Zhang

VMCI routing code is responsible for routing between various hosts/guests as
well as routing in nested scenarios.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_route.c |  227 
 drivers/misc/vmw_vmci/vmci_route.h |   30 +
 2 files changed, 257 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_route.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_route.h

diff --git a/drivers/misc/vmw_vmci/vmci_route.c b/drivers/misc/vmw_vmci/vmci_route.c
new file mode 100644
index 000..7cca156
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_route.c
@@ -0,0 +1,227 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_route.h"
+
+/*
+ * Make a routing decision for the given source and destination handles.
+ * This will try to determine the route using the handles and the available
+ * devices.  Will set the source context if it is invalid.
+ */
+int vmci_route(struct vmci_handle *src,
+  const struct vmci_handle *dst,
+  bool from_guest,
+  enum vmci_route *route)
+{
+   bool has_host_device = vmci_host_code_active();
+   bool has_guest_device = vmci_guest_code_active();
+
+   *route = VMCI_ROUTE_NONE;
+
+	/*
+	 * "from_guest" is only ever set to true by
+	 * IOCTL_VMCI_DATAGRAM_SEND (or by the vmkernel equivalent),
+	 * which comes from the VMX, so we know it is coming from a
+	 * guest.
+	 *
+	 * To avoid inconsistencies, test these once.  We will test
+	 * them again when we do the actual send to ensure that we do
+	 * not touch a non-existent device.
+	 */
+
+	/* Must have a valid destination context. */
+	if (VMCI_INVALID_ID == dst->context)
+		return VMCI_ERROR_INVALID_ARGS;
+
+	/* Anywhere to hypervisor. */
+	if (VMCI_HYPERVISOR_CONTEXT_ID == dst->context) {
+
+		/*
+		 * If this message already came from a guest then we
+		 * cannot send it to the hypervisor.  It must come
+		 * from a local client.
+		 */
+		if (from_guest)
+			return VMCI_ERROR_DST_UNREACHABLE;
+
+		/*
+		 * We must be acting as a guest in order to send to
+		 * the hypervisor.
+		 */
+		if (!has_guest_device)
+			return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+		/* And we cannot send if the source is the host context. */
+		if (VMCI_HOST_CONTEXT_ID == src->context)
+			return VMCI_ERROR_INVALID_ARGS;
+
+		/*
+		 * If the client passed the ANON source handle then
+		 * respect it (both context and resource are invalid).
+		 * However, if they passed only an invalid context,
+		 * then they probably mean ANY, in which case we
+		 * should set the real context here before passing it
+		 * down.
+		 */
+		if (VMCI_INVALID_ID == src->context &&
+		    VMCI_INVALID_ID != src->resource)
+			src->context = vmci_get_context_id();
+
+		/* Send from local client down to the hypervisor. */
+		*route = VMCI_ROUTE_AS_GUEST;
+		return VMCI_SUCCESS;
+	}
+
+	/* Anywhere to local client on host. */
+	if (VMCI_HOST_CONTEXT_ID == dst->context) {
+		/*
+		 * If it is not from a guest but we are acting as a
+		 * guest, then we need to send it down to the host.
+		 * Note that if we are also acting as a host then this
+		 * will prevent us from sending from local client to
+		 * local client, but we accept that restriction as a
+		 * way to remove any ambiguity from the host context.
+		 */
+		if (src->context == VMCI_HYPERVISOR_CONTEXT_ID) {
+			/*
+			 * If the hypervisor is the source, this is
+			 * host local communication. The hypervisor
+			 * may send vmci event datagrams to the host
+			 * itself, but it will never

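The hypervisor-destination branch of vmci_route() reduces to a short sequence of checks. Below is a standalone sketch of just that branch, assuming the destination is already known to be the hypervisor context. The host context value (2), the result codes and the sk_* names are assumed stand-ins for the definitions in vmw_vmci_defs.h. The local context ID is passed in rather than fetched with vmci_get_context_id().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the VMCI definitions. */
#define SK_INVALID_ID	UINT32_MAX
#define SK_HOST_CID	2u

enum sk_result {
	SK_SUCCESS,
	SK_ERR_INVALID_ARGS,
	SK_ERR_DST_UNREACHABLE,
	SK_ERR_DEVICE_NOT_FOUND,
};

enum sk_route { SK_ROUTE_NONE, SK_ROUTE_AS_GUEST };

struct sk_handle {
	uint32_t context;
	uint32_t resource;
};

/* Route a datagram whose destination is the hypervisor context. */
static enum sk_result sk_route_to_hypervisor(struct sk_handle *src,
					     bool from_guest,
					     bool has_guest_device,
					     uint32_t local_cid,
					     enum sk_route *route)
{
	assert(src != NULL && route != NULL);
	*route = SK_ROUTE_NONE;

	if (from_guest)			/* already came up from a guest */
		return SK_ERR_DST_UNREACHABLE;
	if (!has_guest_device)		/* only a guest can reach it */
		return SK_ERR_DEVICE_NOT_FOUND;
	if (src->context == SK_HOST_CID)
		return SK_ERR_INVALID_ARGS;

	/* Invalid context with a valid resource means "ANY": fill it in. */
	if (src->context == SK_INVALID_ID && src->resource != SK_INVALID_ID)
		src->context = local_cid;

	*route = SK_ROUTE_AS_GUEST;
	return SK_SUCCESS;
}
```

Note that a fully anonymous source (both fields invalid) is passed through unchanged, matching the comment about respecting the ANON handle.
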
[PATCH 08/12] VMCI: resource object implementation.

2012-11-21 Thread George Zhang

VMCI resource tracks all used resources within the vmci code.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_resource.c |  232 +
 drivers/misc/vmw_vmci/vmci_resource.h |   59 
 2 files changed, 291 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_resource.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_resource.h

diff --git a/drivers/misc/vmw_vmci/vmci_resource.c b/drivers/misc/vmw_vmci/vmci_resource.c
new file mode 100644
index 000..0d3a2bc
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_resource.c
@@ -0,0 +1,232 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_resource.h"
+#include "vmci_driver.h"
+
+
+#define VMCI_RESOURCE_HASH_BITS 7
+#define VMCI_RESOURCE_HASH_BUCKETS  (1 << VMCI_RESOURCE_HASH_BITS)
+
+struct vmci_hash_table {
+   spinlock_t lock;
+   struct hlist_head entries[VMCI_RESOURCE_HASH_BUCKETS];
+};
+
+static struct vmci_hash_table vmci_resource_table = {
+   .lock = __SPIN_LOCK_UNLOCKED(vmci_resource_table.lock),
+};
+
+static unsigned int vmci_resource_hash(struct vmci_handle handle)
+{
+   return hash_32(VMCI_HANDLE_TO_RESOURCE_ID(handle),
+  VMCI_RESOURCE_HASH_BITS);
+}
+
+/*
+ * Gets a resource (if one exists) matching given handle from the hash table.
+ */
+static struct vmci_resource *vmci_resource_lookup(struct vmci_handle handle,
+ enum vmci_resource_type type)
+{
+   struct vmci_resource *r, *resource = NULL;
+   struct hlist_node *node;
+   unsigned int idx = vmci_resource_hash(handle);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(r, node,
+				 &vmci_resource_table.entries[idx], node) {
+		u32 rid = VMCI_HANDLE_TO_RESOURCE_ID(r->handle);
+		u32 cid = VMCI_HANDLE_TO_CONTEXT_ID(r->handle);
+
+		if (r->type == type &&
+		    rid == VMCI_HANDLE_TO_RESOURCE_ID(handle) &&
+		    (cid == VMCI_HANDLE_TO_CONTEXT_ID(handle) ||
+		     cid == VMCI_INVALID_ID)) {
+			resource = r;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+   return resource;
+}
+
+/*
+ * Find an unused resource ID and return it. The first
+ * VMCI_RESERVED_RESOURCE_ID_MAX are reserved so we start from
+ * its value + 1.
+ * Returns VMCI resource id on success, VMCI_INVALID_ID on failure.
+ */
+static u32 vmci_resource_find_id(u32 context_id,
+enum vmci_resource_type resource_type)
+{
+   static u32 resource_id = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+   u32 old_rid = resource_id;
+   u32 current_rid;
+
+	/*
+	 * Generate a unique resource ID.  Keep on trying until we wrap around
+	 * in the RID space.
+	 */
+	do {
+		struct vmci_handle handle;
+
+		current_rid = resource_id;
+		resource_id++;
+		if (unlikely(resource_id == VMCI_INVALID_ID)) {
+			/* Skip the reserved rids. */
+			resource_id = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+		}
+
+		handle = vmci_make_handle(context_id, current_rid);
+		if (!vmci_resource_lookup(handle, resource_type))
+			return current_rid;
+	} while (resource_id != old_rid);
+
+   return VMCI_INVALID_ID;
+}
+
+
+int vmci_resource_add(struct vmci_resource *resource,
+ enum vmci_resource_type resource_type,
+ struct vmci_handle handle)
+
+{
+   unsigned int idx;
+   int result;
+
+	spin_lock(&vmci_resource_table.lock);
+
+	if (handle.resource == VMCI_INVALID_ID) {
+		handle.resource = vmci_resource_find_id(handle.context,
+							resource_type);
+		if (handle.resource == VMCI_INVALID_ID) {
+			result = VMCI_ERROR_NO_HANDLE;
+			goto out;
+		}
+	} else if (vmci_resource_lookup(handle, resource_type)) {
+		result = VMCI_ERROR_ALREADY_EXISTS;
+		goto out;
+	}
+
+	resource->handle = handle;
+	resource->type = resource_type;
+

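vmci_resource_find_id() hands out resource IDs from a rolling counter, skips the reserved range when it wraps, and gives up after one full lap. Below is a sketch of that search with the counter and the "already in use" lookup passed in explicitly. SK_RESERVED_MAX is an assumed value, not VMCI_RESERVED_RESOURCE_ID_MAX, and in the driver the search runs with vmci_resource_table.lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_INVALID_ID	UINT32_MAX
#define SK_RESERVED_MAX	15u	/* assumed reserved range: 0..15 */

/* Hypothetical predicate: is this resource ID already taken? */
typedef bool (*sk_in_use_fn)(uint32_t rid, void *arg);

/*
 * Return the first free ID starting at *next, advancing *next past it.
 * Wraps to SK_RESERVED_MAX + 1 instead of reaching SK_INVALID_ID, and
 * returns SK_INVALID_ID after one full lap without a free ID.
 */
static uint32_t sk_find_id(uint32_t *next, sk_in_use_fn in_use, void *arg)
{
	uint32_t start;

	assert(next != NULL && in_use != NULL);
	start = *next;
	do {
		uint32_t candidate = *next;

		(*next)++;
		if (*next == SK_INVALID_ID)
			*next = SK_RESERVED_MAX + 1;	/* skip reserved IDs */

		if (!in_use(candidate, arg))
			return candidate;
	} while (*next != start);

	return SK_INVALID_ID;
}

/* Hypothetical predicate for the example: IDs >= *arg are taken. */
static bool sk_in_use_from(uint32_t rid, void *arg)
{
	return rid >= *(const uint32_t *)arg;
}
```

Starting the counter just below the top of the ID space with the top IDs taken shows the wrap: the search lands on SK_RESERVED_MAX + 1. Passing the counter explicitly, rather than using a function-local static as the driver does, keeps the sketch testable.
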
[PATCH 06/12] VMCI: handle array implementation.

2012-11-21 Thread George Zhang

VMCI handle code adds support for dynamic arrays that will grow if they need to.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_handle_array.c |  142 +
 drivers/misc/vmw_vmci/vmci_handle_array.h |   52 +++
 2 files changed, 194 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.h

diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.c b/drivers/misc/vmw_vmci/vmci_handle_array.c
new file mode 100644
index 000..9122373
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_handle_array.c
@@ -0,0 +1,142 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include "vmci_handle_array.h"
+
+static size_t handle_arr_calc_size(size_t capacity)
+{
+   return sizeof(struct vmci_handle_arr) +
+   capacity * sizeof(struct vmci_handle);
+}
+
+struct vmci_handle_arr *vmci_handle_arr_create(size_t capacity)
+{
+   struct vmci_handle_arr *array;
+
+	if (capacity == 0)
+		capacity = VMCI_HANDLE_ARRAY_DEFAULT_SIZE;
+
+	array = kmalloc(handle_arr_calc_size(capacity), GFP_ATOMIC);
+	if (!array)
+		return NULL;
+
+   array->capacity = capacity;
+   array->size = 0;
+
+   return array;
+}
+
+void vmci_handle_arr_destroy(struct vmci_handle_arr *array)
+{
+   kfree(array);
+}
+
+void vmci_handle_arr_append_entry(struct vmci_handle_arr **array_ptr,
+ struct vmci_handle handle)
+{
+   struct vmci_handle_arr *array = *array_ptr;
+
+	if (unlikely(array->size >= array->capacity)) {
+		/* reallocate. */
+		struct vmci_handle_arr *new_array;
+		size_t new_capacity = array->capacity * VMCI_ARR_CAP_MULT;
+		size_t new_size = handle_arr_calc_size(new_capacity);
+
+		new_array = krealloc(array, new_size, GFP_ATOMIC);
+		if (!new_array)
+			return;
+
+		new_array->capacity = new_capacity;
+		*array_ptr = array = new_array;
+	}
+
+   array->entries[array->size] = handle;
+   array->size++;
+}
+
+/*
+ * Handle that was removed, VMCI_INVALID_HANDLE if entry not found.
+ */
+struct vmci_handle vmci_handle_arr_remove_entry(struct vmci_handle_arr *array,
+   struct vmci_handle entry_handle)
+{
+   struct vmci_handle handle = VMCI_INVALID_HANDLE;
+   size_t i;
+
+   for (i = 0; i < array->size; i++) {
+   if (VMCI_HANDLE_EQUAL(array->entries[i], entry_handle)) {
+   handle = array->entries[i];
+   array->size--;
+   array->entries[i] = array->entries[array->size];
+   array->entries[array->size] = VMCI_INVALID_HANDLE;
+   break;
+   }
+   }
+
+   return handle;
+}
+
+/*
+ * Handle that was removed, VMCI_INVALID_HANDLE if array was empty.
+ */
+struct vmci_handle vmci_handle_arr_remove_tail(struct vmci_handle_arr *array)
+{
+   struct vmci_handle handle = VMCI_INVALID_HANDLE;
+
+   if (array->size) {
+   array->size--;
+   handle = array->entries[array->size];
+   array->entries[array->size] = VMCI_INVALID_HANDLE;
+   }
+
+   return handle;
+}
+
+/*
+ * Handle at given index, VMCI_INVALID_HANDLE if invalid index.
+ */
+struct vmci_handle
+vmci_handle_arr_get_entry(const struct vmci_handle_arr *array, size_t index)
+{
+   if (unlikely(index >= array->size))
+   return VMCI_INVALID_HANDLE;
+
+   return array->entries[index];
+}
+
+bool vmci_handle_arr_has_entry(const struct vmci_handle_arr *array,
+  struct vmci_handle entry_handle)
+{
+   size_t i;
+
+   for (i = 0; i < array->size; i++)
+   if (VMCI_HANDLE_EQUAL(array->entries[i], entry_handle))
+   return true;
+
+   return false;
+}
+
+/*
+ * NULL if the array is empty. Otherwise, a pointer to the array
+ * of VMCI handles in the handle array.
+ */
+struct vmci_handle *vmci_handle_arr_get_handles(struct vmci_handle_arr *array)
+{
+   if (array->size)
+   return array->entries;
+
+   return NULL;
+}
diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.h

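As a rough illustration of the handle array above, here is a user-space sketch under stated assumptions. Plain ints stand in for struct vmci_handle, and the harr_* names, the default capacity of 4 and the growth factor of 2 are hypothetical values, not the driver's VMCI_HANDLE_ARRAY_DEFAULT_SIZE or VMCI_ARR_CAP_MULT. It keeps the two properties visible in the patch: appending grows the allocation by a multiplier when it is full, and removal moves the last entry into the freed slot, so order is not preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmci_handle_arr: ints replace handles. */
struct harr {
	size_t capacity;
	size_t size;
	int entries[];		/* flexible array member, sized at allocation */
};

#define HARR_DEFAULT_CAP 4	/* assumption, not VMCI_HANDLE_ARRAY_DEFAULT_SIZE */
#define HARR_CAP_MULT    2	/* assumption, not VMCI_ARR_CAP_MULT */
#define HARR_INVALID     (-1)	/* plays the role of VMCI_INVALID_HANDLE */

static size_t harr_calc_size(size_t capacity)
{
	return sizeof(struct harr) + capacity * sizeof(int);
}

struct harr *harr_create(size_t capacity)
{
	struct harr *a;

	if (capacity == 0)
		capacity = HARR_DEFAULT_CAP;
	a = malloc(harr_calc_size(capacity));
	if (!a)
		return NULL;
	a->capacity = capacity;
	a->size = 0;
	return a;
}

/* Returns false, leaving *ap untouched, if growing the array fails. */
bool harr_append(struct harr **ap, int value)
{
	struct harr *a = *ap;

	assert(a->size <= a->capacity);
	if (a->size >= a->capacity) {
		size_t new_cap = a->capacity * HARR_CAP_MULT;
		struct harr *n = realloc(a, harr_calc_size(new_cap));

		if (!n)
			return false;
		n->capacity = new_cap;
		*ap = a = n;
	}
	a->entries[a->size++] = value;
	return true;
}

/* O(n) search, O(1) removal: the last entry fills the hole. */
int harr_remove(struct harr *a, int value)
{
	for (size_t i = 0; i < a->size; i++) {
		if (a->entries[i] == value) {
			a->size--;
			a->entries[i] = a->entries[a->size];
			a->entries[a->size] = HARR_INVALID;
			return value;
		}
	}
	return HARR_INVALID;
}
```

One difference is deliberate: vmci_handle_arr_append_entry() returns void and silently drops the entry if krealloc() fails, whereas this sketch reports the failure to the caller.
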
[PATCH 05/12] VMCI: event handling implementation.

2012-11-21 Thread George Zhang

VMCI event code that manages event handlers and handles callbacks when
specific events fire.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_event.c |  224 
 drivers/misc/vmw_vmci/vmci_event.h |   25 
 2 files changed, 249 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_event.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_event.h

diff --git a/drivers/misc/vmw_vmci/vmci_event.c b/drivers/misc/vmw_vmci/vmci_event.c
new file mode 100644
index 000..1fe40e5
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_event.c
@@ -0,0 +1,224 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define EVENT_MAGIC 0xEABE
+#define VMCI_EVENT_MAX_ATTEMPTS 10
+
+struct vmci_subscription {
+   u32 id;
+   u32 event;
+   vmci_event_cb callback;
+   void *callback_data;
+   struct list_head node;  /* on one of subscriber lists */
+};
+
+static struct list_head subscriber_array[VMCI_EVENT_MAX];
+static DEFINE_MUTEX(subscriber_mutex);
+
+int __init vmci_event_init(void)
+{
+   int i;
+
+   for (i = 0; i < VMCI_EVENT_MAX; i++)
+   INIT_LIST_HEAD(&subscriber_array[i]);
+
+   return VMCI_SUCCESS;
+}
+
+void vmci_event_exit(void)
+{
+   int e;
+
+   /* We free all memory at exit. */
+   for (e = 0; e < VMCI_EVENT_MAX; e++) {
+   struct vmci_subscription *cur, *p2;
+   list_for_each_entry_safe(cur, p2, &subscriber_array[e], node) {
+
+   /*
+    * We should never get here because all events
+    * should have been unregistered before we try
+    * to unload the driver module.
+    */
+   pr_warn("Unexpected free events occurring.\n");
+   list_del(&cur->node);
+   kfree(cur);
+   }
+   }
+}
+
+/*
+ * Find entry. Assumes subscriber_mutex is held.
+ */
+static struct vmci_subscription *event_find(u32 sub_id)
+{
+   int e;
+
+   for (e = 0; e < VMCI_EVENT_MAX; e++) {
+   struct vmci_subscription *cur;
+   list_for_each_entry(cur, &subscriber_array[e], node) {
+   if (cur->id == sub_id)
+   return cur;
+   }
+   }
+   return NULL;
+}
+
+/*
+ * Actually delivers the events to the subscribers.
+ * The callback function for each subscriber is invoked.
+ */
+static void event_deliver(struct vmci_event_msg *event_msg)
+{
+   struct vmci_subscription *cur;
+   struct list_head *subscriber_list;
+
+   rcu_read_lock();
+   subscriber_list = &subscriber_array[event_msg->event_data.event];
+   list_for_each_entry_rcu(cur, subscriber_list, node) {
+   cur->callback(cur->id, &event_msg->event_data,
+ cur->callback_data);
+   }
+   rcu_read_unlock();
+}
+
+/*
+ * Dispatcher for the VMCI_EVENT_RECEIVE datagrams. Calls all
+ * subscribers for given event.
+ */
+int vmci_event_dispatch(struct vmci_datagram *msg)
+{
+   struct vmci_event_msg *event_msg = (struct vmci_event_msg *)msg;
+
+   if (msg->payload_size < sizeof(u32) ||
+   msg->payload_size > sizeof(struct vmci_event_data_max))
+   return VMCI_ERROR_INVALID_ARGS;
+
+   if (!VMCI_EVENT_VALID(event_msg->event_data.event))
+   return VMCI_ERROR_EVENT_UNKNOWN;
+
+   event_deliver(event_msg);
+   return VMCI_SUCCESS;
+}
+
+/*
+ * vmci_event_subscribe() - Subscribe to a given event.
+ * @event:  The event to subscribe to.
+ * @callback:   The callback to invoke upon the event.
+ * @callback_data:  Data to pass to the callback.
+ * @new_subscription_id: ID used to track the subscription.  Used with
+ *  vmci_event_unsubscribe().
+ *
+ * Subscribes to the provided event. The callback specified will be
+ * fired from RCU critical section and therefore must not sleep.
+ */
+int vmci_event_subscribe(u32 event,
+vmci_event_cb callback,
+void *callback_data,
+u32 *new_subscription_id)
+{
+   struct vmci_subscription *sub;
+   int attempts;
+   int retval;
+   bool have_new_id = false;
+
+   if (!new_subscription_id) {
+

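The event code keeps one subscriber list per event type and, on dispatch, validates the payload size and event number before calling every subscriber of that event. The sketch below models that flow in user space. The ev_* names, EV_MAX and the error codes are hypothetical, and it leaves out the subscriber_mutex that serializes writers and the RCU read-side section that, in the driver, means callbacks must not sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EV_MAX              4	/* hypothetical number of event types */
#define EV_OK               0
#define EV_ERR_INVALID_ARGS (-1)
#define EV_ERR_UNKNOWN      (-2)

typedef void (*ev_cb)(uint32_t sub_id, uint32_t event, void *data);

struct ev_sub {
	uint32_t id;
	ev_cb callback;
	void *data;
	struct ev_sub *next;	/* singly linked per-event list */
};

static struct ev_sub *subscribers[EV_MAX];	/* one list head per event */

void ev_subscribe(struct ev_sub *sub, uint32_t event)
{
	assert(event < EV_MAX);
	sub->next = subscribers[event];
	subscribers[event] = sub;
}

/* Validate the message, then call every subscriber of that event. */
int ev_dispatch(uint32_t event, size_t payload_size, size_t max_payload)
{
	if (payload_size < sizeof(uint32_t) || payload_size > max_payload)
		return EV_ERR_INVALID_ARGS;
	if (event >= EV_MAX)
		return EV_ERR_UNKNOWN;
	for (struct ev_sub *s = subscribers[event]; s; s = s->next)
		s->callback(s->id, event, s->data);
	return EV_OK;
}

/* Example callback: counts how many deliveries it received. */
void count_cb(uint32_t sub_id, uint32_t event, void *data)
{
	(void)sub_id;
	(void)event;
	++*(int *)data;
}
```
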
[PATCH 04/12] VMCI: device driver implementation.

2012-11-21 Thread George Zhang

VMCI driver code implements both the host and guest personalities of the VMCI
driver.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_driver.c |  117 +++
 drivers/misc/vmw_vmci/vmci_driver.h |   50 +++
 2 files changed, 167 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_driver.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_driver.h

diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
new file mode 100644
index 000..c04c24c
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_driver.c
@@ -0,0 +1,117 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+static bool vmci_disable_host;
+module_param_named(disable_host, vmci_disable_host, bool, 0);
+MODULE_PARM_DESC(disable_host,
+"Disable driver host personality (default=enabled)");
+
+static bool vmci_disable_guest;
+module_param_named(disable_guest, vmci_disable_guest, bool, 0);
+MODULE_PARM_DESC(disable_guest,
+"Disable driver guest personality (default=enabled)");
+
+static bool vmci_guest_personality_initialized;
+static bool vmci_host_personality_initialized;
+
+/*
+ * vmci_get_context_id() - Gets the current context ID.
+ *
+ * Returns the VM's context ID when the guest personality is active,
+ * VMCI_HOST_CONTEXT_ID when only the host personality is active, and
+ * VMCI_INVALID_ID otherwise.
+ */
+u32 vmci_get_context_id(void)
+{
+   if (vmci_guest_code_active())
+   return vmci_get_vm_context_id();
+   else if (vmci_host_code_active())
+   return VMCI_HOST_CONTEXT_ID;
+
+   return VMCI_INVALID_ID;
+}
+EXPORT_SYMBOL_GPL(vmci_get_context_id);
+
+static int __init vmci_drv_init(void)
+{
+   int vmci_err;
+   int error;
+
+   vmci_err = vmci_event_init();
+   if (vmci_err < VMCI_SUCCESS) {
+   pr_err("Failed to initialize VMCIEvent (result=%d).\n",
+   vmci_err);
+   return -EINVAL;
+   }
+
+   if (!vmci_disable_guest) {
+   error = vmci_guest_init();
+   if (error) {
+   pr_warn("Failed to initialize guest personality (err=%d).\n",
+   error);
+   } else {
+   vmci_guest_personality_initialized = true;
+   pr_info("Guest personality initialized and is %s\n",
+   vmci_guest_code_active() ?
+   "active" : "inactive");
+   }
+   }
+
+   if (!vmci_disable_host) {
+   error = vmci_host_init();
+   if (error) {
+   pr_warn("Unable to initialize host personality (err=%d).\n",
+   error);
+   } else {
+   vmci_host_personality_initialized = true;
+   pr_info("Initialized host personality\n");
+   }
+   }
+
+   if (!vmci_guest_personality_initialized &&
+   !vmci_host_personality_initialized) {
+   vmci_event_exit();
+   return -ENODEV;
+   }
+
+   return 0;
+}
+module_init(vmci_drv_init);
+
+static void __exit vmci_drv_exit(void)
+{
+   if (vmci_guest_personality_initialized)
+   vmci_guest_exit();
+
+   if (vmci_host_personality_initialized)
+   vmci_host_exit();
+
+   vmci_event_exit();
+}
+module_exit(vmci_drv_exit);
+
+MODULE_AUTHOR("VMware, Inc.");
+MODULE_DESCRIPTION("VMware Virtual Machine Communication Interface.");
+MODULE_VERSION(VMCI_DRIVER_VERSION_STRING);
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/misc/vmw_vmci/vmci_driver.h b/drivers/misc/vmw_vmci/vmci_driver.h
new file mode 100644
index 000..f69156a
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_driver.h
@@ -0,0 +1,50 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of

[PATCH 03/12] VMCI: doorbell implementation.

2012-11-21 Thread George Zhang

VMCI doorbell code allows for notifications between host and guest.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_doorbell.c |  605 +
 drivers/misc/vmw_vmci/vmci_doorbell.h |   51 +++
 2 files changed, 656 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.h

diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c b/drivers/misc/vmw_vmci/vmci_doorbell.c
new file mode 100644
index 000..bced12b
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_doorbell.c
@@ -0,0 +1,605 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_resource.h"
+#include "vmci_driver.h"
+#include "vmci_route.h"
+
+
+#define VMCI_DOORBELL_INDEX_BITS   6
+#define VMCI_DOORBELL_INDEX_TABLE_SIZE (1 << VMCI_DOORBELL_INDEX_BITS)
+#define VMCI_DOORBELL_HASH(_idx)   hash_32(_idx, VMCI_DOORBELL_INDEX_BITS)
+
+/*
+ * struct dbell_entry describes a doorbell notification handle allocated by
+ * the host.
+ */
+struct dbell_entry {
+   struct vmci_resource resource;
+   struct hlist_node node;
+   struct work_struct work;
+   vmci_callback notify_cb;
+   void *client_data;
+   u32 idx;
+   u32 priv_flags;
+   bool run_delayed;
+   atomic_t active;/* Only used by guest personality */
+};
+
+/* The VMCI index table keeps track of currently registered doorbells. */
+struct dbell_index_table {
+   spinlock_t lock;/* Index table lock */
+   struct hlist_head entries[VMCI_DOORBELL_INDEX_TABLE_SIZE];
+};
+
+static struct dbell_index_table vmci_doorbell_it = {
+   .lock = __SPIN_LOCK_UNLOCKED(vmci_doorbell_it.lock),
+};
+
+/*
+ * The max_notify_idx is one larger than the currently known bitmap index in
+ * use, and is used to determine how much of the bitmap needs to be scanned.
+ */
+static u32 max_notify_idx;
+
+/*
+ * The notify_idx_count is used for determining whether there are free entries
+ * within the bitmap (if notify_idx_count + 1 < max_notify_idx).
+ */
+static u32 notify_idx_count;
+
+/*
+ * The last_notify_idx_reserved is used to track the last index handed out - in
+ * the case where multiple handles share a notification index, we hand out
+ * indexes round robin based on last_notify_idx_reserved.
+ */
+static u32 last_notify_idx_reserved;
+
+/* This is a one-entry cache used by the index allocation. */
+static u32 last_notify_idx_released = PAGE_SIZE;
+
+
+/*
+ * Utility function that retrieves the privilege flags associated
+ * with a given doorbell handle. For guest endpoints, the
+ * privileges are determined by the context ID, but for host
+ * endpoints privileges are associated with the complete
+ * handle. Hypervisor endpoints are not yet supported.
+ */
+int vmci_dbell_get_priv_flags(struct vmci_handle handle, u32 *priv_flags)
+{
+   if (priv_flags == NULL || handle.context == VMCI_INVALID_ID)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   if (handle.context == VMCI_HOST_CONTEXT_ID) {
+   struct dbell_entry *entry;
+   struct vmci_resource *resource;
+
+   resource = vmci_resource_by_handle(handle,
+  VMCI_RESOURCE_TYPE_DOORBELL);
+   if (!resource)
+   return VMCI_ERROR_NOT_FOUND;
+
+   entry = container_of(resource, struct dbell_entry, resource);
+   *priv_flags = entry->priv_flags;
+   vmci_resource_put(resource);
+   } else if (handle.context == VMCI_HYPERVISOR_CONTEXT_ID) {
+   /*
+    * Hypervisor endpoints for notifications are not
+    * supported (yet).
+    */
+   return VMCI_ERROR_INVALID_ARGS;
+   } else {
+   *priv_flags = vmci_context_get_priv_flags(handle.context);
+   }
+
+   return VMCI_SUCCESS;
+}
+
+/*
+ * Find doorbell entry by bitmap index.
+ */
+static struct dbell_entry *dbell_index_table_find(u32 idx)
+{
+   u32 bucket = VMCI_DOORBELL_HASH(idx);
+   struct dbell_entry *dbell;
+   struct hlist_node *node;
+
+   hlist_for_each_entry(dbell, node, &vmci_doorbell_it.entries[bucket],
+

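The doorbell index table hashes a notification index into one of 1 << VMCI_DOORBELL_INDEX_BITS (64) buckets with hash_32() and chains entries per bucket. The sketch below models that lookup in user space. It is modeled on the multiplicative hash_32() from kernels of this period (multiply by 0x9e370001 and keep the top bits); newer kernels use a different constant, so treat exact bucket numbers as an assumption. The table's spinlock is omitted, and the entry/table_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_BITS 6
#define TABLE_SIZE (1u << INDEX_BITS)		/* 64 buckets */
#define GOLDEN_RATIO_PRIME_32 0x9e370001u	/* 2012-era hash_32() constant */

/* Multiplicative hash: keep the top 'bits' bits of val * constant. */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * GOLDEN_RATIO_PRIME_32;	/* wraps mod 2^32 */

	assert(bits >= 1 && bits <= 32);
	return hash >> (32 - bits);
}

struct entry {
	uint32_t idx;
	struct entry *next;	/* bucket chain, like the hlist in the table */
};

static struct entry *buckets[TABLE_SIZE];

void table_add(struct entry *e)
{
	uint32_t b = hash32(e->idx, INDEX_BITS);

	e->next = buckets[b];
	buckets[b] = e;
}

/* Walk only the one bucket the index hashes to. */
struct entry *table_find(uint32_t idx)
{
	for (struct entry *e = buckets[hash32(idx, INDEX_BITS)]; e; e = e->next)
		if (e->idx == idx)
			return e;
	return NULL;
}
```
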
[PATCH 02/12] VMCI: datagram implementation.

2012-11-21 Thread George Zhang

VMCI datagram code implements datagrams that allow data to be sent between
host and guest.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_datagram.c |  501 +
 drivers/misc/vmw_vmci/vmci_datagram.h |   52 +++
 2 files changed, 553 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.h

diff --git a/drivers/misc/vmw_vmci/vmci_datagram.c b/drivers/misc/vmw_vmci/vmci_datagram.c
new file mode 100644
index 000..a6513f4
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_datagram.c
@@ -0,0 +1,501 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+#include "vmci_route.h"
+
+/*
+ * struct datagram_entry describes the datagram entity. It is used for datagram
+ * entities created only on the host.
+ */
+struct datagram_entry {
+   struct vmci_resource resource;
+   u32 flags;
+   bool run_delayed;
+   vmci_datagram_recv_cb recv_cb;
+   void *client_data;
+   u32 priv_flags;
+};
+
+struct delayed_datagram_info {
+   struct datagram_entry *entry;
+   struct vmci_datagram msg;
+   struct work_struct work;
+   bool in_dg_host_queue;
+};
+
+/* Number of in-flight host->host datagrams */
+static atomic_t delayed_dg_host_queue_size = ATOMIC_INIT(0);
+
+/*
+ * Create a datagram entry given a handle pointer.
+ */
+static int dg_create_handle(u32 resource_id,
+   u32 flags,
+   u32 priv_flags,
+   vmci_datagram_recv_cb recv_cb,
+   void *client_data, struct vmci_handle *out_handle)
+{
+   int result;
+   u32 context_id;
+   struct vmci_handle handle;
+   struct datagram_entry *entry;
+
+   if ((flags & VMCI_FLAG_WELLKNOWN_DG_HND) != 0)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   if ((flags & VMCI_FLAG_ANYCID_DG_HND) != 0) {
+   context_id = VMCI_INVALID_ID;
+   } else {
+   context_id = vmci_get_context_id();
+   if (context_id == VMCI_INVALID_ID)
+   return VMCI_ERROR_NO_RESOURCES;
+   }
+
+   handle = vmci_make_handle(context_id, resource_id);
+
+   entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry) {
+   pr_warn("Failed allocating memory for datagram entry.\n");
+   return VMCI_ERROR_NO_MEM;
+   }
+
+   entry->run_delayed = (flags & VMCI_FLAG_DG_DELAYED_CB) ? true : false;
+   entry->flags = flags;
+   entry->recv_cb = recv_cb;
+   entry->client_data = client_data;
+   entry->priv_flags = priv_flags;
+
+   /* Make datagram resource live. */
+   result = vmci_resource_add(&entry->resource,
+  VMCI_RESOURCE_TYPE_DATAGRAM,
+  handle);
+   if (result != VMCI_SUCCESS) {
+   pr_warn("Failed to add new resource (handle=0x%x:0x%x), error: %d\n",
+   handle.context, handle.resource, result);
+   kfree(entry);
+   return result;
+   }
+
+   *out_handle = vmci_resource_handle(&entry->resource);
+   return VMCI_SUCCESS;
+}
+
+/*
+ * Internal utility function with the same purpose as
+ * vmci_datagram_get_priv_flags that also takes a context_id.
+ */
+static int vmci_datagram_get_priv_flags(u32 context_id,
+   struct vmci_handle handle,
+   u32 *priv_flags)
+{
+   if (context_id == VMCI_INVALID_ID)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   if (context_id == VMCI_HOST_CONTEXT_ID) {
+   struct datagram_entry *src_entry;
+   struct vmci_resource *resource;
+
+   resource = vmci_resource_by_handle(handle,
+  VMCI_RESOURCE_TYPE_DATAGRAM);
+   if (!resource)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   src_entry = container_of(resource, struct datagram_entry,
+resource);
+   *priv_flags = src_entry->priv_flags;
+

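dg_create_handle() builds the new handle from a context ID and the caller's resource ID: well-known handles are rejected, VMCI_FLAG_ANYCID_DG_HND yields VMCI_INVALID_ID as the context, and otherwise the current context ID is used. The following sketch isolates that decision. The flag values, the INVALID_ID constant, the error codes and make_dg_handle() itself are hypothetical placeholders, not definitions from the VMCI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID          0xFFFFFFFFu	/* assumption */
#define FLAG_WELLKNOWN      0x1u	/* hypothetical flag values */
#define FLAG_ANYCID         0x2u
#define DG_OK               0
#define DG_ERR_INVALID_ARGS (-1)
#define DG_ERR_NO_RESOURCES (-2)

struct handle {
	uint32_t context;
	uint32_t resource;
};

/*
 * Picks the context half of a new datagram handle the way
 * dg_create_handle() does: well-known handles are rejected here,
 * "any CID" handles get INVALID_ID, everything else gets the
 * caller's current context ID.
 */
int make_dg_handle(uint32_t resource_id, uint32_t flags,
		   uint32_t current_cid, struct handle *out)
{
	uint32_t cid;

	assert(out != NULL);
	if (flags & FLAG_WELLKNOWN)
		return DG_ERR_INVALID_ARGS;
	if (flags & FLAG_ANYCID) {
		cid = INVALID_ID;
	} else {
		if (current_cid == INVALID_ID)
			return DG_ERR_NO_RESOURCES;
		cid = current_cid;
	}
	out->context = cid;
	out->resource = resource_id;
	return DG_OK;
}
```
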
[PATCH 01/12] VMCI: context implementation.

2012-11-21 Thread George Zhang

VMCI context code maintains VMCI state and allows the driver to communicate
with multiple VMs.

Signed-off-by: George Zhang 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Andy King 

---
 drivers/misc/vmw_vmci/vmci_context.c | 1223 ++
 drivers/misc/vmw_vmci/vmci_context.h |  183 +
 2 files changed, 1406 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_context.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_context.h

diff --git a/drivers/misc/vmw_vmci/vmci_context.c b/drivers/misc/vmw_vmci/vmci_context.c
new file mode 100644
index 000..6f5abb5
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_context.c
@@ -0,0 +1,1223 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+/*
+ * List of current VMCI contexts.  Contexts can be added by
+ * vmci_ctx_create() and removed via vmci_ctx_destroy().
+ * These, along with context lookup, are protected by the
+ * list structure's lock.
+ */
+static struct {
+   struct list_head head;
+   spinlock_t lock; /* Spinlock for context list operations */
+} ctx_list = {
+   .head = LIST_HEAD_INIT(ctx_list.head),
+   .lock = __SPIN_LOCK_UNLOCKED(ctx_list.lock),
+};
+
+/* Used by contexts that did not set up notify flag pointers */
+static bool ctx_dummy_notify;
+
+static void ctx_signal_notify(struct vmci_ctx *context)
+{
+   *context->notify = true;
+}
+
+static void ctx_clear_notify(struct vmci_ctx *context)
+{
+   *context->notify = false;
+}
+
+/*
+ * If nothing requires the attention of the guest, clears both
+ * notify flag and call.
+ */
+static void ctx_clear_notify_call(struct vmci_ctx *context)
+{
+   if (context->pending_datagrams == 0 &&
+   vmci_handle_arr_get_size(context->pending_doorbell_array) == 0)
+   ctx_clear_notify(context);
+}
+
+/*
+ * Sets the context's notify flag iff datagrams are pending for this
+ * context.  Called from vmci_setup_notify().
+ */
+void vmci_ctx_check_signal_notify(struct vmci_ctx *context)
+{
+   spin_lock(&context->lock);
+   if (context->pending_datagrams)
+   ctx_signal_notify(context);
+   spin_unlock(&context->lock);
+}
+
+/*
+ * Allocates and initializes a VMCI context.
+ */
+struct vmci_ctx *vmci_ctx_create(u32 cid, u32 priv_flags,
+uintptr_t event_hnd,
+int user_version,
+const struct cred *cred)
+{
+   struct vmci_ctx *context;
+   int error;
+
+   if (cid == VMCI_INVALID_ID) {
+   pr_devel("Invalid context ID for VMCI context.\n");
+   error = -EINVAL;
+   goto err_out;
+   }
+
+   if (priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS) {
+   pr_devel("Invalid flag (flags=0x%x) for VMCI context.\n",
+priv_flags);
+   error = -EINVAL;
+   goto err_out;
+   }
+
+   if (user_version == 0) {
+   pr_devel("Invalid user_version %d\n", user_version);
+   error = -EINVAL;
+   goto err_out;
+   }
+
+   context = kzalloc(sizeof(*context), GFP_KERNEL);
+   if (!context) {
+   pr_warn("Failed to allocate memory for VMCI context.\n");
+   error = -ENOMEM;
+   goto err_out;
+   }
+
+   kref_init(&context->kref);
+   spin_lock_init(&context->lock);
+   INIT_LIST_HEAD(&context->list_item);
+   INIT_LIST_HEAD(&context->datagram_queue);
+   INIT_LIST_HEAD(&context->notifier_list);
+
+   /* Initialize host-specific VMCI context. */
+   init_waitqueue_head(&context->host_context.wait_queue);
+
+   context->queue_pair_array = vmci_handle_arr_create(0);
+   if (!context->queue_pair_array) {
+   error = -ENOMEM;
+   goto err_free_ctx;
+   }
+
+   context->doorbell_array = vmci_handle_arr_create(0);
+   if (!context->doorbell_array) {
+   error = -ENOMEM;
+   goto err_free_qp_array;
+   }
+
+   context->pending_doorbell_array = vmci_handle_arr_create(0);
+   if (!context->pending_doorbell_array) {
+   error = -ENOMEM;
+   goto err_free_db_array;
+   }
+
+

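vmci_ctx_create() allocates several handle arrays in sequence and unwinds through err_free_* labels when one fails, so each label frees only what was already allocated, in reverse order. The pattern can be sketched in user space as follows. struct ctx, ctx_create() and the fail_at failure-injection parameter are hypothetical and exist only to exercise each error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmci_ctx with three owned arrays. */
struct ctx {
	int *queue_pairs;
	int *doorbells;
	int *pending_doorbells;
};

/*
 * fail_at (1..3) simulates the Nth array allocation failing; 0 means no
 * failure. Each label frees exactly what was allocated before the
 * failure, in reverse order.
 */
struct ctx *ctx_create(int fail_at, int *error)
{
	struct ctx *c;

	assert(error != NULL);
	c = calloc(1, sizeof(*c));
	if (!c) {
		*error = -ENOMEM;
		return NULL;
	}
	c->queue_pairs = fail_at == 1 ? NULL : calloc(4, sizeof(int));
	if (!c->queue_pairs)
		goto err_free_ctx;
	c->doorbells = fail_at == 2 ? NULL : calloc(4, sizeof(int));
	if (!c->doorbells)
		goto err_free_qp;
	c->pending_doorbells = fail_at == 3 ? NULL : calloc(4, sizeof(int));
	if (!c->pending_doorbells)
		goto err_free_db;
	*error = 0;
	return c;

err_free_db:
	free(c->doorbells);
err_free_qp:
	free(c->queue_pairs);
err_free_ctx:
	free(c);
	*error = -ENOMEM;
	return NULL;
}

void ctx_destroy(struct ctx *c)
{
	free(c->pending_doorbells);
	free(c->doorbells);
	free(c->queue_pairs);
	free(c);
}
```
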
[PATCH 00/12] VMCI for Linux upstreaming

2012-11-21 Thread George Zhang


* * *
This series of VMCI Linux upstreaming patches includes the latest updates
from VMware.

Summary of changes:

- Sparse clean.
- Checkpatch clean with one exception, a "complex macro" in
  which we can't add parentheses.
- Remove all runtime assertions.
- Fix device name, so that existing user clients work.
- Fix VMCI handle lookup.


* * *

In an effort to improve the out-of-the-box experience with Linux
kernels for VMware users, VMware is working on readying the Virtual
Machine Communication Interface (vmw_vmci) and VMCI Sockets
(vmw_vsock) kernel modules for inclusion in the Linux kernel. The
purpose of this post is to acquire feedback on the vmw_vmci kernel
module. The vmw_vsock kernel module will be presented in a later post.


* * *

VMCI allows virtual machines to communicate with host kernel modules
and the VMware hypervisors. User level applications both in a virtual
machine and on the host can use vmw_vmci through VMCI Sockets, a socket
address family designed to be compatible with UDP and TCP at the
interface level. Today, VMCI and VMCI Sockets are used by the VMware
shared folders (HGFS) and various VMware Tools components inside the
guest for zero-config, network-less access to VMware host services. In
addition to this, VMware's users are using VMCI Sockets for various
applications, where network access of the virtual machine is
restricted or non-existent. Examples of this are VMs communicating
with device proxies for proprietary hardware running as host
applications and automated testing of applications running within
virtual machines.

In a virtual machine, VMCI is exposed as a regular PCI device. The
primary communication mechanisms supported are a point-to-point
bidirectional transport based on a pair of memory-mapped queues, and
asynchronous notifications in the form of datagrams and
doorbells. These features are available to kernel level components
such as HGFS and VMCI Sockets through the VMCI kernel API. In addition
to this, the VMCI kernel API provides support for receiving events
related to the state of the VMCI communication channels, and the
virtual machine itself.
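
The queue pairs described above are a bidirectional transport built from a pair of memory-mapped queues. The sketch below shows the general idea with a generic single-producer, single-consumer ring buffer. It is not VMCI's queue header layout or API; QSIZE and all names are hypothetical, and a real shared-memory implementation would also need memory barriers between the data copy and the index update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QSIZE 16	/* hypothetical; one slot stays empty to tell full from empty */

/* One direction of traffic: producer advances tail, consumer advances head. */
struct queue {
	size_t head;
	size_t tail;
	uint8_t buf[QSIZE];
};

/* A "pair" gives each endpoint one queue to produce into and one to consume. */
struct queue_pair {
	struct queue produce;
	struct queue consume;
};

static size_t q_free(const struct queue *q)
{
	return (q->head + QSIZE - q->tail - 1) % QSIZE;
}

static size_t q_ready(const struct queue *q)
{
	return (q->tail + QSIZE - q->head) % QSIZE;
}

/* Copies up to len bytes in; returns how many were enqueued. */
size_t q_enqueue(struct queue *q, const void *data, size_t len)
{
	const uint8_t *p = data;
	size_t n = len < q_free(q) ? len : q_free(q);

	assert(q_ready(q) + q_free(q) == QSIZE - 1);
	for (size_t i = 0; i < n; i++) {
		q->buf[q->tail] = p[i];
		q->tail = (q->tail + 1) % QSIZE;
	}
	return n;
}

/* Copies up to len bytes out; returns how many were dequeued. */
size_t q_dequeue(struct queue *q, void *out, size_t len)
{
	uint8_t *p = out;
	size_t n = len < q_ready(q) ? len : q_ready(q);

	for (size_t i = 0; i < n; i++) {
		p[i] = q->buf[q->head];
		q->head = (q->head + 1) % QSIZE;
	}
	return n;
}
```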

Outside the virtual machine, the host side support of the VMCI kernel
module makes the same VMCI kernel API available to VMCI endpoints on
the host. In addition to this, the host side manages each VMCI device
in a virtual machine through a context object. This context object
serves to identify the virtual machine for communication, and to track
the resource consumption of the given VMCI device. Both operations
related to communication between the virtual machine and the host
kernel, and those related to the management of the VMCI device state
in the host kernel, are invoked by the user level component of the
hypervisor through a set of ioctls on the VMCI device node.  To
provide seamless support for nested virtualization, where a virtual
machine may use both a VMCI PCI device to talk to its hypervisor, and
the VMCI host side support to run nested virtual machines, the VMCI
host and virtual machine support are combined in a single kernel
module.

For additional information about the use of VMCI and in particular
VMCI Sockets, please refer to the VMCI Socket Programming Guide
available at https://www.vmware.com/support/developer/vmci-sdk/.



---

George Zhang (12):
  VMCI: context implementation.
  VMCI: datagram implementation.
  VMCI: doorbell implementation.
  VMCI: device driver implementation.
  VMCI: event handling implementation.
  VMCI: handle array implementation.
  VMCI: queue pairs implementation.
  VMCI: resource object implementation.
  VMCI: routing implementation.
  VMCI: guest side driver implementation.
  VMCI: host side driver implementation.
  VMCI: Some header and config files.


 drivers/misc/Kconfig  |1 
 drivers/misc/Makefile |2 
 drivers/misc/vmw_vmci/Kconfig |   16 
 drivers/misc/vmw_vmci/Makefile|4 
 drivers/misc/vmw_vmci/vmci_common_int.h   |   32 
 drivers/misc/vmw_vmci/vmci_context.c  | 1223 ++
 drivers/misc/vmw_vmci/vmci_context.h  |  183 ++
 drivers/misc/vmw_vmci/vmci_datagram.c |  501 
 drivers/misc/vmw_vmci/vmci_datagram.h |   52 
 drivers/misc/vmw_vmci/vmci_doorbell.c |  605 +
 drivers/misc/vmw_vmci/vmci_doorbell.h |   51 
 drivers/misc/vmw_vmci/vmci_driver.c   |  117 +
 drivers/misc/vmw_vmci/vmci_driver.h   |   50 
 drivers/misc/vmw_vmci/vmci_event.c|  224 ++
 drivers/misc/vmw_vmci/vmci_event.h|   25 
 drivers/misc/vmw_vmci/vmci_guest.c|  757 ++
 drivers/misc/vmw_vmci/vmci_handle_array.c |  142 +
 drivers/misc/vmw_vmci/vmci_handle_array.h |   52 
 drivers/misc/vmw_vmci/vmci_host.c | 1036 +
 drivers/misc/vmw_vmci/vmci_queue_pair.c   | 3439 +
 drivers/misc/vmw_vmci/vmci_queue_pair.h   |  191 ++

Re: [PATCH] Revert "serial: omap: fix software flow control"

2012-11-21 Thread Greg KH

On Wed, Nov 07, 2012 at 10:56:59AM +0100, Andreas Bießmann wrote:
> On 16.10.2012 16:09, Felipe Balbi wrote:
> > This reverts commit 957ee7270d632245b43f6feb0e70d9a5e9ea6cf6
> > (serial: omap: fix software flow control).
> > 
> > As Russell has pointed out, that commit isn't fixing
> > Software Flow Control at all, and it actually makes
> > it even more broken.
> > 
> > It was agreed to revert this commit and use Russell's
> > latest UART patches instead.
> > 
> > Cc: Russell King 
> > Signed-off-by: Felipe Balbi 
> 
> since 957ee7270d632245b43f6feb0e70d9a5e9ea6cf6 made it into stable (at
> least 3.4) I think it would be good decision to also apply this revert
> to stable until a working solution exists.

Now queued up for the stable releases, thanks.

greg k-h

[tip:core/locking] futex: Avoid wake_futex for a PI futex_q

2012-11-21 Thread tip-bot for Darren Hart

Commit-ID:  0e8f7a5954be13d0c8dcbca3204a9e962498c46e
Gitweb: http://git.kernel.org/tip/0e8f7a5954be13d0c8dcbca3204a9e962498c46e
Author: Darren Hart 
AuthorDate: Tue, 20 Nov 2012 23:36:45 -0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 21 Nov 2012 21:05:34 +0100

futex: Avoid wake_futex for a PI futex_q

Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.

While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.

Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.

This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUs.

Reported-by: Dave Jones 
Signed-off-by: Darren Hart 
Cc: Peter Zijlstra 
Cc: John Kacur 
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/3b25c8ba053760892871713ff6e81660433f6734.1353483196.git.dvh...@linux.intel.com
Signed-off-by: Thomas Gleixner 
---
 kernel/futex.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 3717e7b..5699b21 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -840,6 +840,11 @@ static void wake_futex(struct futex_q *q)
 {
struct task_struct *p = q->task;
 
+   if (q->pi_state || q->rt_waiter) {
+   WARN(1, "%s: refusing to wake PI futex\n", __FUNCTION__);
+   return;
+   }
+
/*
 * We set q->lock_ptr = NULL _before_ we wake up the task. If
 * a non-futex wake up happens on another CPU then the task
@@ -1075,6 +1080,10 @@ retry_private:
 
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (>key, )) {
+   if (this->pi_state || this->rt_waiter) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
wake_futex(this);
if (++ret >= nr_wake)
break;
@@ -1087,6 +1096,10 @@ retry_private:
op_ret = 0;
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (>key, )) {
+   if (this->pi_state || this->rt_waiter) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
wake_futex(this);
if (++op_ret >= nr_wake2)
break;
@@ -1095,6 +1108,7 @@ retry_private:
ret += op_ret;
}
 
+out_unlock:
double_unlock_hb(hb1, hb2);
 out_put_keys:
put_futex_key();
@@ -1384,9 +1398,13 @@ retry_private:
/*
 * FUTEX_WAIT_REQEUE_PI and FUTEX_CMP_REQUEUE_PI should always
 * be paired with each other and no other futex ops.
+*
+* We should never be requeueing a futex_q with a pi_state,
+* which is awaiting a futex_unlock_pi().
 */
if ((requeue_pi && !this->rt_waiter) ||
-   (!requeue_pi && this->rt_waiter)) {
+   (!requeue_pi && this->rt_waiter) ||
+   this->pi_state) {
ret = -EINVAL;
break;
}

Re: [PATCH] mm: vmscan: Check for fatal signals iff the process was throttled

2012-11-21 Thread Andrew Morton

On Wed, 21 Nov 2012 15:38:24 +
Mel Gorman  wrote:

> commit 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves
> are low and swap is backed by network storage") introduced a check for
> fatal signals after a process gets throttled for network storage. The
> intention was that if a process was throttled and got killed that it
> should not trigger the OOM killer. As pointed out by Minchan Kim and
> David Rientjes, this check is in the wrong place and too broad. If a
> system is in an OOM situation and a process is exiting, it can loop in
> __alloc_pages_slowpath(), calling direct reclaim in a loop. As the
> fatal signal is pending it returns 1 as if it is making forward progress
> and can effectively deadlock.
> 
> This patch moves the fatal_signal_pending() check after throttling to
> throttle_direct_reclaim() where it belongs. If the process is killed
> while throttled, it will return immediately without direct reclaim
> except now it will have TIF_MEMDIE set and will use the PFMEMALLOC
> reserves.
> 
> Minchan pointed out that it may be better to direct reclaim before returning
> to avoid using the reserves because there may be pages that can easily
> be reclaimed, which would avoid using the reserves. However, we do no such targeted
> reclaim and there is no guarantee that suitable pages are available. As it
> is expected that this throttling happens when swap-over-NFS is used there
> is a possibility that the process will instead swap which may allocate
> network buffers from the PFMEMALLOC reserves. Hence, in the swap-over-nfs
> case where a process can be throttled and be killed it can use the reserves
> to exit or it can potentially use reserves to swap a few pages and then
> exit. This patch takes the option of using the reserves if necessary to
> allow the process to exit quickly.
> 
> If this patch passes review it should be considered a -stable candidate
> for 3.6.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2207,9 +2207,12 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
>   * Throttle direct reclaimers if backing storage is backed by the network
>   * and the PFMEMALLOC reserve for the preferred node is getting dangerously
>   * depleted. kswapd will continue to make progress and wake the processes
> - * when the low watermark is reached
> + * when the low watermark is reached.
> + *
> + * Returns true if a fatal signal was delivered during throttling. If this

s/delivered/received/imo

> + * happens, the page allocator should not consider triggering the OOM killer.
>   */
> -static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> +static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>   nodemask_t *nodemask)
>  {
>   struct zone *zone;
> @@ -2224,13 +2227,20 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>* processes to block on log_wait_commit().
>*/
>   if (current->flags & PF_KTHREAD)
> - return;
> + goto out;

hm, well, back in the old days some kernel threads were killable via
signals.  They had to opt-in to it by diddling their signal masks and a
few other things.  Too lazy to check if there are still any such sites.


> + /*
> +  * If a fatal signal is pending, this process should not throttle.
> +  * It should return quickly so it can exit and free its memory
> +  */
> + if (fatal_signal_pending(current))
> + goto out;

There's a bug.  It should return "true" here.

>  
>   /* Check if the pfmemalloc reserves are ok */
>   first_zones_zonelist(zonelist, high_zoneidx, NULL, );
>   pgdat = zone->zone_pgdat;
>   if (pfmemalloc_watermark_ok(pgdat))
> - return;
> + goto out;
>  
>   /* Account for the throttling */
>   count_vm_event(PGSCAN_DIRECT_THROTTLE);
> @@ -2246,12 +2256,20 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>   if (!(gfp_mask & __GFP_FS)) {
>   wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
>   pfmemalloc_watermark_ok(pgdat), HZ);
> - return;
> +
> + goto check_pending;

And this can be just an "else".
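Andrew's two review comments can be combined into a control-flow sketch. Everything here is hypothetical. `struct throttle_env` and its flags stand in for `PF_KTHREAD`, `fatal_signal_pending()`, `pfmemalloc_watermark_ok()` and `__GFP_FS`, and the two counters stand in for the two wait calls. The function returns true when a fatal signal is pending on entry or after waiting, and it uses a plain if/else for the two waits instead of a goto.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the kernel state the function reads. */
struct throttle_env {
	bool kthread;           /* current->flags & PF_KTHREAD */
	bool fatal_at_entry;    /* fatal_signal_pending() on entry */
	bool watermark_ok;      /* pfmemalloc_watermark_ok(pgdat) */
	bool gfp_fs;            /* gfp_mask & __GFP_FS */
	bool fatal_after_wait;  /* fatal_signal_pending() after sleeping */
	int timed_waits;        /* wait_event_interruptible_timeout() calls */
	int killable_waits;     /* wait_event_killable() calls */
};

/* Returns true if a fatal signal was received. */
static bool throttle_sketch(struct throttle_env *e)
{
	if (e->kthread)
		return false;

	/* Review point 1: a fatal signal pending on entry must report true. */
	if (e->fatal_at_entry)
		return true;

	if (e->watermark_ok)
		return false;

	/* Review point 2: the two wait flavours as a plain if/else. */
	if (!e->gfp_fs)
		e->timed_waits++;
	else
		e->killable_waits++;

	return e->fatal_after_wait;
}
```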

>   }
>  
>   /* Throttle until kswapd wakes the process */
>   wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
>   pfmemalloc_watermark_ok(pgdat));
> +
> +check_pending:
> + if (fatal_signal_pending(current))
> + return true;
> +
> +out:
> + return false;
>  }
>  
>  unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> @@ -2273,13 +2291,12 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>   .gfp_mask = sc.gfp_mask,
>   };
>  
> - throttle_direct_reclaim(gfp_mask, zonelist, nodemask);
> -
>   /*
> -  * Do not enter reclaim if fatal signal is pending. 1

Re: [PATCH v3 01/12] x86, boot: move verify_cpu.S after 0x200

2012-11-21 Thread Yinghai Lu

On Wed, Nov 21, 2012 at 11:50 AM, H. Peter Anvin  wrote:
> The comment is just plain wrong.  It assumes you're loading an ELF file,
> whereas in practice that is rarely true.
>
> This does explain the poor ABI, though.  A jump table at the
> beginning would have been a lot cleaner.

Can you please send a patch to update the comments and point to the API there?

Thanks

Yinghai

Re: [PATCH v4] TPM: Issue TPM_STARTUP at driver load if the TPM has not been started

2012-11-21 Thread Jason Gunthorpe

On Wed, Nov 21, 2012 at 09:17:54PM +0100, Peter Hüwe wrote:

> Care to change to 
> > +   "A TPM error (%zd) occurred attempting to determine the timeouts\n",
> 
> Sorry that I didn't spot it earlier.

Right. It's probably like this in my tree because of:

http://permalink.gmane.org/gmane.linux.kernel/1397887

I should really get sparse setup here... Never enough hours.

Jason

Re: [PATCH v4] TPM: Issue TPM_STARTUP at driver load if the TPM has not been started

2012-11-21 Thread Peter Hüwe

Hi Jason,

Thanks for the updated patch! 
Sorry, I have one really minor remark left:

> + if (rc) {
> + dev_err(chip->dev,
> + "A TPM error (%d) occurred attempting to determine the timeouts\n",

rc is a ssize_t here and when compiling with C=1 I get
drivers/char/tpm/tpm.c:582:4: warning: format '%d' expects argument of type 'int', but argument 3 has type 'ssize_t' [-Wformat]

Care to change to 
> + "A TPM error (%zd) occurred attempting to determine the timeouts\n",

Sorry that I didn't spot it earlier.
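For reference, here is a small userspace sketch of the requested format fix. It assumes glibc's `ssize_t` from `<sys/types.h>`. `format_tpm_error()` is a hypothetical helper. The kernel's printk()/dev_err() accept the same `%zd` conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper. %zd is the C99 conversion for the signed type
 * corresponding to size_t, which is how ssize_t is defined on Linux;
 * using %d instead triggers -Wformat on 64-bit builds.
 */
static int format_tpm_error(char *buf, size_t len, ssize_t rc)
{
	return snprintf(buf, len,
			"A TPM error (%zd) occurred attempting to determine the timeouts\n",
			rc);
}
```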


Thanks,
Peter


Re: [GIT PULL] arm-soc: Xilinx zynq multiplatform changes for v3.8

2012-11-21 Thread Olof Johansson

Hi,

On Wed, Nov 21, 2012 at 04:51:07PM +0100, Michal Simek wrote:
> Hi Olof and Arnd,
> 
> based on my chat with Olof today I have created new branch
> with 4 patches which move zynq to multiplatform.
> 
> This branch depends on arm-soc devel/debug_ll_init branch because
> we needed Rob's "ARM: implement debug_ll_io_init()"
> (sha1: afaee03511ba8002b26a9c6b1fe7d6baf33eac86)
> patch.
> 
> This branch also depends on zynq/dt branch because of previous major
> zynq changes.
> zynq/cleanup branch is subset of zynq/dt.
> 
> That's why I have merged devel/debug_ll_init branch with zynq/dt and
> added 4 patches on top of it.

Nice work.

This looks quite reasonable, is small and self-contained and doesn't really
affect anyone outside of zynq. So I've pulled it into next/multiplatform for 3.8.

As part of this, the next/* branches have been somewhat reordered, since
multiplatform now includes next/dt contents I've moved it down below there. It
shouldn't affect much since no other branch pulls in next/multiplatform
contents.

-Olof


Re: [PATCH RFC 09/12] userns: Convert ocfs2 to use kuid and kgid where appropriate

2012-11-21 Thread Joel Becker

On Tue, Nov 20, 2012 at 04:43:37AM -0800, Eric W. Biederman wrote:
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index 260b162..8a40457 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -65,7 +65,20 @@ static struct posix_acl *ocfs2_acl_from_xattr(const void *value, size_t size)
>  
>   acl->a_entries[n].e_tag  = le16_to_cpu(entry->e_tag);
>   acl->a_entries[n].e_perm = le16_to_cpu(entry->e_perm);
> - acl->a_entries[n].e_id   = le32_to_cpu(entry->e_id);
> + switch(acl->a_entries[n].e_tag) {
> + case ACL_USER:
> + acl->a_entries[n].e_uid =
> + make_kuid(_user_ns,
> +   le32_to_cpu(entry->e_id));
> + break;

Stupid question: do you consider disjoint namespaces on multiple
machines to be a problem?  Remember that ocfs2 is a cluster filesystem.
If I have uid 100 on machine A in the default namespace, and then I
mount the filesystem on machine B with uid 100 in a different namespace,
what happens?  I presume that both can access as the same nominal uid,
and configuring this correctly is left as an exercise to the namespace
administrator?

> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4f7795f..f99af1c 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2045,8 +2045,8 @@ static void __ocfs2_stuff_meta_lvb(struct inode *inode)
>   lvb->lvb_version   = OCFS2_LVB_VERSION;
>   lvb->lvb_isize = cpu_to_be64(i_size_read(inode));
>   lvb->lvb_iclusters = cpu_to_be32(oi->ip_clusters);
> - lvb->lvb_iuid  = cpu_to_be32(inode->i_uid);
> - lvb->lvb_igid  = cpu_to_be32(inode->i_gid);
> + lvb->lvb_iuid  = cpu_to_be32(i_uid_read(inode));
> + lvb->lvb_igid  = cpu_to_be32(i_gid_read(inode));

I have the reverse question here.  Are we guaranteed that the
on-disk uid/gid will not change regardless of the namespace?  That is,
if I create a file on machine A in init_user_ns as uid 100, then access
it over on machine B in some other namespace with a user-visible uid of
100, will the wire be passing 100 in both directions?  This absolutely
must be true for the cluster communication to work.
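One way to reason about these two questions is a toy model of an id map. The sketch assumes a single-extent mapping in the style of a `uid_map` line (`first lower_first count`). `struct id_map`, `map_down()` and `map_up()` are hypothetical analogues of `make_kuid()` and `from_kuid()`, not their implementation. Under this model, storing the `init_user_ns` view of an id means the stored value is the init-namespace id. So uid 100 inside a namespace mapped at 100000 would be written as 100100, not 100.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* Hypothetical one-extent model of a uid_map line; not the kernel's struct. */
struct id_map {
	uint32_t first;        /* id as seen inside the namespace */
	uint32_t lower_first;  /* corresponding id in the parent (init) ns */
	uint32_t count;        /* number of consecutive ids mapped */
};

/* Namespace-local id -> global id, in the spirit of make_kuid(). */
static uint32_t map_down(const struct id_map *m, uint32_t id)
{
	/* Unsigned wrap-around also rejects id < first. */
	if ((uint32_t)(id - m->first) < m->count)
		return m->lower_first + (id - m->first);
	return INVALID_ID;
}

/* Global id -> namespace-local id, in the spirit of from_kuid(). */
static uint32_t map_up(const struct id_map *m, uint32_t kid)
{
	if ((uint32_t)(kid - m->lower_first) < m->count)
		return m->first + (kid - m->lower_first);
	return INVALID_ID;
}
```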

Joel

-- 

Life's Little Instruction Book #80

"Slow dance"

http://www.jlbec.org/
jl...@evilplan.org

Re: [PATCH 00/27] Latest numa/core release, v16

2012-11-21 Thread Mel Gorman

On Wed, Nov 21, 2012 at 08:37:12PM +0100, Andrea Arcangeli wrote:
> Hi,
> 
> On Wed, Nov 21, 2012 at 10:38:59AM +, Mel Gorman wrote:
> > HACKBENCH PIPES
> >                             3.7.0                  3.7.0                  3.7.0                  3.7.0                  3.7.0
> >                   rc6-stats-v4r12    rc6-schednuma-v16r2 rc6-autonuma-v28fastr3        rc6-moron-v4r38     rc6-twostage-v4r38
> > Procs 1          0.0320 (  0.00%)       0.0354 (-10.53%)       0.0410 (-28.28%)       0.0310 (  3.00%)       0.0296 (  7.55%)
> > Procs 4          0.0560 (  0.00%)       0.0699 (-24.87%)       0.0641 (-14.47%)       0.0556 (  0.79%)       0.0562 ( -0.36%)
> > Procs 8          0.0850 (  0.00%)       0.1084 (-27.51%)       0.1397 (-64.30%)       0.0833 (  1.96%)       0.0953 (-12.07%)
> > Procs 12         0.1047 (  0.00%)       0.1084 ( -3.54%)       0.1789 (-70.91%)       0.0990 (  5.44%)       0.1127 ( -7.72%)
> > Procs 16         0.1276 (  0.00%)       0.1323 ( -3.67%)       0.1395 ( -9.34%)       0.1236 (  3.16%)       0.1240 (  2.83%)
> > Procs 20         0.1405 (  0.00%)       0.1578 (-12.29%)       0.2452 (-74.52%)       0.1471 ( -4.73%)       0.1454 ( -3.50%)
> > Procs 24         0.1823 (  0.00%)       0.1800 (  1.24%)       0.3030 (-66.22%)       0.1776 (  2.58%)       0.1574 ( 13.63%)
> > Procs 28         0.2019 (  0.00%)       0.2143 ( -6.13%)       0.3403 (-68.52%)       0.2000 (  0.94%)       0.1983 (  1.78%)
> > Procs 32         0.2162 (  0.00%)       0.2329 ( -7.71%)      0.6526 (-201.85%)       0.2235 ( -3.36%)       0.2158 (  0.20%)
> > Procs 36         0.2354 (  0.00%)       0.2577 ( -9.47%)       0.4468 (-89.77%)       0.2619 (-11.24%)       0.2451 ( -4.11%)
> > Procs 40         0.2600 (  0.00%)       0.2850 ( -9.62%)      0.5247 (-101.79%)       0.2724 ( -4.77%)       0.2646 ( -1.75%)
> > 
> > The number of procs hackbench is running is too low here for a 48-core
> > machine. It should have been reconfigured but this is better than nothing.
> > 
> > schednuma and autonuma both show large regressions in the performance here.
> > I did not investigate why, but as there are a number of scheduler changes
> > it could be anything.
> 
> Strange, last time I tested hackbench it was perfectly ok, I even had
> this test shown in some of the pdf.
> 

It's been rebased to 3.7-rc6 since so there may be an incompatible
scheduler change somewhere.
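The percentage columns in the HACKBENCH PIPES table above appear to be the relative change against the rc6-stats-v4r12 baseline. They are signed so that positive means faster for these lower-is-better timings. A minimal sketch of that assumed formula follows. Using the printed values, it reproduces the -201.85% autonuma entry for Procs 32.

```c
#include <assert.h>

/*
 * Assumed formula: relative change of a lower-is-better metric against
 * the baseline column, positive when the candidate is faster.
 */
static double pct_gain(double baseline, double candidate)
{
	return (baseline - candidate) / baseline * 100.0;
}
```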

> Lately (post my last hackbench run) I disabled the affine wakeups
> cross-node and pipes use sd_affine wakeups. That could matter for
> these heavy scheduling tests as it practically disables the _sync in
> wake_up_interruptible_sync_poll used by the pipe code, if the waker
> CPU is in a different node than the wakee prev_cpu. I discussed this
> with Mike and he liked this change IIRC but it's the first thing that
> should be checked in light of the above regression.
> 

Understood. I found in early profiles that the mutex_spin_on_owner logic
was also relevant but did not pin down why. I expected it was contention
on mmap_sem due to the PTE scanner but have not had the chance to
verify.

> > PAGE FAULT TEST
> > 
> > This is a microbenchmark for page faults. The number of clients is badly ordered,
> > which, again, I really should fix, but anyway.
> > 
> >                             3.7.0                  3.7.0                  3.7.0                  3.7.0                  3.7.0
> >                   rc6-stats-v4r12    rc6-schednuma-v16r2 rc6-autonuma-v28fastr3        rc6-moron-v4r38     rc6-twostage-v4r38
> > System 1         8.0710 (  0.00%)       8.1085 ( -0.46%)       8.0925 ( -0.27%)       8.0170 (  0.67%)     37.3075 (-362.24%)
> > System 10        9.4975 (  0.00%)       9.5690 ( -0.75%)      12.0055 (-26.41%)       9.5915 ( -0.99%)       9.5835 ( -0.91%)
> > System 11        9.7740 (  0.00%)       9.7915 ( -0.18%)      13.4890 (-38.01%)       9.7275 (  0.48%)       9.6810 (  0.95%)
> 
> No real clue on this one, as I'd need to look at what the test does.

It's the PFT test in MMTests, and MMTests should run it by default out of the
box. Running it will fetch the relevant source, which will end up in
work/testsdisk/sources

> It
> might be related to THP splits though. I can't imagine anything else
> because there's nothing at all in autonuma that alters the page faults
> (except from arming NUMA hinting faults which should be lighter in
> autonuma than in the other implementation using task work).
> 
> Chances are the faults are tested by touching bytes at different 4k
> offsets in the same 2m naturally aligned virtual range.
> 
> Hugh's THP native migration patch will clarify things on the above.
> 

The current set of tests being run has Hugh's THP native migration patch
on top. There was a trivial conflict but otherwise it applied.
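Andrea's guess about the test's access pattern is easy to make concrete. With 4 KiB base pages and 2 MiB transparent huge pages, one naturally aligned 2 MiB range holds 512 subpages. The helpers below are hypothetical illustrations of that address arithmetic, not code taken from PFT or MMTests.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE 4096UL                 /* 4 KiB base page */
#define HUGE_PAGE  (2UL * 1024 * 1024)    /* 2 MiB THP */

/* Start of the naturally aligned 2 MiB range containing addr. */
static uintptr_t huge_base(uintptr_t addr)
{
	return addr & ~(uintptr_t)(HUGE_PAGE - 1);
}

/* Index (0..511) of the 4 KiB subpage holding addr within that range. */
static unsigned subpage_index(uintptr_t addr)
{
	return (unsigned)((addr - huge_base(addr)) / SMALL_PAGE);
}
```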

> > also hope that the concepts of autonuma would be reimplemented on top of
> > this foundation so we can do a meaningful comparison between different
> > placement policies.
> 
> I'll try to help with this to see what

Re: [PATCH RFC 10/12] userns: Convert xfs to use kuid/kgid/kprojid where appropriate

2012-11-21 Thread Joel Becker

On Wed, Nov 21, 2012 at 10:55:24AM +1100, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 2778258..3656b88 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -570,11 +570,12 @@ xfs_dinode_from_disk(
> > to->di_version = from ->di_version;
> > to->di_format = from->di_format;
> > to->di_onlink = be16_to_cpu(from->di_onlink);
> > -   to->di_uid = be32_to_cpu(from->di_uid);
> > -   to->di_gid = be32_to_cpu(from->di_gid);
> > +   to->di_uid = make_kuid(_user_ns, be32_to_cpu(from->di_uid));
> > +   to->di_gid = make_kgid(_user_ns, be32_to_cpu(from->di_gid));
> 
> You can't do this, because the incore inode structure is written
> directly to the log. This is effectively an on-disk format change.

Yeah, I don't get this either.  Over in ocfs2, you do the
correct thing, translating at the boundary from ocfs2_dinode to struct
inode.
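A minimal sketch of the layering Dave and Joel describe. Raw on-disk ids are converted to typed in-core ids only at the boundary, so the on-disk (and logged) structure keeps its format. All struct and type names are hypothetical. The comments note which kernel helper each line stands in for, assuming the init_user_ns mapping used in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk/log inode: raw 32-bit ids, format frozen. */
struct disk_inode {
	uint32_t di_uid;
	uint32_t di_gid;
};

/* Hypothetical typed in-core ids, in the spirit of kuid_t/kgid_t. */
typedef struct { uint32_t val; } kuid_sketch;
typedef struct { uint32_t val; } kgid_sketch;

struct vfs_inode {
	kuid_sketch i_uid;
	kgid_sketch i_gid;
};

/* Read side: raw ids become typed ids only here. */
static void inode_from_disk(struct vfs_inode *to, const struct disk_inode *from)
{
	to->i_uid.val = from->di_uid;	/* make_kuid(&init_user_ns, ...) */
	to->i_gid.val = from->di_gid;	/* make_kgid(&init_user_ns, ...) */
}

/* Write side: typed ids become raw values again before disk or log. */
static void inode_to_disk(struct disk_inode *to, const struct vfs_inode *from)
{
	to->di_uid = from->i_uid.val;	/* from_kuid(&init_user_ns, ...) */
	to->di_gid = from->i_gid.val;	/* from_kgid(&init_user_ns, ...) */
}
```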

Joel

-- 

"I always thought the hardest questions were those I could not answer.
 Now I know they are the ones I can never ask."
- Charlie Watkins

http://www.jlbec.org/
jl...@evilplan.org

Re: [PATCH RFC 09/12] userns: Convert ocfs2 to use kuid and kgid where appropriate

2012-11-21 Thread Joel Becker

On Tue, Nov 20, 2012 at 04:43:37AM -0800, Eric W. Biederman wrote:
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1116,7 +1116,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
>   dentry->d_name.len, dentry->d_name.name,
>   attr->ia_valid, attr->ia_mode,
> - attr->ia_uid, attr->ia_gid);
> + from_kuid(_user_ns, attr->ia_uid),
> + from_kgid(_user_ns, attr->ia_gid));

Dear Eric,
I have a similar question about init_user_ns to Dave.  As far as
I can tell, using init_user_ns here means we'll never get translations
based on the current process namespace.  It just so happens that
include/linux/user_namespace.h doesn't allow new namespaces yet, but I
can't see why we would propagate that knowledge elsewhere.
Is there some magic about when init_user_ns should be used
regardless?

Joel

-- 

 Brain: I shall pollute the water supply with this DNAdefibuliser,
turning everyone into mindless slaves.
 Pinky: What about the people who drink bottled water?
 Brain: Pinky, people who pay 5 dollars for a bottle of water are
already mindless slaves.

http://www.jlbec.org/
jl...@evilplan.org

Re: [PATCH v3 01/12] x86, boot: move verify_cpu.S after 0x200

2012-11-21 Thread H. Peter Anvin

On 11/21/2012 11:45 AM, Yinghai Lu wrote:
> On Wed, Nov 21, 2012 at 9:23 AM, H. Peter Anvin  wrote:
>> On 11/20/2012 11:15 PM, Yinghai Lu wrote:
>>>
>>> We are short of space before 0x200 that is entry for startup_64.
>>>
>>> And we can not change startup_64 to other value --- ABI ?
>>
>>
>> Here you are saying "I don't understand how this works."  It is YOUR
>> responsibility to find out and write a definite statement rather than
>> leaving that to the reader, or expect the maintainer to edit this.
> 
> Actually, I cannot find that out.
> In the code of arch/x86/boot/compressed/head_64.S:
> 
> /*
>  * Be careful here startup_64 needs to be at a predictable
>  * address so I can export it in an ELF header.  Bootloaders
>  * should look at the ELF header to find this address, as
>  * it may change in the future.
>  */
> .code64
> .org 0x200
> ENTRY(startup_64)
> /*
>  * We come here either from startup_32 or directly from a
>  * 64bit bootloader.  If we come here from a bootloader we depend on
>  * an identity mapped page table being provied that maps our
>  * entire text+data+bss and hopefully all of memory.
>  */
> #ifdef CONFIG_EFI_STUB
> /*
>  * The entry point for the PE/COFF executable is 0x210, so only
>  * legacy boot loaders will execute this jmp.
>  */
> jmp preferred_addr
> 
> .org 0x210
> mov %rcx, %rdi
> 
> And it says that 0x200 may change in the future.
> 
> So you said it has to stay at 0x200; do you mean the 0x210 PE/COFF entry
> point forces that?
> 
> I wonder if you would consider the attached patch to move startup_64 down...
> we could kill one jmp.
> 

The comment is just plain wrong.  It assumes you're loading an ELF file,
whereas in practice that is rarely true.

This does explain the poor ABI, though.  A jump table at the
beginning would have been a lot cleaner.

-hpa



Re: [PATCH 36/46] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships

2012-11-21 Thread Rik van Riel


On 11/21/2012 02:15 PM, Mel Gorman wrote:
> On Wed, Nov 21, 2012 at 07:25:37PM +0100, Ingo Molnar wrote:
>>
>> As mentioned in my other mail, this patch of yours looks very
>> similar to the numa/core commit attached below, mostly written
>> by Peter:
>>
>>    30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery
>
> Just to compare, this is the wording in "autonuma: memory follows CPU
> algorithm and task/mm_autonuma stats collection"
>
> +/*
> + * In this function we build a temporal CPU_node<->page relation by
> + * using a two-stage autonuma_last_nid filter to remove short/unlikely
> + * relations.


Looks like the comment came from sched/numa, but the original code
came from autonuma:

https://lkml.org/lkml/2012/8/22/629

If you want to do a real historical dig, we may still have a picture
of the whiteboard where Karen and I came up with the idea of only
migrating a page after the second touch from the same node :)
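The "second touch from the same node" idea can be sketched in a few lines. This is a hypothetical model, not the AutoNUMA, sched/numa or balancenuma code. `struct page_sketch` records the last faulting node. A page is only migrated when two consecutive faults come from the same remote node, so pages bouncing between nodes stay put.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_NID (-1)

/* Hypothetical per-page state. */
struct page_sketch {
	int last_nid;   /* node that took the previous hinting fault */
	int nid;        /* node the page currently lives on */
};

/*
 * Two-stage filter: a first fault from a remote node only records that
 * node; a second consecutive fault from the same node migrates the page.
 */
static bool numa_fault(struct page_sketch *pg, int this_nid)
{
	int last = pg->last_nid;

	pg->last_nid = this_nid;
	if (pg->nid == this_nid)
		return false;           /* already local */
	if (last != this_nid)
		return false;           /* first touch from this node */
	pg->nid = this_nid;             /* second touch: migrate */
	return true;
}
```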

That was trying to solve the "how can we make migrate on fault as
cheap as possible?" question, and reviewing some earlier autonuma
codebase.

Not that any of this matters in the least.  AutoNUMA, sched/numa,
and balancenuma have all evolved a lot because they were able to
copy good ideas from each other, and discard overly complex or
simply bad ideas (eg. the NUMA syscalls or async page migration),
while replacing them with simpler, better ideas from the other
code bases.

Now that we (mostly) agree on what the basic infrastructure should
look like, we can figure out which placement policies work best for
various workloads.

Then we can make a choice depending on what works best, independent
of who wrote what.

--
All rights reversed

Re: [PATCH v3 01/12] x86, boot: move verify_cpu.S after 0x200

2012-11-21 Thread Yinghai Lu

On Wed, Nov 21, 2012 at 9:23 AM, H. Peter Anvin  wrote:
> On 11/20/2012 11:15 PM, Yinghai Lu wrote:
>>
>> We are short of space before 0x200 that is entry for startup_64.
>>
>> And we can not change startup_64 to other value --- ABI ?
>
>
> Here you are saying "I don't understand how this works."  It is YOUR
> responsibility to find out and write a definite statement rather than
> leaving that to the reader, or expect the maintainer to edit this.

Actually, I cannot find that out.
In the code of arch/x86/boot/compressed/head_64.S:

/*
 * Be careful here startup_64 needs to be at a predictable
 * address so I can export it in an ELF header.  Bootloaders
 * should look at the ELF header to find this address, as
 * it may change in the future.
 */
.code64
.org 0x200
ENTRY(startup_64)
/*
 * We come here either from startup_32 or directly from a
 * 64bit bootloader.  If we come here from a bootloader we depend on
 * an identity mapped page table being provied that maps our
 * entire text+data+bss and hopefully all of memory.
 */
#ifdef CONFIG_EFI_STUB
/*
 * The entry point for the PE/COFF executable is 0x210, so only
 * legacy boot loaders will execute this jmp.
 */
jmp preferred_addr

.org 0x210
mov %rcx, %rdi

And it says that 0x200 may change in the future.

So you said it has to stay at 0x200; do you mean the 0x210 PE/COFF entry
point forces that?

I wonder if you would consider the attached patch to move startup_64 down...
we could kill one jmp.

Thanks

Yinghai

new_startup_64_0x400.patch
Description: Binary data

Re: [PATCH] perf: call perf_event_comm under task_lock to fix suspicious rcu usage

2012-11-21 Thread Hannes Frederic Sowa

Ping, this problem still persists in v3.7-rc6. Could someone have a look?

On Sat, Nov 10, 2012 at 06:32:28AM +0100, Hannes Frederic Sowa wrote:
> The following RCU warning showed up while executing a shebang script under
> perf-record (even an empty script will do) on a 3.7-rc4 stable kernel:
> 
>   [   32.185108] 
>   [   32.185332] ===
>   [   32.185602] [ INFO: suspicious RCU usage. ]
>   [   32.185903] 3.7.0-rc4 #1 Not tainted
>   [   32.186021] ---
>   [   32.186021] include/linux/cgroup.h:566 suspicious rcu_dereference_check() usage!
>   [   32.186021] 
>   [   32.186021] other info that might help us debug this:
>   [   32.186021] 
>   [   32.186021] 
>   [   32.186021] rcu_scheduler_active = 1, debug_locks = 0
>   [   32.186021] 1 lock held by empty.sh/556:
>   [   32.186021]  #0:  (>cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x36/0x80
>   [   32.186021] 
>   [   32.186021] stack backtrace:
>   [   32.186021] Pid: 556, comm: empty.sh Not tainted 3.7.0-rc4 #1
>   [   32.186021] Call Trace:
>   [   32.186021]  [] lockdep_rcu_suspicious+0xfd/0x130
>   [   32.186021]  [] perf_event_comm+0x436/0x610
>   [   32.186021]  [] ? trace_hardirqs_off+0xd/0x10
>   [   32.186021]  [] ? local_clock+0x6f/0x80
>   [   32.186021]  [] ? lock_release_holdtime.part.26+0xf/0x180
>   [   32.186021]  [] set_task_comm+0x73/0x180
>   [   32.186021]  [] setup_new_exec+0x9a/0x210
>   [   32.186021]  [] load_elf_binary+0x3e3/0x1ab0
>   [   32.186021]  [] ? sched_clock_local+0x25/0xa0
>   [   32.186021]  [] ? sched_clock_cpu+0xa8/0x120
>   [   32.186021]  [] ? trace_hardirqs_off+0xd/0x10
>   [   32.186021]  [] ? local_clock+0x6f/0x80
>   [   32.186021]  [] ? load_elf_library+0x240/0x240
>   [   32.186021]  [] ? load_elf_library+0x240/0x240
>   [   32.186021]  [] search_binary_handler+0x194/0x4f0
>   [   32.186021]  [] ? search_binary_handler+0x5f/0x4f0
>   [   32.186021]  [] ? compat_sys_ioctl+0x1510/0x1510
>   [   32.186021]  [] load_script+0x294/0x2c0
>   [   32.186021]  [] ? lock_release_holdtime.part.26+0xf/0x180
>   [   32.186021]  [] ? compat_sys_ioctl+0x1510/0x1510
>   [   32.186021]  [] search_binary_handler+0x194/0x4f0
>   [   32.186021]  [] ? search_binary_handler+0x5f/0x4f0
>   [   32.186021]  [] do_execve_common.isra.25+0x50b/0x5b0
>   [   32.186021]  [] ? do_execve_common.isra.25+0x12a/0x5b0
>   [   32.186021]  [] do_execve+0x1b/0x20
>   [   32.186021]  [] sys_execve+0x54/0x80
>   [   32.186021]  [] stub_execve+0x69/0xc0
> 
> I think this dereference qualifies for the task_lock exception (as noted
> in kernel/cgroup.c), so this patch calls perf_event_comm() before
> releasing the task_lock.
> 
> Changelog -v2 (since <20121103235758.gd18...@order.stressinduktion.org>):
>   1) rebased to 3.7-rc4
>   2) slightly improved/updated commit msg and added more people to Cc
> 
> Cc: Peter Zijlstra 
> Cc: Paul Mackerras 
> Cc: Ingo Molnar 
> Cc: Arnaldo Carvalho de Melo 
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  fs/exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 0039055..a961b9d 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1038,8 +1038,8 @@ void set_task_comm(struct task_struct *tsk, char *buf)
>   memset(tsk->comm, 0, TASK_COMM_LEN);
>   wmb();
>   strlcpy(tsk->comm, buf, sizeof(tsk->comm));
> - task_unlock(tsk);
>   perf_event_comm(tsk);
> + task_unlock(tsk);
>  }
>  
>  static void filename_to_taskname(char *tcomm, const char *fn, unsigned int len)
> 
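The reordering in the patch can be illustrated with a toy lock flag standing in for task_lock(). All names are hypothetical. Whether holding task_lock is enough for the cgroup RCU check is the author's reasoning from the changelog, and this sketch does not establish it. The sketch only shows that the notifier now runs while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define COMM_LEN 16

/* Hypothetical task with a toy lock flag standing in for task_lock(). */
struct task_sketch {
	bool locked;
	char comm[COMM_LEN];
	bool notified_under_lock;
};

static void lock_task(struct task_sketch *t)   { assert(!t->locked); t->locked = true; }
static void unlock_task(struct task_sketch *t) { assert(t->locked); t->locked = false; }

/* Stand-in for perf_event_comm(): records whether it ran under the lock. */
static void comm_event(struct task_sketch *t)
{
	t->notified_under_lock = t->locked;
}

/* Patched ordering: notify before dropping the lock. */
static void set_comm(struct task_sketch *t, const char *buf)
{
	lock_task(t);
	memset(t->comm, 0, sizeof(t->comm));
	strncpy(t->comm, buf, sizeof(t->comm) - 1);  /* stays NUL-terminated */
	comm_event(t);
	unlock_task(t);
}
```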


Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread H. Peter Anvin

On 11/21/2012 11:35 AM, Sasha Levin wrote:
> On 11/21/2012 02:24 PM, H. Peter Anvin wrote:
>> On 11/21/2012 11:19 AM, Sasha Levin wrote:

>>>> So, are there any mptables platforms which support hotplug?  If the
>>>> answer is "KVM" then the answer is that KVM needs to move to ACPI to get
>>>> the proper functionality; putting a hack in is really not okay.
>>>
>>> There are no platforms which support actual hotplug, but you can still set
>>> existing processors as disabled in the table and without this patch there's
>>> no way enable them.
>>>
>>> I'm not sure if it's a "hack" though - the presentation of hotpluggable cpus
>>> is almost the same between mptable and acpi, and acpi provides a way to
>>> manually probe/release cpus as well. The only difference is that acpi also
>>> provides notifications about such events.
>>>
>>> Actually, maybe acpi should start using probe/release as well... hmm...
>>>
>>
>> The bottom line is that I don't want the underlying implementation to
>> end up with a user-visible difference... therein lies madness and lots
>> of bugs.
> 
> Okay, so if in the case of ACPI, 'probe' will call acpi_processor_add()
> and 'release' would call acpi_processor_remove() so the behaviour
> would be the same for both ACPI and mptables. Is this okay?
> 

Sounds reasonable to me.  Adding Len to the Cc: list.
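A hypothetical sketch of the arrangement Sasha proposes: one user-visible probe/release path with a per-firmware backend behind it, so mptable and ACPI systems behave the same from userspace. All names are invented for illustration. A real ACPI backend would call acpi_processor_add()/acpi_processor_remove() as described above; this sketch only notes that in a comment.

```c
#include <assert.h>
#include <errno.h>

#define MAX_CPUS 8

/* Hypothetical firmware backend behind one user-visible interface. */
struct cpu_hotplug_ops {
	int (*probe)(int cpu);
	int (*release)(int cpu);
};

static int cpu_present_map[MAX_CPUS];

/* mptable backend: enable/disable a CPU listed as disabled in the table. */
static int mptable_probe(int cpu)   { cpu_present_map[cpu] = 1; return 0; }
static int mptable_release(int cpu) { cpu_present_map[cpu] = 0; return 0; }

/* ACPI backend: per the thread, would call acpi_processor_add()/remove(). */
static int acpi_probe(int cpu)      { cpu_present_map[cpu] = 1; return 0; }
static int acpi_release(int cpu)    { cpu_present_map[cpu] = 0; return 0; }

static const struct cpu_hotplug_ops mptable_ops = { mptable_probe, mptable_release };
static const struct cpu_hotplug_ops acpi_ops = { acpi_probe, acpi_release };

/* Userspace-facing entry points: identical behaviour for either backend. */
static int hotplug_probe(const struct cpu_hotplug_ops *ops, int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS)
		return -EINVAL;
	return ops->probe(cpu);
}

static int hotplug_release(const struct cpu_hotplug_ops *ops, int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS)
		return -EINVAL;
	return ops->release(cpu);
}
```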

-hpa



Re: [RFC 3/3] man-pages: Add man page for vmpressure_fd(2)

2012-11-21 Thread Andrew Morton

On Wed, 21 Nov 2012 15:01:50 +
Mel Gorman  wrote:

> On Tue, Nov 20, 2012 at 10:12:28AM -0800, David Rientjes wrote:
> > On Mon, 19 Nov 2012, Anton Vorontsov wrote:
> > 
> > > We try to make userland free resources when the system becomes low on
> > > memory. Once we're short on memory, sometimes it's better to discard
> > > (free) data, rather than let the kernel drain file caches or even start
> > > swapping.
> > > 
> > 
> > To add another usecase: its possible to modify our version of malloc (or 
> > any malloc) so that memory that is free()'d can be released back to the 
> > kernel only when necessary, i.e. when keeping the extra memory around 
> > starts to have a detremental effect on the system, memcg, or cpuset.  When 
> > there is an abundance of memory available such that allocations need not 
> > defragment or reclaim memory to be allocated, it can improve performance 
> > to keep a memory arena from which to allocate from immediately without 
> > calling the kernel.
> > 
> 
> A potential third use case is a variation of the first for batch systems. If
> it's running low priority tasks and a high priority task starts that
> results in memory pressure then the job scheduler may decide to move the
> low priority jobs elsewhere (or cancel them entirely).
> 
> A similar use case is monitoring systems running high priority workloads
> that should never swap. It can be easily detected if the system starts
> swapping but a pressure notification might act as an early warning system
> that something is happening on the system that might cause the primary
> workload to start swapping.

I hope Anton's writing all of this down ;)


The proposed API bugs me a bit.  It seems simplistic.  I need to have a
quality think about this.  Maybe the result of that think will be to
suggest an interface which can be extended in a back-compatible fashion
later on, if/when the simplistic nature becomes a problem.


Re: [PATCH 36/46] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships

2012-11-21 Thread Mel Gorman

On Wed, Nov 21, 2012 at 07:15:47PM +, Mel Gorman wrote:
> I've added a note now to that effect now. For all the patches with notes
> or any other ones, I'll be very happy to add the Signed-offs back on if
> the original authors acknowledge they are ok with the end result. If you
> recall, in the original V1 of this series I said;
> 
>   This series steals very heavily from both autonuma and schednuma
>   with very little original code. In some cases I removed the
>   signed-off-bys because the result was too different. I have noted
>   in the changelog where this happened but the signed-offs can be
>   restored if the original authors agree.
> 
> Just to compare, this is the wording in "autonuma: memory follows CPU
> algorithm and task/mm_autonuma stats collection"
> 
> +/*
> + * In this function we build a temporal CPU_node<->page relation by
> + * using a two-stage autonuma_last_nid filter to remove short/unlikely
> + * relations.
> + *
> + * Using P(p) ~ n_p / n_t as per frequentest probability, we can
> + * equate a node's CPU usage of a particular page (n_p) per total
> + * usage of this page (n_t) (in a given time-span) to a probability.
> + *
> + * Our periodic faults will then sample this probability and getting
> + * the same result twice in a row, given these samples are fully
> + * independent, is then given by P(n)^2, provided our sample period
> + * is sufficiently short compared to the usage pattern.
> + *
> + * This quadric squishes small probabilities, making it less likely
> + * we act on an unlikely CPU_node<->page relation.
> + */
> 
> If this was the basis for the sched/numa patch then I'd point out that
> I'm not the only person that failed to preserve history perfectly.
> 

Which to be clear, it isn't. The original source is sched/numa according
to https://lkml.org/lkml/2012/8/22/629 
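A minimal sketch of the two-stage filter that the quoted comment describes, assuming a hypothetical per-page last_nid field initialized to -1 (struct sketch_page and sketch_two_stage_filter() are illustrative names, not code from either patch set). The first fault from a node only records that node; migration is considered only when two consecutive samples agree. If faults from a node arrive with probability p and samples are independent, that happens with probability p^2, so p = 0.1 drops to 0.01 while p = 0.9 only drops to 0.81.

```c
#include <assert.h>

/* Hypothetical per-page state; -1 means "no fault recorded yet". */
struct sketch_page {
	int last_nid;
};

/*
 * Returns 1 if a NUMA hinting fault from this_nid should lead to migrating
 * the page there. The first fault from a node only records the node; only a
 * second consecutive fault from the same node passes the filter.
 */
static int sketch_two_stage_filter(struct sketch_page *page, int this_nid)
{
	int last_nid = page->last_nid;

	assert(this_nid >= 0);
	page->last_nid = this_nid;
	return last_nid == this_nid;
}
```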

-- 
Mel Gorman
SUSE Labs

Re: [PATCH 00/27] Latest numa/core release, v16

2012-11-21 Thread Andrea Arcangeli

Hi,

On Wed, Nov 21, 2012 at 10:38:59AM +, Mel Gorman wrote:
> HACKBENCH PIPES
>                     3.7.0             3.7.0                  3.7.0             3.7.0             3.7.0
>           rc6-stats-v4r12 rc6-schednuma-v16r2 rc6-autonuma-v28fastr3   rc6-moron-v4r38 rc6-twostage-v4r38
> Procs 1   0.0320 (  0.00%)  0.0354 (-10.53%)  0.0410 (-28.28%)  0.0310 (  3.00%)  0.0296 (  7.55%)
> Procs 4   0.0560 (  0.00%)  0.0699 (-24.87%)  0.0641 (-14.47%)  0.0556 (  0.79%)  0.0562 ( -0.36%)
> Procs 8   0.0850 (  0.00%)  0.1084 (-27.51%)  0.1397 (-64.30%)  0.0833 (  1.96%)  0.0953 (-12.07%)
> Procs 12  0.1047 (  0.00%)  0.1084 ( -3.54%)  0.1789 (-70.91%)  0.0990 (  5.44%)  0.1127 ( -7.72%)
> Procs 16  0.1276 (  0.00%)  0.1323 ( -3.67%)  0.1395 ( -9.34%)  0.1236 (  3.16%)  0.1240 (  2.83%)
> Procs 20  0.1405 (  0.00%)  0.1578 (-12.29%)  0.2452 (-74.52%)  0.1471 ( -4.73%)  0.1454 ( -3.50%)
> Procs 24  0.1823 (  0.00%)  0.1800 (  1.24%)  0.3030 (-66.22%)  0.1776 (  2.58%)  0.1574 ( 13.63%)
> Procs 28  0.2019 (  0.00%)  0.2143 ( -6.13%)  0.3403 (-68.52%)  0.2000 (  0.94%)  0.1983 (  1.78%)
> Procs 32  0.2162 (  0.00%)  0.2329 ( -7.71%)  0.6526 (-201.85%) 0.2235 ( -3.36%)  0.2158 (  0.20%)
> Procs 36  0.2354 (  0.00%)  0.2577 ( -9.47%)  0.4468 (-89.77%)  0.2619 (-11.24%)  0.2451 ( -4.11%)
> Procs 40  0.2600 (  0.00%)  0.2850 ( -9.62%)  0.5247 (-101.79%) 0.2724 ( -4.77%)  0.2646 ( -1.75%)
> 
> The number of procs hackbench is running is too low here for a 48-core
> machine. It should have been reconfigured but this is better than nothing.
> 
> schednuma and autonuma both show large regressions in the performance here.
> I do not investigate why but as there are a number of scheduler changes
> it could be anything.

Strange, last time I tested hackbench it was perfectly ok, I even had
this test shown in some of the pdf.

Lately (post my last hackbench run) I disabled the affine wakeups
cross-node and pipes use sd_affine wakeups. That could matter for
these heavy scheduling tests as it practically disables the _sync in
wake_up_interruptible_sync_poll used by the pipe code, if the waker
CPU is in a different node than the wakee prev_cpu. I discussed this
with Mike and he liked this change IIRC, but it's the first thing that
should be checked in light of the above regression.

> PAGE FAULT TEST
> 
> This is a microbenchmark for page faults. The number of clients are badly ordered
> which again, I really should fix but anyway.
> 
>                       3.7.0             3.7.0                  3.7.0             3.7.0             3.7.0
>             rc6-stats-v4r12 rc6-schednuma-v16r2 rc6-autonuma-v28fastr3   rc6-moron-v4r38 rc6-twostage-v4r38
> System 1    8.0710 (  0.00%)  8.1085 ( -0.46%)  8.0925 ( -0.27%)  8.0170 (  0.67%) 37.3075 (-362.24%)
> System 10   9.4975 (  0.00%)  9.5690 ( -0.75%) 12.0055 (-26.41%)  9.5915 ( -0.99%)  9.5835 ( -0.91%)
> System 11   9.7740 (  0.00%)  9.7915 ( -0.18%) 13.4890 (-38.01%)  9.7275 (  0.48%)  9.6810 (  0.95%)

No real clue on this one, as I would need to look at what the test does. It
might be related to THP splits though. I can't imagine anything else
because there's nothing at all in autonuma that alters the page faults
(except from arming NUMA hinting faults which should be lighter in
autonuma than in the other implementation using task work).

Chances are the faults are tested by touching bytes at different 4k
offsets in the same 2m naturally aligned virtual range.

Hugh's native THP migration patch will clarify things on the above.

> also hope that the concepts of autonuma would be reimplemented on top of
> this foundation so we can do a meaningful comparison between different
> placement policies.

I'll try to help with this to see what could be added from autonuma on
top to improve on your balancenuma foundation. Your current
foundation looks ideal for inclusion to me.

I noticed you haven't run any single instance specjbb workload, that
should be added to the battery of tests. But hey take your time, the
amount of data you provided is already very comprehensive and you were
so fast.

The thing is: single instance and multi instance are totally different
beasts.

multi instance is all about avoiding NUMA false sharing in the first
place (the anti false sharing algorithm becomes a noop), and it has a
trivial perfect solution with all cross node traffic guaranteed to
stop after convergence has been reached for the whole duration of the
workload.

single instance is all about NUMA false sharing detection and it has
no perfect solution and there's no way to fully converge and

Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread Sasha Levin

On 11/21/2012 02:24 PM, H. Peter Anvin wrote:
> On 11/21/2012 11:19 AM, Sasha Levin wrote:
>>>
>>> So, are there any mptables platforms which support hotplug?  If the
>>> answer is "KVM" then the answer is that KVM needs to move to ACPI to get
>>> the proper functionality; putting a hack in is really not okay.
>>
>> There are no platforms which support actual hotplug, but you can still set
>> existing processors as disabled in the table and without this patch there's
>> no way enable them.
>>
>> I'm not sure if it's a "hack" though - the presentation of hotpluggable cpus
>> is the almost the same between mptable and acpi, and acpi provides a way to
>> manually probe/release cpus as well. The only difference is that acpi also
>> provides notifications about such events.
>>
>> Actually, maybe acpi should start using probe/release as well... hmm...
>>
> 
> The bottom line is that I don't want the underlying implementation to
> end up with a user-visible difference... therein lies madness and lots
> of bugs.

Okay, so if in the case of ACPI, 'probe' will call acpi_processor_add()
and 'release' would call acpi_processor_remove() so the behaviour
would be the same for both ACPI and mptables. Is this okay?


Thanks,
Sasha

Re: [RFT PATCH v1 4/5] mm: provide more accurate estimation of pages occupied by memmap

2012-11-21 Thread Andrew Morton

On Wed, 21 Nov 2012 22:52:29 +0800
Jiang Liu  wrote:

> On 11/21/2012 03:19 AM, Andrew Morton wrote:
> > On Tue, 20 Nov 2012 23:18:34 +0800
> > Jiang Liu  wrote:
> > 
>  +static unsigned long calc_memmap_size(unsigned long spanned_pages,
>  +  unsigned long present_pages)
>  +{
>  +unsigned long pages = spanned_pages;
>  +
>  +/*
>  + * Provide a more accurate estimation if there are big holes within
>  + * the zone and SPARSEMEM is in use.
>  + */
>  +if (spanned_pages > present_pages + (present_pages >> 4) &&
>  +IS_ENABLED(CONFIG_SPARSEMEM))
>  +pages = present_pages;
>  +
>  +return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
>  +}
> >>>
> >>> Please explain the ">> 4" heuristc more completely - preferably in both
> >>> the changelog and code comments.  Why can't we calculate this
> >>> requirement exactly?  That might require a second pass, but that's OK for
> >>> code like this?
> >> Hi Andrew,
> >>A normal x86 platform always have some holes within the DMA ZONE,
> >> so the ">> 4" heuristic is to avoid applying this adjustment to the DMA
> >> ZONE on x86 platforms. 
> >>Because the memmap_size is just an estimation, I feel it's OK to
> >> remove the ">> 4" heuristic, that shouldn't affect much.
> > 
> > Again: why can't we calculate this requirement exactly?  That might
> > require a second pass, but that's OK for code like this?
> 
> Hi Andrew,
>   If there are holes within a zone, it may cost us one or two extra pages
> for each populated region within the zone due to alignment because memmap for 
> each populated regions may not naturally aligned on page boundary.

Right.  So with an additional pass across the zone and a bit of
arithmetic, we can calculate the exact space requirement for memmap?
No need for kludgy heuristics?

> Originally the ">> 4" heuristic is to trade off these extra memmap pages,
> especially for small zones linke DMA zone.
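A minimal sketch of the exact calculation Andrew asks about, under stated assumptions: 4 KiB pages, a hypothetical 56-byte struct page (the real size is config-dependent), struct pages laid out contiguously by pfn, and a hypothetical sorted list of populated pfn ranges as input. Each populated range can pull in a partially used memmap page at either end, which is where the "one or two extra pages per region" comes from; a page already counted for the previous range is not counted twice.

```c
#include <assert.h>

/* Hypothetical stand-ins: 4 KiB pages and a 56-byte struct page. */
#define SKETCH_PAGE_SIZE        4096UL
#define SKETCH_STRUCT_PAGE_SIZE 56UL

/* A populated pfn range, half-open: [start_pfn, end_pfn). */
struct sketch_pfn_range {
	unsigned long start_pfn, end_pfn;
};

/* Exact memmap page count for sorted, non-overlapping populated ranges. */
static unsigned long sketch_exact_memmap_pages(const struct sketch_pfn_range *r,
					       int n)
{
	unsigned long total = 0, prev_end_page = 0;

	for (int i = 0; i < n; i++) {
		/* First and one-past-last memmap page touched by this range. */
		unsigned long first = r[i].start_pfn * SKETCH_STRUCT_PAGE_SIZE /
				      SKETCH_PAGE_SIZE;
		unsigned long last = (r[i].end_pfn * SKETCH_STRUCT_PAGE_SIZE +
				      SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

		assert(i == 0 || r[i].start_pfn >= r[i - 1].end_pfn);
		if (first < prev_end_page)	/* page shared with previous range */
			first = prev_end_page;
		if (last > first)
			total += last - first;
		if (last > prev_end_page)
			prev_end_page = last;
	}
	return total;
}

/* The present_pages branch of the estimate in the patch, for comparison. */
static unsigned long sketch_estimate_memmap_pages(unsigned long present_pages)
{
	return (present_pages * SKETCH_STRUCT_PAGE_SIZE + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}
```

With two 100-page regions at pfns 0-99 and 200-299, the exact count is 5 pages while the estimate from 200 present pages is 3, illustrating the per-region alignment overhead.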


Re: [PATCH 7/9] ocfs2: Use generic handlers of O_SYNC AIO DIO

2012-11-21 Thread Joel Becker

On Mon, Nov 19, 2012 at 11:51:14PM -0800, Darrick J. Wong wrote:
> Use generic handlers to queue fsync() when AIO DIO is completed for O_SYNC
> file.
> 
> From: Jan Kara 
> Signed-off-by: Jan Kara 
> Signed-off-by: Jeff Moyer 
Acked-by: Joel Becker 

> ---
>  fs/ocfs2/aops.c |6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 6577432..60457cc 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -593,9 +593,7 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
>   level = ocfs2_iocb_rw_locked_level(iocb);
>   ocfs2_rw_unlock(inode, level);
>  
> - if (is_async)
> - aio_complete(iocb, ret, 0);
> - inode_dio_done(inode);
> + generic_dio_end_io(iocb, offset, bytes, private, ret, is_async);
>  }
>  
>  /*
> @@ -642,7 +640,7 @@ static ssize_t ocfs2_direct_IO(int rw,
>   return __blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev,
>   iov, offset, nr_segs,
>   ocfs2_direct_IO_get_blocks,
> - ocfs2_dio_end_io, NULL, 0);
> + ocfs2_dio_end_io, NULL, DIO_SYNC_WRITES);
>  }
>  
>  static void ocfs2_figure_cluster_boundaries(struct ocfs2_super *osb,
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

"Hell is oneself, hell is alone, the other figures in it, merely projections."
- T. S. Eliot

http://www.jlbec.org/
jl...@evilplan.org

Re: [RFT PATCH v1 1/5] mm: introduce new field "managed_pages" to struct zone

2012-11-21 Thread Andrew Morton

On Wed, 21 Nov 2012 22:36:56 +0800
Jiang Liu  wrote:

> > void mod_zone_managed_pages(struct zone *zone, signed long delta)
> > {
> > WARN_ON(system_state != SYSTEM_BOOTING &&
> > !is_locked_memory_hotplug());
> > zone->managed_pages += delta;
> > }
> This seems a little overhead because __free_pages_bootmem() is on the hot path
> and will be called many times at boot time.

Maybe, maybe not.  These things are measurable so let's not just guess.

But I'm not really recommending that we do this - there are all sorts
of things we *could* check and warn about, but we don't.  Potential
errors in this area don't seem terribly important.



Re: [PULL] Yama update (3.8)

2012-11-21 Thread Serge E. Hallyn

Quoting Kees Cook (keesc...@chromium.org):
> Hi James,
> 
> Please pull these Yama changes for 3.8. Thanks!
> 
> -Kees,
> 
> The following changes since commit b5666502700855a1eb1a15482005b22478b9460e:
> 
>   drivers/char/tpm: remove tasklet and cleanup (2012-11-01 15:23:14 -0500)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/yama-3.8
> 
> for you to fetch changes up to 235e752789eb65a81477bb82845323dfcbf93012:
> 
>   Yama: remove locking from delete path (2012-11-20 10:32:08 -0800)
> 

Sorry, I never saw the new version of the second patch in my email (likely
my bad, I'm quick to delete), but looking at gitweb it looks good, thanks.

Reviewed-by: Serge Hallyn 

thanks,
-serge

Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread H. Peter Anvin

On 11/21/2012 11:19 AM, Sasha Levin wrote:
>>
>> So, are there any mptables platforms which support hotplug?  If the
>> answer is "KVM" then the answer is that KVM needs to move to ACPI to get
>> the proper functionality; putting a hack in is really not okay.
> 
> There are no platforms which support actual hotplug, but you can still set
> existing processors as disabled in the table and without this patch there's
> no way enable them.
> 
> I'm not sure if it's a "hack" though - the presentation of hotpluggable cpus
> is the almost the same between mptable and acpi, and acpi provides a way to
> manually probe/release cpus as well. The only difference is that acpi also
> provides notifications about such events.
> 
> Actually, maybe acpi should start using probe/release as well... hmm...
> 

The bottom line is that I don't want the underlying implementation to
end up with a user-visible difference... therein lies madness and lots
of bugs.

-hpa



[PATCH] [3.7-rc] fix incorrect NR_FREE_PAGES accounting (appears like memory leak)

2012-11-21 Thread Dave Hansen


This needs to make it in before 3.7 is released.

--

There have been some 3.7-rc reports of vm issues, including some
kswapd bugs and, more importantly, some memory "leaks":

http://www.spinics.net/lists/linux-mm/msg46187.html
https://bugzilla.kernel.org/show_bug.cgi?id=50181

The post-3.6 commit 1fb3f8ca took split_free_page() and reused
it for the compaction code.  It does something curious with
capture_free_page() (previously known as split_free_page()):

int capture_free_page(struct page *page, int alloc_order,
...
__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));

-   /* Split into individual pages */
-   set_page_refcounted(page);
-   split_page(page, order);
+   if (alloc_order != order)
+   expand(zone, page, alloc_order, order,
+   &zone->free_area[order], migratetype);

Note that expand() puts the pages _back_ in the allocator, but it
does not bump NR_FREE_PAGES.  We "return" 'alloc_order' worth of
pages, but we accounted for removing 'order' in the
__mod_zone_page_state() call.  For the old split_page()-style use
(order==alloc_order) the bug will not trigger.  But, when called
from the compaction code where we occasionally get a larger page
out of the buddy allocator than we need, we will run into this.

This patch simply changes the NR_FREE_PAGES manipulation to the
correct 'alloc_order' instead of 'order'.
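A toy userspace model of the accounting described above, assuming a single zone that starts with 1024 free pages; toy_zone, toy_expand() and toy_capture() are hypothetical names, not kernel functions. It shows that charging 1 << order against the counter while expand() quietly returns the unused part of the block to the free lists leaves NR_FREE_PAGES short by exactly the pages expand() hands back.

```c
#include <assert.h>

/* Toy model of one zone: the vmstat counter versus what the buddy lists hold. */
struct toy_zone {
	long nr_free_pages;		/* stands in for NR_FREE_PAGES */
	long pages_on_free_lists;	/* what is really on the free lists */
};

/*
 * Model of expand(): of a 2^order block, 2^alloc_order pages go to the
 * caller and the rest go back onto the free lists. Like the real expand(),
 * it does not touch the free-page counter.
 */
static void toy_expand(struct toy_zone *z, unsigned alloc_order, unsigned order)
{
	z->pages_on_free_lists += (1L << order) - (1L << alloc_order);
}

/* fixed == 0 models the 3.7-rc code, fixed == 1 the patched code. */
static void toy_capture(struct toy_zone *z, unsigned alloc_order,
			unsigned order, int fixed)
{
	assert(alloc_order <= order);
	z->pages_on_free_lists -= 1L << order;	/* whole block leaves its list */
	z->nr_free_pages -= 1L << (fixed ? alloc_order : order);
	if (alloc_order != order)
		toy_expand(z, alloc_order, order);
}
```

Capturing an order-0 page from an order-3 block leaves 1023 pages on the free lists but only 1016 in the unpatched counter, a "leak" of 7 pages per call; the patched accounting keeps both at 1023.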

I've been able to repeatedly trigger this in my testing
environment.  The amount "leaked" very closely tracks the
imbalance I see in buddy pages vs. NR_FREE_PAGES.  I have
confirmed that this patch fixes the imbalance.

Signed-off-by: Dave Hansen 
Acked-by: Mel Gorman 
---

 linux-2.6.git-dave/mm/page_alloc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/page_alloc.c~leak-fix-20121120-2 mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~leak-fix-20121120-2   2012-11-21 14:14:52.053714749 -0500
+++ linux-2.6.git-dave/mm/page_alloc.c  2012-11-21 14:14:52.069714883 -0500
@@ -1405,7 +1405,7 @@ int capture_free_page(struct page *page,
 
mt = get_pageblock_migratetype(page);
if (unlikely(mt != MIGRATE_ISOLATE))
-   __mod_zone_freepage_state(zone, -(1UL << order), mt);
+   __mod_zone_freepage_state(zone, -(1UL << alloc_order), mt);
 
if (alloc_order != order)
expand(zone, page, alloc_order, order,
_


Re: [PATCH v2] New Nokia RX-51 power supply battery driver

2012-11-21 Thread Tony Lindgren

* Anton Vorontsov  [121119 10:25]:
> On Mon, Nov 19, 2012 at 01:18:29PM +0100, Pali Rohár wrote:
> [...] 
> > Ok. Here is missing patch which register this driver in Nokia N900 board 
> > code. Without it driver is not loaded.
> 
> Cc'ing OMAP folks.

Looks OK to me; queue it with the other patches in the series:

Acked-by: Tony Lindgren 
 
> > From 0b60efd06a71668439bcb761c6572dd7df91dc17 Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Pali=20Roh=C3=A1r?= 
> > Date: Mon, 19 Nov 2012 09:05:24 +0100
> > Subject: [PATCH 1/3] ARM: OMAP: rx51: Register platform device for
> >  rx51_battery driver
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> > 
> > Signed-off-by: Pali Rohár 
> > ---
> >  arch/arm/mach-omap2/board-rx51-peripherals.c |6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/arm/mach-omap2/board-rx51-peripherals.c b/arch/arm/mach-omap2/board-rx51-peripherals.c
> > index 020e03c..fe1ac7e 100644
> > --- a/arch/arm/mach-omap2/board-rx51-peripherals.c
> > +++ b/arch/arm/mach-omap2/board-rx51-peripherals.c
> > @@ -271,11 +271,17 @@ static struct platform_device rx51_charger_device = {
> > },
> >  };
> >  
> > +static struct platform_device rx51_battery_device = {
> > +   .name   = "rx51-battery",
> > +   .id = -1,
> > +};
> > +
> >  static void __init rx51_charger_init(void)
> >  {
> > WARN_ON(gpio_request_one(RX51_USB_TRANSCEIVER_RST_GPIO,
> > GPIOF_OUT_INIT_HIGH, "isp1704_reset"));
> >  
> > +   platform_device_register(&rx51_battery_device);
> > platform_device_register(&rx51_charger_device);
> >  }
> >  
> > -- 
> > 1.7.10.4
> > 
> > -- 
> > Pali Rohár
> > pali.ro...@gmail.com

Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread Sasha Levin

On 11/21/2012 01:38 PM, H. Peter Anvin wrote:
> On 11/21/2012 10:35 AM, Sasha Levin wrote:
>>>
>>> Reading between the lines, this sounds like would cause a user-visible
>>> difference between mptable platforms and ACPI platforms?  If so, that is
>>> totally unacceptable.  If not, the description is confusing.
>>
>> With ACPI platforms you don't need probe/release because the hardware notifies
>> on CPU insert/eject - this doesn't exist on mptable which is why you have to
>> do it manually with probe/release.
>>
>> The difference is already user visible: you can hotplug on ACPI, but can't on
>> mptables.
>>
>> Yes, reading back the subject does sound confusing - a better one would probably
>> be "provide interface for CPU hotplug on mptable platforms" or something similar.
>>
> 
> So, are there any mptables platforms which support hotplug?  If the
> answer is "KVM" then the answer is that KVM needs to move to ACPI to get
> the proper functionality; putting a hack in is really not okay.

There are no platforms which support actual hotplug, but you can still set
existing processors as disabled in the table and without this patch there's
no way to enable them.

I'm not sure if it's a "hack" though - the presentation of hotpluggable cpus
is almost the same between mptable and acpi, and acpi provides a way to
manually probe/release cpus as well. The only difference is that acpi also
provides notifications about such events.

Actually, maybe acpi should start using probe/release as well... hmm...


Thanks,
Sasha

Re: [PATCH v2 2/3] input: Cypress PS/2 Trackpad psmouse driver

2012-11-21 Thread Henrik Rydberg

Hi Kamal,

> From: Cypress Semiconductor Corporation 
> 
> Input/mouse driver for Cypress PS/2 Trackpad.
> 
> Original code contributed by Cypress Semiconductor Corporation,
> modified by Kamal Mostafa and Kyle Fazzari.
> 
> BugLink: http://launchpad.net/bugs/978807
> 
> Signed-off-by: Kamal Mostafa 
> Signed-off-by: Kyle Fazzari 
> Signed-off-by: Mario Limonciello 
> Signed-off-by: Tim Gardner 
> Acked-by: Herton Krzesinski 
> ---
>  drivers/input/mouse/cypress_ps2.c |  956 +
>  drivers/input/mouse/cypress_ps2.h |  220 +
>  2 files changed, 1176 insertions(+)
>  create mode 100644 drivers/input/mouse/cypress_ps2.c
>  create mode 100644 drivers/input/mouse/cypress_ps2.h

Reading the patch, it looks very much like a typical semi-mt device to
me. Any good reason not to handle it that way?

> diff --git a/drivers/input/mouse/cypress_ps2.c b/drivers/input/mouse/cypress_ps2.c
> new file mode 100644
> index 000..5762be6
> --- /dev/null
> +++ b/drivers/input/mouse/cypress_ps2.c
> @@ -0,0 +1,956 @@
> +/*
> + * Cypress Trackpad PS/2 mouse driver
> + *
> + * Copyright (c) 2012 Cypress Semiconductor Corporation.
> + *
> + * Additional contributors include:
> + *   Kamal Mostafa 
> + *   Kyle Fazzari 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "cypress_ps2.h"
> +
> +#define CYTP_DBG_DUMP 0  /* set to 1 for more verbose debug dump */
> +
> +#define cytp_dbg(fmt, ...)  \
> + do {  \
> + if (cytp)  \
> + psmouse_dbg(psmouse, pr_fmt(fmt), ##__VA_ARGS__);  \
> + } while (0)
> +
> +#if CYTP_DBG_DUMP
> +# define cytp_dbg_dump(fmt, ...)  \
> + do {  \
> + if (cytp)  \
> + psmouse_dbg(psmouse, pr_fmt(fmt), ##__VA_ARGS__);  \
> + } while (0)
> +#else
> +# define cytp_dbg_dump(fmt, ...)
> +#endif

These two look identical, why not define one in terms of the other?
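One way to act on this, sketched with a userspace stand-in for psmouse_dbg(): the fprintf-based cytp_dbg() and the sketch_dbg_calls counter below are hypothetical, and the real macro also checks cytp and takes psmouse from the caller's scope. cytp_dbg_dump() is defined in terms of cytp_dbg(), and the disabled variant becomes a statement-safe no-op instead of expanding to nothing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's psmouse_dbg(); counts calls. */
static int sketch_dbg_calls;
#define cytp_dbg(fmt, ...)						\
	do {								\
		sketch_dbg_calls++;					\
		fprintf(stderr, fmt, ##__VA_ARGS__);			\
	} while (0)

#define CYTP_DBG_DUMP 0	/* set to 1 for more verbose debug dump */

#if CYTP_DBG_DUMP
# define cytp_dbg_dump(fmt, ...) cytp_dbg(fmt, ##__VA_ARGS__)
#else
/* Still a single statement, so "if (x) cytp_dbg_dump(...); else ..." works. */
# define cytp_dbg_dump(fmt, ...) do { } while (0)
#endif
```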

> +static int read_timeout = 200;
> +module_param_named(cy_read_timeout, read_timeout, int, 0644);
> +MODULE_PARM_DESC(cy_read_timeout, "Set CyPS/2 cmd read timeout (default 200 msec)");

Why is this configurable?

> +/* p is a pointer points to the buffer containing Cypress Keys. */
> +#define IS_CYPRESS_KEY(p) ((p[0] == CYPRESS_KEY_1) && (p[1] == CYPRESS_KEY_2))
> +#define CYTP_SET_PACKET_SIZE(n) { psmouse->pktsize = cytp->pkt_size = (n); }
> +#define CYTP_SET_MODE_BIT(x)  \
> + do {  \
> + if ((x) & CYTP_BIT_ABS_REL_MASK)  \
> + cytp->mode = (cytp->mode & ~CYTP_BIT_ABS_REL_MASK) | (x);  \
> + else  \
> + cytp->mode |= (x);  \
> + } while (0)

Clearly this can be simplified.
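A possible simplification, written as a static inline function rather than a macro; SKETCH_ABS_REL_MASK is a hypothetical value standing in for CYTP_BIT_ABS_REL_MASK from cypress_ps2.h. Clearing the abs/rel field only when x touches it and then OR-ing x in produces the same result as both branches of CYTP_SET_MODE_BIT.

```c
#include <assert.h>

/* Hypothetical mask; the real CYTP_BIT_ABS_REL_MASK lives in cypress_ps2.h. */
#define SKETCH_ABS_REL_MASK 0x18u

static inline unsigned int sketch_set_mode_bit(unsigned int mode, unsigned int x)
{
	/* The abs/rel bits are a field: replace it rather than OR into it. */
	if (x & SKETCH_ABS_REL_MASK)
		mode &= ~SKETCH_ABS_REL_MASK;
	return mode | x;
}
```

A caller would then write cytp->mode = sketch_set_mode_bit(cytp->mode, x);, which also gets type checking on x.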

> +#define CYTP_CLEAR_MODE_BIT(x)   { cytp->mode &= ~(x); }
> +
> +#define CYTP_SUPPORT_ABS
> +
> +static unsigned char cytp_rate[] = {10, 20, 40, 60, 100, 200};
> +static unsigned char cytp_resolution[] = {0x00, 0x01, 0x02, 0x03};
> +
> +static int cypress_ps2_sendbyte(struct psmouse *psmouse, int value)
> +{
> + struct cytp_data *cytp = psmouse->private;
> + struct ps2dev *ps2dev = &psmouse->ps2dev;
> +
> + if (ps2_sendbyte(ps2dev, value & 0xff, CYTP_CMD_TIMEOUT) < 0) {
> + cytp_dbg("send command 0x%02x failed, resp 0x%02x\n",
> +  value & 0xff, ps2dev->nak);
> + if (ps2dev->nak == CYTP_PS2_RETRY)
> + return CYTP_PS2_RETRY;
> + else
> + return CYTP_PS2_ERROR;
> + }
> +
> + cytp_dbg("send command 0x%02x success, resp 0xfa\n", value & 0xff);
> +
> + return 0;
> +}
> +
> +static int cypress_ps2_ext_cmd(struct psmouse *psmouse, unsigned short cmd,
> +unsigned char data)
> +{
> + struct ps2dev *ps2dev = &psmouse->ps2dev;
> + int tries = CYTP_PS2_CMD_TRIES;
> + int rc;
> +
> + ps2_begin_command(ps2dev);
> +
> + do {
> + /*
> +  * send extension command 0xE8 or 0xF3,
> +  * if send extension command failed,
> +  * try to send recovery command to make
> +  * trackpad device return to ready wait command state.
> +  * It alwasy success based on this recovery commands.
> +  */
> + rc = cypress_ps2_sendbyte(psmouse, cmd & 0xff);
> + if (rc == CYTP_PS2_RETRY) {
> + rc = cypress_ps2_sendbyte(psmouse, 0x00);
> + if (rc == CYTP_PS2_RETRY)
> + rc = cypress_ps2_sendbyte(psmouse, 0x0a);
> + }
> + if (rc == CYTP_PS2_ERROR)
> + continue;
> +
> + rc = cypress_ps2_sendbyte(psmouse, data);
> +

Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-21 Thread H. Peter Anvin

On 11/21/2012 10:59 AM, Yinghai Lu wrote:
> 
> in boot_param:
> 
> struct setup_header hdr;/* setup header */  /* 0x1f1 */
> __u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
> __u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];  /* 0x290 */
> struct e820entry e820_map[E820MAX]; /* 0x2d0 */
> __u8  _pad8[48];/* 0xcd0 */
> struct edd_info eddbuf[EDDMAXNR];   /* 0xd00 */
> __u8  _pad9[276];   /* 0xeec */
> 
> so we can use till 0x290.
> 
> and after those three dword, will still have 7 left.
> 

Not quite... the length of the initialized header is given by the byte
at 0x201, which can be at most 0x7f unfortunately.  This means 0x280 is
the endpoint, not 0x290.  Some bootloaders rely on this.

However, from the point of view of the 32- and 64-bit entry points, this
is effectively a .data segment, but these can go into the corresponding
.bss segment, which is the rest of struct boot_params.

>>
>>> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
>>> index b4c913c..00678d3 100644
>>> --- a/arch/x86/boot/compressed/cmdline.c
>>> +++ b/arch/x86/boot/compressed/cmdline.c
>>> @@ -17,6 +17,9 @@ static unsigned long get_cmd_line_ptr(void)
>>>   {
>>> unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
>>>
>>> +   if (real_mode->hdr.version >= 0x020c)
>>> +   cmd_line_ptr |= (u64)real_mode->hdr.ext_cmd_line_ptr << 32;
>>> +
>>> return cmd_line_ptr;
>>>   }
>>
>>
>> No.  hdr.version is information from the kernel to the bootloader; it is
>> meaningless to look at it inside the kernel.
>>
> could remove them, but how about vmlinux elf.
> 
> when kexec vmlinux elf, it will fake one hdr, and fill version there.
> 
>> Same in a bunch of other places.

Then whatever loads vmlinux.elf is responsible for initializing those
fields to zero anyway.  It is still an atrocious abuse.  What we
probably need to do is to include the initialized header in a section in
vmlinux.elf containing the default struct boot_params.  This is the kind
of things that happen when people do things without thinking through all
the consequences.
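A sketch of reading a 64-bit command line pointer without consulting hdr.version, assuming, as described above, that whatever loads the kernel leaves the high half zeroed when it does not use it. struct sketch_boot_params is a hypothetical two-field stand-in, not the real struct boot_params layout, and ext_cmd_line_ptr is the field proposed in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: only the two halves of the pointer matter here. */
struct sketch_boot_params {
	uint32_t cmd_line_ptr;		/* low 32 bits, from the setup header */
	uint32_t ext_cmd_line_ptr;	/* high 32 bits; zero if the loader ignores it */
};

static uint64_t sketch_get_cmd_line_ptr(const struct sketch_boot_params *bp)
{
	assert(bp != NULL);
	/*
	 * No version check: a loader that does not know about the high half
	 * is expected to have zeroed it, so the OR is then a no-op.
	 */
	return (uint64_t)bp->cmd_line_ptr | ((uint64_t)bp->ext_cmd_line_ptr << 32);
}
```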

-hpa


Re: [PATCH v3 08/12] x86, boot: Don't check if cmd_line_ptr is accessible in misc/decompressor()

2012-11-21 Thread Yinghai Lu

On Wed, Nov 21, 2012 at 9:21 AM, H. Peter Anvin  wrote:
> On 11/20/2012 11:16 PM, Yinghai Lu wrote:
>>
>> At that stage, it is already in 32bit protected mode or 64bit mode.
>> so we do not need to check if ptr less 1M.
>>
>> When go from other boot loader (kexec) instead of boot/ code path.
>>
>> Move out accessible checking out __cmdline_find_option
>>
>> So misc.c will parse cmdline and have debug print out.
>
>
> Your description doesn't seem to match the code, and is incredibly confusing
> to the reader.
>
> The reason why is because you leave out an essential piece of information:
> cmdline.c is included both in 16-bit code and in the decompressor (32/64-bit
> code), so you want to move the test out of the shared code.

updated change log to:

Subject: [PATCH] x86, boot: move checking of cmd_line_ptr out of common path

cmdline.c::__cmdline_find_option() and friends are shared between the
16-bit setup code and the 32/64-bit decompressor code.

On the 32/64-bit-only path via kexec, we should not check whether the
pointer is below 1M, as the command line could be placed above 1M or
even above 4G.

Move the accessibility check out of __cmdline_find_option() so the
decompressor in misc.c can parse the command line correctly.
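A hosted sketch of the split this change log describes, under stated assumptions: the shared parser only handles strings, and each entry path applies its own address policy. All names here (parse_option, setup16_find_option, decompressor_find_option) are hypothetical, the parser only recognizes option=value words, and the mapped argument stands in for the command line already made accessible at cmd_line_ptr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shared parser: pure string handling, no address-range policy.
 * Returns the value length, or -1 if "option=" is not present. */
static int parse_option(const char *cmdline, const char *option,
			char *buf, int bufsize)
{
	size_t optlen = strlen(option);
	const char *p = cmdline;

	assert(optlen > 0);
	while (*p) {
		while (*p == ' ')		/* skip separators */
			p++;
		if (strncmp(p, option, optlen) == 0 && p[optlen] == '=') {
			const char *val = p + optlen + 1;
			int len = 0;

			while (val[len] && val[len] != ' ')
				len++;
			if (bufsize > 0) {
				int n = len < bufsize - 1 ? len : bufsize - 1;

				memcpy(buf, val, (size_t)n);
				buf[n] = '\0';
			}
			return len;
		}
		while (*p && *p != ' ')		/* skip to the next word */
			p++;
	}
	return -1;
}

/* 16-bit setup path: real mode can only address the first 1 MiB. */
static int setup16_find_option(uint64_t cmd_line_ptr, const char *mapped,
			       const char *option, char *buf, int bufsize)
{
	if (!cmd_line_ptr || cmd_line_ptr >= 0x100000)
		return -1;
	return parse_option(mapped, option, buf, bufsize);
}

/* 32/64-bit decompressor path: the command line may sit above 1 MiB,
 * or even above 4 GiB when placed there by kexec. */
static int decompressor_find_option(uint64_t cmd_line_ptr, const char *mapped,
				    const char *option, char *buf, int bufsize)
{
	if (!cmd_line_ptr)
		return -1;
	return parse_option(mapped, option, buf, bufsize);
}
```

Keeping the 1M test in the 16-bit wrapper lets the decompressor accept a command line that kexec placed above 4G, which is the case the change log is after.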

Re: [PATCH v2] mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls

2012-11-21 Thread Andrew Morton

On Wed, 21 Nov 2012 10:20:07 +0100
Marek Szyprowski  wrote:

> Hello,
> 
> On 11/21/2012 9:36 AM, Andrew Morton wrote:
> > On Wed, 21 Nov 2012 09:08:52 +0100 Marek Szyprowski 
> >  wrote:
> >
> > > Hello,
> > >
> > > On 11/20/2012 8:33 PM, Andrew Morton wrote:
> > > > On Tue, 20 Nov 2012 15:31:45 +0100
> > > > Marek Szyprowski  wrote:
> > > >
> > > > > dmapool always calls dma_alloc_coherent() with the GFP_ATOMIC flag,
> > > > > regardless of the flags provided by the caller. This causes excessive
> > > > > pruning of emergency memory pools without any good reason.
> > > > > Additionally, on the ARM architecture any driver which is using
> > > > > dmapools will sooner or later trigger the following error:
> > > > > "ERROR: 256 KiB atomic DMA coherent pool is too small!
> > > > > Please increase it with coherent_pool= kernel parameter!".
> > > > > Increasing the coherent pool size usually doesn't help much and only
> > > > > delays the error, because all GFP_ATOMIC DMA allocations are always
> > > > > served from the special, very limited memory pool.
> > > > >
> > > >
> > > > Is this problem serious enough to justify merging the patch into 3.7?
> > > > And into -stable kernels?
> > >
> > > I wonder if it is a good idea to merge such change at the end of current
> > > -rc period.
> >
> > I'm not sure what you mean by this.
> >
> > But what we do sometimes if we think a patch needs a bit more
> > real-world testing before backporting is to merge it into -rc1 in the
> > normal merge window, and tag it for -stable backporting.  That way it
> > gets a few weeks(?) testing in mainline before getting backported.
> 
> I just wondered that if it gets merged to v3.7-rc7 there won't be much time
> for real-world testing before the final v3.7 release. This patch has been in
> linux-next for over a week and I'm not aware of any issues, but -rc releases
> get much more attention and testing than the linux-next tree.
> 
> If you think it's fine to put such a change into v3.7-rc7, I will send a pull
> request and tag it for stable asap.
> 

What I'm suggesting is that it be merged for 3.8-rc1 with a -stable
tag, then it will be backported into 3.7.x later on.
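The dmapool change discussed in this thread amounts to forwarding the caller's gfp flags to the backing allocator instead of hardcoding GFP_ATOMIC. A hosted sketch of that pattern; fake_gfp, backing_alloc() and the pool_refill_*() functions are hypothetical stand-ins for the dmapool internals and dma_alloc_coherent(), not the real kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag values standing in for GFP_KERNEL / GFP_ATOMIC. */
enum fake_gfp { FAKE_GFP_KERNEL = 1, FAKE_GFP_ATOMIC = 2 };

/* Records the flags seen by the backing allocator so behaviour can be checked. */
static enum fake_gfp last_backing_flags;

static void *backing_alloc(size_t size, enum fake_gfp flags)
{
	last_backing_flags = flags;
	return malloc(size);
}

/* Before: the caller's flags are ignored and every refill is atomic,
 * so it always draws on the small emergency pool. */
static void *pool_refill_old(size_t size, enum fake_gfp caller_flags)
{
	(void)caller_flags;
	return backing_alloc(size, FAKE_GFP_ATOMIC);
}

/* After: the caller's flags are passed through unchanged. */
static void *pool_refill_new(size_t size, enum fake_gfp caller_flags)
{
	return backing_alloc(size, caller_flags);
}
```

Only callers that genuinely pass atomic flags then hit the limited atomic pool; sleeping callers use the normal allocator.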


Re: [PATCH 36/46] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships

2012-11-21 Thread Mel Gorman

On Wed, Nov 21, 2012 at 07:25:37PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman  wrote:
> 
> > While it is desirable that all threads in a process run on its home
> > node, this is not always possible or necessary. There may be more
> > threads than exist within the node or the node might be over-subscribed
> > with unrelated processes.
> > 
> > This can cause a situation whereby a page gets migrated off its home
> > node because the threads clearing pte_numa were running off-node. This
> > patch uses page->last_nid to build a two-stage filter before pages get
> > migrated to avoid problems with short or unlikely task<->node
> > relationships.
> > 
> > Signed-off-by: Mel Gorman 
> > ---
> >  mm/mempolicy.c |   30 +-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4c1c8d8..fd20e28 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2317,9 +2317,37 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
> > }
> >  
> > /* Migrate the page towards the node whose CPU is referencing it */
> > -   if (pol->flags & MPOL_F_MORON)
> > +   if (pol->flags & MPOL_F_MORON) {
> > +   int last_nid;
> > +
> > polnid = numa_node_id();
> >  
> > +   /*
> > +* Multi-stage node selection is used in conjunction
> > +* with a periodic migration fault to build a temporal
> > +* task<->page relation. By using a two-stage filter we
> > +* remove short/unlikely relations.
> > +*
> > +* Using P(p) ~ n_p / n_t as per frequentist
> > +* probability, we can equate a task's usage of a
> > +* particular page (n_p) per total usage of this
> > +* page (n_t) (in a given time-span) to a probability.
> > +*
> > +* Our periodic faults will sample this probability and
> > +* getting the same result twice in a row, given these
> > +* samples are fully independent, is then given by
> > +* P(n)^2, provided our sample period is sufficiently
> > +* short compared to the usage pattern.
> > +*
> > +* This quadric squishes small probabilities, making
> > +* it less likely we act on an unlikely task<->page
> > +* relation.
> > +*/
> > +   last_nid = page_xchg_last_nid(page, polnid);
> > +   if (last_nid != polnid)
> > +   goto out;
> > +   }
> > +
> > if (curnid != polnid)
> > ret = polnid;
> >  out:
> 
> As mentioned in my other mail, this patch of yours looks very 
> similar to the numa/core commit attached below, mostly written 
> by Peter:
> 
>   30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery
> 

My patch is directly based on that particular patch and is a partial
extraction. I could not directly pull which is why the From is missing. I
think you'll also find that it's very similar to a partial extraction
from "autonuma: memory follows CPU algorithm and task/mm_autonuma stats
collection". The primary differences are exactly how the logic is applied
and when it happens.

I've added a note to that effect now. For all the patches with notes,
or any other ones, I'll be very happy to add the Signed-offs back on if
the original authors acknowledge they are ok with the end result. If you
recall, in the original V1 of this series I said:

This series steals very heavily from both autonuma and schednuma
with very little original code. In some cases I removed the
signed-off-bys because the result was too different. I have noted
in the changelog where this happened but the signed-offs can be
restored if the original authors agree.

Just to compare, this is the wording in "autonuma: memory follows CPU
algorithm and task/mm_autonuma stats collection"

+/*
+ * In this function we build a temporal CPU_node<->page relation by
+ * using a two-stage autonuma_last_nid filter to remove short/unlikely
+ * relations.
+ *
+ * Using P(p) ~ n_p / n_t as per frequentest probability, we can
+ * equate a node's CPU usage of a particular page (n_p) per total
+ * usage of this page (n_t) (in a given time-span) to a probability.
+ *
+ * Our periodic faults will then sample this probability and getting
+ * the same result twice in a row, given these samples are fully
+ * independent, is then given by P(n)^2, provided our sample period
+ * is sufficiently short compared to the usage pattern.
+ *
+ * This quadric squishes small probabilities, making it less likely
+ * we act on an unlikely CPU_node<->page relation.
+ */

If this was the basis for the sched/numa patch then I'd point out that
I'm not the only person that failed to preserve history perfectly.

-- 
Mel Gorman
SUSE Labs

Re: [PATCH 000/493] remove CONFIG_HOTPLUG as an option

2012-11-21 Thread Greg KH

On Wed, Nov 21, 2012 at 01:41:46PM -0500, Bill Pemberton wrote:
> Andrew Morton writes:
> > 
> > On Tue, 20 Nov 2012 10:46:11 + Grant Likely  
> > wrote:
> > 
> > > On Sat, Nov 17, 2012 at 12:19 AM, Bill Pemberton  
> > > wrote:
> > > > CONFIG_HOTPLUG is no longer an optional setting.  In order to remove
> > > > it as an option, code paths that check CONFIG_HOTPLUG will be removed
> > > > along with the attributes __devexit_p, __devexit, __devinitconst, and
> > > > __devinitdata.
> > > >
> > > > I'll save the list from the mailbomb of this huge patchset.  The
> > > > patches themselves are going to Greg KH for the driver core tree.
> > > >
> > > >
> > > > Bill Pemberton (493):
> > > [...]
> > > >  2942 files changed, 11645 insertions(+), 12116 deletions(-)
> > > 
> > > So, I've got no problem with the reason for the change and I don't
> > > even think you need my ack for the bits that I maintain (though you
> > > have it if you want it). However, this looks like it is going to be
> > > /painful/. First of all it will touch a huge number of files in the
> > > tree. Yes the change is trivial, but it will require manual fixups on
> > > a lot of patches.
> > 
> > Yeah, this is dopey.  Send the script to Linus and ask him to run it
> > seven seconds before he releases -rc1, when everyone's trees are
> > empty(ish).  Or send him a single megapatch at that time.
> > 
> 
> I like the script idea for removing all the __dev markings.  Creating
> the patches in the first place was a game of whack-a-mole as various
> trees changed.

Linus doesn't like to take scripts. I had planned on queueing up all of
these that the various subsystem maintainers didn't take, and pushing
the ones that did merge cleanly into -rc1.  Then, right after -rc1 is
out, go through the tree once more to get the stragglers.

Sound reasonable?

thanks,

greg k-h

Re: [PATCH 00/46] Automatic NUMA Balancing V4

2012-11-21 Thread Mel Gorman

On Wed, Nov 21, 2012 at 07:21:58PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman  wrote:
> 
> > On Wed, Nov 21, 2012 at 06:33:16PM +0100, Ingo Molnar wrote:
> > > 
> > > * Mel Gorman  wrote:
> > > 
> > > > On Wed, Nov 21, 2012 at 06:03:06PM +0100, Ingo Molnar wrote:
> > > > > 
> > > > > * Mel Gorman  wrote:
> > > > > 
> > > > > > On Wed, Nov 21, 2012 at 10:21:06AM +, Mel Gorman wrote:
> > > > > > > 
> > > > > > > I am not including a benchmark report in this but will be posting one
> > > > > > > shortly in the "Latest numa/core release, v16" thread along with the latest
> > > > > > > schednuma figures I have available.
> > > > > > > 
> > > > > > 
> > > > > > Report is linked here https://lkml.org/lkml/2012/11/21/202
> > > > > > 
> > > > > > I ended up cancelling the remaining tests and restarted with
> > > > > > 
> > > > > > 1. schednuma + patches posted since so that works out as
> > > > > 
> > > > > Mel, I'd like to ask you to refer to our tree as numa/core or 
> > > > > 'numacore' in the future. Would such a courtesy to use the 
> > > > > current name of our tree be possible?
> > > > > 
> > > > 
> > > > Sure, no problem.
> > > 
> > > Thanks!
> > > 
> > > I ran a quick test with your 'balancenuma v4' tree and while 
> > > numa02 and numa01-THREAD-ALLOC performance is looking good, 
> > > numa01 performance does not look very good:
> > > 
> > >               mainline   numa/core   balancenuma-v4
> > >  numa01:         340.3       139.4         276 secs
> > > 
> > > 97% slower than numa/core.
> > > 
> > 
> > It would be. numa01 is an adverse workload where all threads 
> > are hammering the same memory.  The two-stage filter in 
> > balancenuma restricts the amount of migration it does so it 
> > ends up in a situation where it cannot balance properly. [...]
> 
> Do you mean this "balancenuma v4" patch attributed to you:
> 
>  Subject: mm: Numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
>  From: Mel Gorman 
>  Date: Wed, 21 Nov 2012 10:21:42 +
> 

Yes.

>  ...
> 
>  Signed-off-by: Mel Gorman 
> 
> which has:
> 
> /*
>  * Multi-stage node selection is used in conjunction
>  * with a periodic migration fault to build a temporal
>  * task<->page relation. By using a two-stage filter we
>  * remove short/unlikely relations.
>  *
>  * Using P(p) ~ n_p / n_t as per frequentist
>  * probability, we can equate a task's usage of a
>  * particular page (n_p) per total usage of this
>  * page (n_t) (in a given time-span) to a probability.
>  *
>  * Our periodic faults will sample this probability and
>  * getting the same result twice in a row, given these
>  * samples are fully independent, is then given by
>  * P(n)^2, provided our sample period is sufficiently
>  * short compared to the usage pattern.
>  *
>  * This quadric squishes small probabilities, making
>  * it less likely we act on an unlikely task<->page
>  * relation.
> 
> This looks very similar to the code and text that Peter wrote 
> for numa/core:
> 
> /*
>  * Multi-stage node selection is used in conjunction with a periodic
>  * migration fault to build a temporal task<->page relation. By
>  * using a two-stage filter we remove short/unlikely relations.
>  *
>  * Using P(p) ~ n_p / n_t as per frequentist probability, we can
>  * equate a task's usage of a particular page (n_p) per total usage
>  * of this page (n_t) (in a given time-span) to a probability.
>  *
>  * Our periodic faults will then sample this probability and getting
>  * the same result twice in a row, given these samples are fully
>  * independent, is then given by P(n)^2, provided our sample period
>  * is sufficiently short compared to the usage pattern.
>  *
>  * This quadric squishes small probabilities, making it less likely
>  * we act on an unlikely task<->page relation.
>  *
>  * Return the best node ID this page should be on, or -1 if it should
>  * stay where it is.
>  */
> 
> see commit:
> 
>   30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery
> 
> ?
> 
> I think it's the very same concept - yours is taken from an 
> older sched/numa commit and attributed to yourself? [If so then 
> please fix the attribution.]

Yes, it's completely based on earlier sched/numa patches. In many of the
patches you'll see notes where I documented which patches they were originally
based on -- be it from sched/numa, autonuma or some combination of both.
In many cases I could not keep the signed-off-by because the end result
was simply too different to claim that the author was happy with it. I was
hoping that these notes would convert to Signed-off-bys after
Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-21 Thread Yinghai Lu

On Wed, Nov 21, 2012 at 9:17 AM, H. Peter Anvin  wrote:
> On 11/20/2012 11:16 PM, Yinghai Lu wrote:
>>
>>
>> diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
>> index 9efceff..a8263f7 100644
>> --- a/Documentation/x86/boot.txt
>> +++ b/Documentation/x86/boot.txt
>> @@ -57,6 +57,9 @@ Protocol 2.10: (Kernel 2.6.31) Added a protocol for relaxed alignment
>>   Protocol 2.11: (Kernel 3.6) Added a field for offset of EFI handover
>>                  protocol entry point.
>>
>> +Protocol 2.12: (Kernel 3.9) Added three fields for loading bzImage and
>> +ramdisk above 4G with 64bit.
>> +
>>    MEMORY LAYOUT
>>
>>   The traditional memory map for the kernel loader, used for Image or
>> @@ -182,7 +185,7 @@ Offset  Proto   Name            Meaning
>>   0230/4  2.05+   kernel_alignment Physical addr alignment required for kernel
>>   0234/1  2.05+   relocatable_kernel Whether kernel is relocatable or not
>>   0235/1  2.10+   min_alignment   Minimum alignment, as a power of two
>> -0236/2 N/A     pad3            Unused
>> +0236/2 2.12+   xloadflags  Boot protocal option flags
>
>  
sorry.
>
>>   0238/4  2.06+   cmdline_size    Maximum size of the kernel command line
>>   023C/4  2.07+   hardware_subarch Hardware subarchitecture
>>   0240/8  2.07+   hardware_subarch_data Subarchitecture-specific data
>> @@ -193,6 +196,9 @@ Offset  Proto   Name            Meaning
>>   0258/8  2.10+   pref_address    Preferred loading address
>>   0260/4  2.10+   init_size       Linear memory required during initialization
>>   0264/4  2.11+   handover_offset Offset of handover entry point
>> +0268/4 2.12+   ext_ramdisk_image ramdisk_image 32 bits
>
>
> "high 32 bits" presumably...

ok

>
>
>> +026C/4 2.12+   ext_ramdisk_size ramdisk_size high 32 bits
>> +0270/4 2.12+   ext_cmd_line_ptr cmd_line_ptr high 32 bits
>
>
> I'm looking at these three fields and I'm getting worried about space --
> there are only two more word-sized fields possible in this structure. Since
> these fields are not initialized (default to zero) and almost certainly
> aren't useful for people entering via the 16-bit entry point I think we
> should move them out of struct setup_header and into the remainder of struct
> boot_param.

in boot_param:

struct setup_header hdr;    /* setup header */  /* 0x1f1 */
__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];  /* 0x290 */
struct e820entry e820_map[E820MAX];         /* 0x2d0 */
__u8  _pad8[48];                            /* 0xcd0 */
struct edd_info eddbuf[EDDMAXNR];           /* 0xd00 */
__u8  _pad9[276];                           /* 0xeec */

So we can use the space up to 0x290, and after those three dwords there
will still be 7 dwords left.
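The count can be checked from the offsets in the boot.txt hunk: handover_offset occupies 0264/4, so the new fields start at 0x268, and edd_mbr_sig_buffer sits at 0x290. A small sketch of that arithmetic (the constant and function names are hypothetical); it reproduces the count above and does not address hpa's separate concern about how far struct setup_header itself can grow.

```c
#include <assert.h>

/* Offsets taken from the boot.txt hunk and the boot_params layout above. */
#define SETUP_HDR_FREE_START 0x268u /* first byte after handover_offset (0264/4) */
#define EDD_MBR_SIG_BUFFER   0x290u /* next fixed field in struct boot_params */

/* Number of 4-byte slots still free after using `used_dwords` of them. */
static unsigned int free_dwords(unsigned int used_dwords)
{
	unsigned int total = (EDD_MBR_SIG_BUFFER - SETUP_HDR_FREE_START) / 4;

	assert(used_dwords <= total);
	return total - used_dwords;
}
```

The gap is 0x28 bytes, i.e. 10 dwords; the three ext_* fields leave 7.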

>
>> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
>> index b4c913c..00678d3 100644
>> --- a/arch/x86/boot/compressed/cmdline.c
>> +++ b/arch/x86/boot/compressed/cmdline.c
>> @@ -17,6 +17,9 @@ static unsigned long get_cmd_line_ptr(void)
>>   {
>> unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
>>
>> +   if (real_mode->hdr.version >= 0x020c)
>> +   cmd_line_ptr |= (u64)real_mode->hdr.ext_cmd_line_ptr << 32;
>> +
>> return cmd_line_ptr;
>>   }
>
>
> No.  hdr.version is information from the kernel to the bootloader; it is
> meaningless to look at it inside the kernel.
>
I could remove them, but what about a vmlinux ELF?

When kexec loads a vmlinux ELF, it fakes a header and fills in the version there.

> Same in a bunch of other places.
>

Thanks

Yinghai

Re: [PATCH v5 5/5] ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks

2012-11-21 Thread Tony Lindgren

* Omar Ramirez Luna  [121119 17:08]:
> This prevents hwmod _enable_clocks...omap2_dflt_clk_enable path
> from enabling modulemode inside CLKCTRL using its clk->enable_reg
> field. Instead, this is left to _omap4_enable_module through soc_ops,
> as the one in charge of this setting.
> 
> According to comments received[1] for related patches the idea is
> to get rid of leaf clocks in future. So remove these two while at it.
> 
> [1] http://lkml.org/lkml/2012/8/20/226

This one should be queued by Paul, or at least acked by him.

Regards,

Tony

 
> Signed-off-by: Omar Ramirez Luna 
> ---
>  arch/arm/mach-omap2/clock44xx_data.c   |   22 --
>  arch/arm/mach-omap2/omap_hwmod_44xx_data.c |4 ++--
>  2 files changed, 2 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/clock44xx_data.c b/arch/arm/mach-omap2/clock44xx_data.c
> index 6efc30c..067c486 100644
> --- a/arch/arm/mach-omap2/clock44xx_data.c
> +++ b/arch/arm/mach-omap2/clock44xx_data.c
> @@ -1316,16 +1316,6 @@ static struct clk dmic_fck = {
>   .clkdm_name = "abe_clkdm",
>  };
>  
> -static struct clk dsp_fck = {
> - .name   = "dsp_fck",
> - .ops= _omap2_dflt,
> - .enable_reg = OMAP4430_CM_TESLA_TESLA_CLKCTRL,
> - .enable_bit = OMAP4430_MODULEMODE_HWCTRL,
> - .clkdm_name = "tesla_clkdm",
> - .parent = &dpll_iva_m4x2_ck,
> - .recalc = _recalc,
> -};
> -
>  static struct clk dss_sys_clk = {
>   .name   = "dss_sys_clk",
>   .ops= _omap2_dflt,
> @@ -1696,16 +1686,6 @@ static struct clk i2c4_fck = {
>   .recalc = _recalc,
>  };
>  
> -static struct clk ipu_fck = {
> - .name   = "ipu_fck",
> - .ops= _omap2_dflt,
> - .enable_reg = OMAP4430_CM_DUCATI_DUCATI_CLKCTRL,
> - .enable_bit = OMAP4430_MODULEMODE_HWCTRL,
> - .clkdm_name = "ducati_clkdm",
> - .parent = &ducati_clk_mux_ck,
> - .recalc = _recalc,
> -};
> -
>  static struct clk iss_ctrlclk = {
>   .name   = "iss_ctrlclk",
>   .ops= _omap2_dflt,
> @@ -3151,7 +3131,6 @@ static struct omap_clk omap44xx_clks[] = {
>   CLK(NULL,   "div_ts_ck",        &div_ts_ck,        CK_446X),
>   CLK(NULL,   "dmic_sync_mux_ck", &dmic_sync_mux_ck, CK_443X),
>   CLK(NULL,   "dmic_fck",         &dmic_fck,         CK_443X),
> - CLK(NULL,   "dsp_fck",          &dsp_fck,          CK_443X),
>   CLK(NULL,   "dss_sys_clk",      &dss_sys_clk,      CK_443X),
>   CLK(NULL,   "dss_tv_clk",       &dss_tv_clk,       CK_443X),
>   CLK(NULL,   "dss_48mhz_clk",    &dss_48mhz_clk,    CK_443X),
> @@ -3183,7 +3162,6 @@ static struct omap_clk omap44xx_clks[] = {
>   CLK(NULL,   "i2c2_fck",         &i2c2_fck,         CK_443X),
>   CLK(NULL,   "i2c3_fck",         &i2c3_fck,         CK_443X),
>   CLK(NULL,   "i2c4_fck",         &i2c4_fck,         CK_443X),
> - CLK(NULL,   "ipu_fck",          &ipu_fck,          CK_443X),
>   CLK(NULL,   "iss_ctrlclk",      &iss_ctrlclk,      CK_443X),
>   CLK(NULL,   "iss_fck",          &iss_fck,          CK_443X),
>   CLK(NULL,   "iva_fck",          &iva_fck,          CK_443X),
> diff --git a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
> index aab5c12..1f61093 100644
> --- a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
> +++ b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
> @@ -650,7 +650,7 @@ static struct omap_hwmod omap44xx_dsp_hwmod = {
>   .mpu_irqs   = omap44xx_dsp_irqs,
>   .rst_lines  = omap44xx_dsp_resets,
>   .rst_lines_cnt  = ARRAY_SIZE(omap44xx_dsp_resets),
> - .main_clk   = "dsp_fck",
> + .main_clk   = "dpll_iva_m4x2_ck",
>   .prcm = {
>   .omap4 = {
>   .clkctrl_offs = OMAP4_CM_TESLA_TESLA_CLKCTRL_OFFSET,
> @@ -1677,7 +1677,7 @@ static struct omap_hwmod omap44xx_ipu_hwmod = {
>   .mpu_irqs   = omap44xx_ipu_irqs,
>   .rst_lines  = omap44xx_ipu_resets,
>   .rst_lines_cnt  = ARRAY_SIZE(omap44xx_ipu_resets),
> - .main_clk   = "ipu_fck",
> + .main_clk   = "ducati_clk_mux_ck",
>   .prcm = {
>   .omap4 = {
>   .clkctrl_offs = OMAP4_CM_DUCATI_DUCATI_CLKCTRL_OFFSET,
> -- 
> 1.7.4.1
> 

Re: [PATCH v5 4/5] iommu/omap: adapt to runtime pm

2012-11-21 Thread Tony Lindgren

* Omar Ramirez Luna  [121119 17:08]:
> Use runtime PM functionality interfaced with hwmod enable/idle
> functions, to replace direct clock operations and sysconfig
> handling.
> 
> Due to reset sequence, pm_runtime_[get|put]_sync must be used, to
> avoid possible operations with the module under reset. Because of
> this and given that the driver uses spin_locks to protect their
> critical sections, we must use pm_runtime_irq_safe in order for the
> runtime ops to be happy, otherwise might_sleep_if checks in runtime
> framework will complain.
> 
> The remaining pm_runtime calls outside iommu_enable and iommu_disable
> correspond to paths that can be accessed through debugfs; some of
> them don't work if the module is not enabled first, but in future,
> if the mmu is idled without freeing, these are needed to debug.
> 
> Signed-off-by: Omar Ramirez Luna 
> ---
>  arch/arm/mach-omap2/omap-iommu.c |1 -
>  drivers/iommu/omap-iommu.c   |   40 ++---
>  drivers/iommu/omap-iommu.h   |3 --
>  drivers/iommu/omap-iommu2.c  |   17 
>  include/linux/platform_data/iommu-omap.h |1 -
>  5 files changed, 19 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/omap-iommu.c b/arch/arm/mach-omap2/omap-iommu.c
> index 02726a6..7642fc4 100644
> --- a/arch/arm/mach-omap2/omap-iommu.c
> +++ b/arch/arm/mach-omap2/omap-iommu.c
> @@ -31,7 +31,6 @@ static int __init omap_iommu_dev_init(struct omap_hwmod *oh, void *unused)
>   return -ENOMEM;
>  
>   pdata->name = oh->name;
> - pdata->clk_name = oh->main_clk;
>   pdata->nr_tlb_entries = a->nr_tlb_entries;
>   pdata->da_start = a->da_start;
>   pdata->da_end = a->da_end;

It would be good for Kevin to check the runtime PM related changes, so
I've added him to Cc. For the arch/arm/mach-omap2/ change above:

Acked-by: Tony Lindgren 

Re: [PATCH 2/5] Input: bu21013_ts - Move GPIO init and exit functions into the driver

2012-11-21 Thread Dmitry Torokhov

Hi Lee,

On Wed, Nov 14, 2012 at 01:47:14PM +, Lee Jones wrote:
> @@ -272,6 +276,60 @@ static irqreturn_t bu21013_gpio_irq(int irq, void *device_data)
>  }
>  
>  /**
> + * bu21013_gpio_board_init() - configures the touch panel
> + * @reset_pin: reset pin number
> + *
> + * This function is used to configure the voltage and
> + * reset the touch panel controller.
> + */
> +static int bu21013_gpio_board_init(int reset_pin)
> +{
> + int retval = 0;
> +
> + bu21013_devices++;
> + if (bu21013_devices == 1) {

This does not make sense. If gpio is per-device now then we should
simply set it up and not count devices.

Thanks.

-- 
Dmitry

Re: [PATCH 000/493] remove CONFIG_HOTPLUG as an option

2012-11-21 Thread Bill Pemberton

Andrew Morton writes:
> 
> On Tue, 20 Nov 2012 10:46:11 + Grant Likely  
> wrote:
> 
> > On Sat, Nov 17, 2012 at 12:19 AM, Bill Pemberton  wrote:
> > > CONFIG_HOTPLUG is no longer an optional setting.  In order to remove
> > > it as an option, code paths that check CONFIG_HOTPLUG will be removed
> > > along with the attributes __devexit_p, __devexit, __devinitconst, and
> > > __devinitdata.
> > >
> > > I'll save the list from the mailbomb of this huge patchset.  The
> > > patches themselves are going to Greg KH for the driver core tree.
> > >
> > >
> > > Bill Pemberton (493):
> > [...]
> > >  2942 files changed, 11645 insertions(+), 12116 deletions(-)
> > 
> > So, I've got no problem with the reason for the change and I don't
> > even think you need my ack for the bits that I maintain (though you
> > have it if you want it). However, this looks like it is going to be
> > /painful/. First of all it will touch a huge number of files in the
> > tree. Yes the change is trivial, but it will require manual fixups on
> > a lot of patches.
> 
> Yeah, this is dopey.  Send the script to Linus and ask him to run it
> seven seconds before he releases -rc1, when everyone's trees are
> empty(ish).  Or send him a single megapatch at that time.
> 

I like the script idea for removing all the __dev markings.  Creating
the patches in the first place was a game of whack-a-mole as various
trees changed.

-- 
Bill


Re: [PATCH 1/9] vfs: Handle O_SYNC AIO DIO in generic code properly

2012-11-21 Thread Jeff Moyer

Christoph Hellwig  writes:

> On Wed, Nov 21, 2012 at 11:58:05AM -0500, Jeff Moyer wrote:
>> > I'd like to use this as a vehicle to revisit how dio completions work.
>> 
>> I don't like the sound of that.  ;-)  It sounds like this bugfix may get
>> further delayed by the desire for unrelated code cleanup.
>
> I've got a prototype that isn't much more invasive than the current
> series.  I'll finish it up and run it through QA and will post it
> tomorrow.

Works for me.  Thanks!

-Jeff

[PATCH linux-next] firmware: Remove last vestiges of dabusb

2012-11-21 Thread Tim Gardner

dabusb was removed with commit dae86ccbc3c185aebfc396e8e668aa3d73d748d8
'[media] dabusb: remove obsolete driver', so remove the last vestiges of
firmware and documentation.

Cc: Rob Landley 
Cc: Paul Gortmaker 
Cc: Andrew Morton 
Cc: Ben Hutchings 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Signed-off-by: Tim Gardner 
---

This patch was created using '--irreversible-delete' which might require manual
intervention depending on your version of git.

 Documentation/devices.txt  |3 -
 firmware/Makefile  |1 -
 firmware/dabusb/bitstream.bin.ihex |  761 
 firmware/dabusb/firmware.HEX   |  649 --
 4 files changed, 1414 deletions(-)
 delete mode 100644 firmware/dabusb/bitstream.bin.ihex
 delete mode 100644 firmware/dabusb/firmware.HEX

diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index b6251cc..08f01e7 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -2561,9 +2561,6 @@ Your cooperation is appreciated.
192 = /dev/usb/yurex1   First USB Yurex device
   ...
209 = /dev/usb/yurex16  16th USB Yurex device
-   240 = /dev/usb/dabusb0  First daubusb device
-   ...
-   243 = /dev/usb/dabusb3  Fourth dabusb device
 
 180 block  USB block devices
  0 = /dev/uba  First USB block device
diff --git a/firmware/Makefile b/firmware/Makefile
index eeb1403..cbb09ce 100644
--- a/firmware/Makefile
+++ b/firmware/Makefile
@@ -97,7 +97,6 @@ fw-shipped-$(CONFIG_TEHUTI) += tehuti/bdx.bin
 fw-shipped-$(CONFIG_TIGON3) += tigon/tg3.bin tigon/tg3_tso.bin \
   tigon/tg3_tso5.bin
 fw-shipped-$(CONFIG_TYPHOON) += 3com/typhoon.bin
-fw-shipped-$(CONFIG_USB_DABUSB) += dabusb/firmware.fw dabusb/bitstream.bin
 fw-shipped-$(CONFIG_USB_EMI26) += emi26/loader.fw emi26/firmware.fw \
  emi26/bitstream.fw
 fw-shipped-$(CONFIG_USB_EMI62) += emi62/loader.fw emi62/bitstream.fw \
diff --git a/firmware/dabusb/bitstream.bin.ihex b/firmware/dabusb/bitstream.bin.ihex
deleted file mode 100644
index 5021a4b..000
diff --git a/firmware/dabusb/firmware.HEX b/firmware/dabusb/firmware.HEX
deleted file mode 100644
index 7c258df..000
-- 
1.7.9.5


Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread H. Peter Anvin

On 11/21/2012 10:35 AM, Sasha Levin wrote:
>>
>> Reading between the lines, this sounds like it would cause a user-visible
>> difference between mptable platforms and ACPI platforms?  If so, that is
>> totally unacceptable.  If not, the description is confusing.
> 
> With ACPI platforms you don't need probe/release because the hardware notifies
> on CPU insert/eject - this doesn't exist on mptable which is why you have to
> do it manually with probe/release.
> 
> The difference is already user visible: you can hotplug on ACPI, but can't on
> mptables.
> 
> Yes, reading back the subject does sound confusing - a better one would probably
> be "provide interface for CPU hotplug on mptable platforms" or something similar.
> 

So, are there any mptables platforms which support hotplug?  If the
answer is "KVM" then the answer is that KVM needs to move to ACPI to get
the proper functionality; putting a hack in is really not okay.

-hpa



Re: [PATCH v4] TPM: Issue TPM_STARTUP at driver load if the TPM has not been started

2012-11-21 Thread Jason Gunthorpe

The TPM will respond to TPM_GET_CAP with TPM_ERR_INVALID_POSTINIT if
TPM_STARTUP has not been issued. Detect this and automatically
issue TPM_STARTUP.

This is for embedded applications where the kernel is the first thing
to touch the TPM.

Signed-off-by: Jason Gunthorpe 
Tested-by: Peter Huewe 
Reviewed-by: Peter Huewe 
---
 drivers/char/tpm/tpm.c |   44 
 drivers/char/tpm/tpm.h |6 ++
 2 files changed, 46 insertions(+), 4 deletions(-)

v4 changes:
 - Use NULL
 - Put tpm_startup_header together with tpm_startup
 - Add Peter's -by lines.

diff --git a/drivers/char/tpm/tpm.c b/drivers/char/tpm/tpm.c
index 93211df..98d550e 100644
--- a/drivers/char/tpm/tpm.c
+++ b/drivers/char/tpm/tpm.c
@@ -468,7 +468,7 @@ static ssize_t transmit_cmd(struct tpm_chip *chip, struct tpm_cmd_t *cmd,
return -EFAULT;
 
err = be32_to_cpu(cmd->header.out.return_code);
-   if (err != 0)
+   if (err != 0 && desc)
dev_err(chip->dev, "A TPM error (%d) occurred %s\n", err, desc);
 
return err;
@@ -528,6 +528,25 @@ void tpm_gen_interrupt(struct tpm_chip *chip)
 }
 EXPORT_SYMBOL_GPL(tpm_gen_interrupt);
 
+#define TPM_ORD_STARTUP cpu_to_be32(153)
+#define TPM_ST_CLEAR cpu_to_be16(1)
+#define TPM_ST_STATE cpu_to_be16(2)
+#define TPM_ST_DEACTIVATED cpu_to_be16(3)
+static const struct tpm_input_header tpm_startup_header = {
+   .tag = TPM_TAG_RQU_COMMAND,
+   .length = cpu_to_be32(12),
+   .ordinal = TPM_ORD_STARTUP
+};
+
+static int tpm_startup(struct tpm_chip *chip, __be16 startup_type)
+{
+   struct tpm_cmd_t start_cmd;
+   start_cmd.header.in = tpm_startup_header;
+   start_cmd.params.startup_in.startup_type = startup_type;
+   return transmit_cmd(chip, &start_cmd, TPM_INTERNAL_RESULT_SIZE,
+   "attempting to start the TPM");
+}
+
 int tpm_get_timeouts(struct tpm_chip *chip)
 {
struct tpm_cmd_t tpm_cmd;
@@ -541,11 +560,28 @@ int tpm_get_timeouts(struct tpm_chip *chip)
tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP;
tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
+   rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE, NULL);
 
-   rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
-   "attempting to determine the timeouts");
-   if (rc)
+   if (rc == TPM_ERR_INVALID_POSTINIT) {
+   /* The TPM has not been started; we are the first to talk to it.
+  Execute a startup command. */
+   dev_info(chip->dev, "Issuing TPM_STARTUP\n");
+   if (tpm_startup(chip, TPM_ST_CLEAR))
+   return rc;
+
+   tpm_cmd.header.in = tpm_getcap_header;
+   tpm_cmd.params.getcap_in.cap = TPM_CAP_PROP;
+   tpm_cmd.params.getcap_in.subcap_size = cpu_to_be32(4);
+   tpm_cmd.params.getcap_in.subcap = TPM_CAP_PROP_TIS_TIMEOUT;
+   rc = transmit_cmd(chip, &tpm_cmd, TPM_INTERNAL_RESULT_SIZE,
+ NULL);
+   }
+   if (rc) {
+   dev_err(chip->dev,
+   "A TPM error (%d) occurred attempting to determine the timeouts\n",
+   rc);
goto duration;
+   }
 
if (be32_to_cpu(tpm_cmd.header.out.return_code) != 0 ||
be32_to_cpu(tpm_cmd.header.out.length)
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 8ef7649..8971b12 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -47,6 +47,7 @@ enum tpm_addr {
 #define TPM_WARN_DOING_SELFTEST 0x802
 #define TPM_ERR_DEACTIVATED 0x6
 #define TPM_ERR_DISABLED0x7
+#define TPM_ERR_INVALID_POSTINIT 38
 
 #define TPM_HEADER_SIZE10
 extern ssize_t tpm_show_pubek(struct device *, struct device_attribute *attr,
@@ -291,6 +292,10 @@ struct tpm_getrandom_in {
__be32 num_bytes;
 }__attribute__((packed));
 
+struct tpm_startup_in {
+   __be16  startup_type;
+} __packed;
+
 typedef union {
struct  tpm_getcap_params_out getcap_out;
struct  tpm_readpubek_params_out readpubek_out;
@@ -301,6 +306,7 @@ typedef union {
struct  tpm_pcrextend_in pcrextend_in;
struct  tpm_getrandom_in getrandom_in;
struct  tpm_getrandom_out getrandom_out;
+   struct tpm_startup_in startup_in;
 } tpm_cmd_params;
 
 struct tpm_cmd_t {
-- 
1.7.5.4
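
For readers who want to see the bytes involved, here is a minimal user-space sketch of the 12-byte TPM_Startup command the patch builds from tpm_startup_header plus struct tpm_startup_in, and of the return-code check that triggers it. It assumes the TPM 1.2 big-endian layout (10-byte header: tag, length, ordinal); the helper names build_tpm_startup() and needs_startup() are hypothetical, and the TPM_TAG_RQU_COMMAND value (193, i.e. 0x00C1) comes from tpm.h, which this excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the patch; the request tag is assumed from tpm.h. */
#define TPM_TAG_RQU_COMMAND		0x00C1u
#define TPM_ORD_STARTUP			153u
#define TPM_ST_CLEAR			1u
#define TPM_ERR_INVALID_POSTINIT	38u
#define TPM_STARTUP_LEN			12u	/* 10-byte header + 2-byte type */

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Serialize TPM_Startup(startup_type): tag, length, ordinal, type. */
static size_t build_tpm_startup(uint8_t buf[TPM_STARTUP_LEN],
				uint16_t startup_type)
{
	put_be16(buf, TPM_TAG_RQU_COMMAND);
	put_be32(buf + 2, TPM_STARTUP_LEN);
	put_be32(buf + 6, TPM_ORD_STARTUP);
	put_be16(buf + 10, startup_type);
	return TPM_STARTUP_LEN;
}

/* Response header is tag (2), length (4), return code (4): does the
 * TPM_GET_CAP reply say the TPM still needs TPM_STARTUP? */
static int needs_startup(const uint8_t *resp, size_t len)
{
	assert(len >= 10);
	return get_be32(resp + 6) == TPM_ERR_INVALID_POSTINIT;
}
```

In the driver the same decision is made on the already-converted return code from transmit_cmd(); passing a NULL description on the first attempt just keeps the expected INVALID_POSTINIT failure out of the log.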


Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread Sasha Levin

On 11/21/2012 01:25 PM, H. Peter Anvin wrote:
> On 11/21/2012 10:22 AM, Sasha Levin wrote:
>> Until now, CPU hotplug has been ignored for mptable implementations that
>> support it by marking the hotpluggable CPUs as disabled during boot.
>>
>> The current kernel code detects that behaviour and actually deals with it
>> properly:
>>
>>  [0.00] Intel MultiProcessor Specification v1.4
>>  [0.00] MPTABLE: OEM ID: KVMCPU00
>>  [0.00] MPTABLE: Product ID: 0.1
>>  [0.00] MPTABLE: APIC at: 0xFEE00000
>>  [0.00] Processor #0 (Bootup-CPU)
>>  [0.00] Processor #1
>>  [0.00] Processor #2
>>  [0.00] IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
>>  [0.00] Processors: 3
>>  [0.00] smpboot: Allowing 3 CPUs, 1 hotplug CPUs
>>
>> The problem begins when a user actually wants to online such a CPU; there
>> is no interface for him to tell the kernel that the CPU is now present and
>> can be used.
>>
>> Luckily, the kernel provides a generic interface in the form of 'probe' and
>> 'release' sysfs files which are used on different architectures exactly for
>> that - to probe and release CPUs. On x86 however this was unimplemented
>> until now.
>>
>> This patch adds code into the x86 implementation of probe and release to allow
>> adding and removing CPUs. This allows machines that use mptable to hotplug
>> CPUs:
>>
> 
> Reading between the lines, this sounds like it would cause a user-visible
> difference between mptable platforms and ACPI platforms?  If so, that is
> totally unacceptable.  If not, the description is confusing.

With ACPI platforms you don't need probe/release because the hardware notifies
on CPU insert/eject - this doesn't exist on mptable which is why you have to
do it manually with probe/release.

The difference is already user visible: you can hotplug on ACPI, but can't on
mptables.

Yes, reading back the subject does sound confusing - a better one would probably
be "provide interface for CPU hotplug on mptable platforms" or something similar.


Thanks,
Sasha


Re: [PATCH] of: use platform_device_add

2012-11-21 Thread Greg Kroah-Hartman

On Wed, Nov 21, 2012 at 06:15:59PM +, Grant Likely wrote:
> This allows platform_device_add a chance to call insert_resource on all
> of the resources from OF. At a minimum this fills in /proc/iomem and
> presumably makes resource tracking and conflict detection work better.
> However, it has the side effect of moving all OF generated platform
> devices from /sys/devices to /sys/devices/platform/. It /shouldn't/
> break userspace because userspace is not supposed to depend on the full
> path (because userspace always does what it is supposed to, right?).
> 
> It also has a backup call to of_device_add() when running on PowerPC to
> catch any devices that have overlapping regions. It will complain about
> them, but it will not fail to register the device.
> 
> Cc: Jason Gunthorpe 
> Cc: Benjamin Herrenschmidt 
> Cc: Rob Herring 
> Cc: Greg Kroah-Hartman 
> Signed-off-by: Grant Likely 
> ---
> 
> Greg, do you mind taking a look at this? The reason the OF code hasn't been
> calling platform_device_add() directly to this point is:
> a) there are some trees with resource overlays
> b) I want the devices in /sys/devices not /sys/devices/platform.

Putting the devices all in the "flat" location of /sys/devices/ is a bit
worrisome to me.  What's wrong with platform/ ?  That is what they are,
right?  Why change this?

> I could easily add exceptions to platform_device_add() for both those cases, but
> I don't like adding DT exceptions to the common code. However, I still need to
> support the platforms that unfortunately have overlapping resources. This patch
> does that by still calling the old path if platform_device_add() fails, but it
> isn't nice either because of_device_add() has to duplicate
> platform_device_add(). Blech. Plus the exception only applies for PowerPC.
> 
> So, how do you feel about having a 'relaxed' mode for platform_device_add()
> which means it won't fail if resources overlap and maybe won't do the silly
> platform_bus parent thing. Thoughts?

I have no objection to the resource issue, if you assure me it will not
be abused :)

But the sysfs location is still an issue, sorry.

thanks,

greg k-h

Re: [PATCH RESEND 3/7] input: ti_am335x_tsc: Add variance filter

2012-11-21 Thread Dmitry Torokhov

Hi Rachna,

On Wed, Nov 07, 2012 at 12:22:00PM +0530, Patil, Rachna wrote:
> Fine-tuning only the variance parameter in the tslib
> utility does not remove all of the ADC noise.
> This filtering logic is necessary to get this
> touchscreen to work well.

No, if filtering in tslib is not adequate please fix tslib so that your
work is usable for other devices as well.

Thanks.

-- 
Dmitry

Re: [PATCH 2/2] i2c: s3c2410: Get the i2c bus number from alias id

2012-11-21 Thread Doug Anderson

On Tue, Nov 20, 2012 at 8:09 PM, Mark Brown
 wrote:
> On Tue, Nov 20, 2012 at 02:27:04PM -0800, Doug Anderson wrote:
>> From: Padmavathi Venna 
>>
>> Get the i2c bus number that the device is connected to using the alias
>> id.  This makes debugging / grokking of kernel messages much easier.
>
> This doesn't look like a s3c2410 specific change - it's a generic device
> tree issue.  This suggests that it should be implemented in the
> framework so that all I2C controllers with DT can use it.

Good suggestion.  I have posted a series with the title "Add automatic
bus number support for i2c busses with device tree".  It contains the
i2c-core patch as well as a patch removing similar code from the pxa
i2c driver.

Kukjin: please consider this patch abandoned and superseded by the new
i2c-core patch.  As Olof said, the patch for adding aliases for
exynos4 should still be fine to apply.

Thanks!

-Doug

Re: [PATCH RESEND 6/7] input: ti_am335x_tsc: Add DT support

2012-11-21 Thread Dmitry Torokhov

Hi Rachna,

On Wed, Nov 07, 2012 at 12:22:03PM +0530, Patil, Rachna wrote:
> Add DT support for client touchscreen driver
> 
> Signed-off-by: Patil, Rachna 
> ---
>  drivers/input/touchscreen/ti_am335x_tsc.c |   60 
> -
>  1 files changed, 50 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
> index 7a26810..c063cf6 100644
> --- a/drivers/input/touchscreen/ti_am335x_tsc.c
> +++ b/drivers/input/touchscreen/ti_am335x_tsc.c
> @@ -26,6 +26,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  
> @@ -398,12 +400,18 @@ static int __devinit titsc_probe(struct platform_device *pdev)
>   struct titsc *ts_dev;
>   struct input_dev *input_dev;
>   struct ti_tscadc_dev *tscadc_dev = pdev->dev.platform_data;
> - struct mfd_tscadc_board *pdata;
> - int err;
> -
> - pdata = tscadc_dev->dev->platform_data;
> -
> - if (!pdata) {
> + int err, i;
> + struct mfd_tscadc_board *pdata = NULL;
> + struct device_node *node = NULL;
> + u32 val32, wires_conf[4];
> +
> + if (tscadc_dev->dev->of_node) {
> + node = tscadc_dev->dev->of_node;
> + node = of_find_node_by_name(node, "tsc");
> + } else
> + pdata = tscadc_dev->dev->platform_data;
> +
> + if (!pdata && !node) {
>   dev_err(&pdev->dev, "Could not find platform data\n");
>   return -EINVAL;
>   }
> @@ -421,11 +429,43 @@ static int __devinit titsc_probe(struct platform_device *pdev)
>   ts_dev->mfd_tscadc = tscadc_dev;
>   ts_dev->input = input_dev;
>   ts_dev->irq = tscadc_dev->irq;
> - ts_dev->wires = pdata->tsc_init->wires;
> - ts_dev->x_plate_resistance = pdata->tsc_init->x_plate_resistance;
> - ts_dev->steps_to_configure = pdata->tsc_init->steps_to_configure;
> - memcpy(ts_dev->config_inp, pdata->tsc_init->wire_config,
> +
> + if (node) {
> + err = of_property_read_u32(node, "wires", &val32);
> + if (err < 0)
> + goto err_free_mem;
> + else
> + ts_dev->wires = val32;
> +
> + err = of_property_read_u32(node, "x-plate-resistance", &val32);
> + if (err < 0)
> + goto err_free_mem;
> + else
> + ts_dev->x_plate_resistance = val32;
> +
> + err = of_property_read_u32(node, "steps-to-configure", &val32);
> + if (err < 0)
> + goto err_free_mem;
> + else
> + ts_dev->steps_to_configure = val32;
> +
> + err = of_property_read_u32_array(node, "wire-config",
> + wires_conf, ARRAY_SIZE(wires_conf));
> + if (err < 0)
> + goto err_free_mem;
> + else {
> + for (i = 0; i < ARRAY_SIZE(wires_conf); i++)
> + ts_dev->config_inp[i] = wires_conf[i];
> + }
> + } else {
> + ts_dev->wires = pdata->tsc_init->wires;
> + ts_dev->x_plate_resistance =
> + pdata->tsc_init->x_plate_resistance;
> + ts_dev->steps_to_configure =
> + pdata->tsc_init->steps_to_configure;
> + memcpy(ts_dev->config_inp, pdata->tsc_init->wire_config,
>   sizeof(pdata->tsc_init->wire_config));

I believe if platform data is supplied it should take precedence over DT
data so that parameters can be adjusted if really needed.
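
A minimal user-space sketch of the precedence suggested above: platform data, when present, wins; DT is only the fallback. struct tsc_config, the tsc_dt_reader callback and example_dt_reader() are hypothetical stand-ins for mfd_tscadc_board->tsc_init and the of_property_read_u32()/of_property_read_u32_array() calls in the patch, and the values the example reader returns are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the fields titsc_probe() copies from tsc_init. */
struct tsc_config {
	unsigned int wires;
	unsigned int x_plate_resistance;
	unsigned int steps_to_configure;
	unsigned int wire_config[4];
};

/* Hypothetical DT reader: 0 on success, negative errno otherwise. */
typedef int (*tsc_dt_reader)(const void *node, struct tsc_config *cfg);

/* Example reader with made-up values, standing in for a "tsc" DT node. */
static int example_dt_reader(const void *node, struct tsc_config *cfg)
{
	const struct tsc_config from_dt = { 4, 200, 15, { 0, 1, 2, 3 } };

	if (!node)
		return -EINVAL;
	*cfg = from_dt;
	return 0;
}

/* Platform data takes precedence; DT is consulted only without it. */
static int tsc_resolve_config(const struct tsc_config *pdata,
			      const void *node, tsc_dt_reader read_dt,
			      struct tsc_config *out)
{
	assert(out != NULL);
	if (pdata) {
		*out = *pdata;
		return 0;
	}
	if (node && read_dt)
		return read_dt(node, out);
	return -EINVAL;	/* neither source: "Could not find platform data" */
}
```

Checking pdata first also means a board file can override a single bad DT value without touching the tree, which appears to be the point of the review comment.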

> + }
>  
>   err = request_irq(ts_dev->irq, titsc_irq,
> 0, pdev->dev.driver->name, ts_dev);
> -- 
> 1.7.0.4
> 
> 

Thanks.

-- 
Dmitry

Re: [PATCH 1/9] vfs: Handle O_SYNC AIO DIO in generic code properly

2012-11-21 Thread Christoph Hellwig

On Wed, Nov 21, 2012 at 11:58:05AM -0500, Jeff Moyer wrote:
> > I'd like to use this as a vehicle to revisit how dio completions work.
> 
> I don't like the sound of that.  ;-)  It sounds like this bugfix may get
> further delayed by the desire for unrelated code cleanup.

I've got a prototype that isn't much more invasive than the current
series.  I'll finish it up and run it through QA and will post it
tomorrow.


[PATCH 2/2] i2c: pxa: Use i2c-core to get bus number now

2012-11-21 Thread Doug Anderson

The commit: "i2c-core: dt: Pick i2c bus number from i2c alias if
present" adds support for automatically picking the bus number based
on the alias ID.  Remove the now unnecessary code from i2c-pxa that
did the same thing.

Signed-off-by: Doug Anderson 
---
 drivers/i2c/busses/i2c-pxa.c |8 +---
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/busses/i2c-pxa.c b/drivers/i2c/busses/i2c-pxa.c
index 1034d93..8ee9fa0 100644
--- a/drivers/i2c/busses/i2c-pxa.c
+++ b/drivers/i2c/busses/i2c-pxa.c
@@ -1053,16 +1053,10 @@ static int i2c_pxa_probe_dt(struct platform_device *pdev, struct pxa_i2c *i2c,
struct device_node *np = pdev->dev.of_node;
const struct of_device_id *of_id =
of_match_device(i2c_pxa_dt_ids, &pdev->dev);
-   int ret;
 
if (!of_id)
return 1;
-   ret = of_alias_get_id(np, "i2c");
-   if (ret < 0) {
-   dev_err(&pdev->dev, "failed to get alias id, errno %d\n", ret);
-   return ret;
-   }
-   pdev->id = ret;
+   pdev->id = -1;
if (of_get_property(np, "mrvl,i2c-polling", NULL))
i2c->use_pio = 1;
if (of_get_property(np, "mrvl,i2c-fast-mode", NULL))
-- 
1.7.7.3


[PATCH 1/2] i2c-core: dt: Pick i2c bus number from i2c alias if present

2012-11-21 Thread Doug Anderson

This allows you to get the equivalent functionality of
i2c_add_numbered_adapter() with all data in the device tree and no
special case code in your driver.  This is a common device tree
technique.

For quick reference, the FDT syntax for using an alias to provide an
ID looks like:
  aliases {
    i2c0 = &i2c_0;
    i2c1 = &i2c_1;
  };

Signed-off-by: Doug Anderson 
CC: Mark Brown 

---
 drivers/i2c/i2c-core.c |  105 +++-
 1 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index a7edf98..71deb2a2 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -915,13 +915,81 @@ out_list:
 }
 
 /**
+ * i2c_get_number_from_dt - get the adapter number based on dt alias
+ * @adap: the adapter to look at
+ *
+ * Check whether there's an alias in the FDT that gives an ID for this i2c
+ * device.  Use an alias like "i2c", like:
+ *   aliases {
+ *     i2c0 = &i2c_0;
+ *     i2c1 = &i2c_1;
+ *   };
+ *
+ * Returns the ID if found.  If no alias is found returns -1.
+ */
+static int i2c_get_number_from_dt(struct i2c_adapter *adap)
+{
+   struct device *dev = &adap->dev;
+   int id;
+
+   if (!dev->of_node)
+   return -1;
+
+   id = of_alias_get_id(dev->of_node, "i2c");
+   if (id < 0)
+   return -1;
+   return id;
+}
+
+/**
+ * _i2c_add_numbered_adapter - i2c_add_numbered_adapter where nr is never -1
+ * @adap: the adapter to register (with adap->nr initialized)
+ * Context: can sleep
+ *
+ * See i2c_add_numbered_adapter() for details.
+ */
+static int _i2c_add_numbered_adapter(struct i2c_adapter *adap)
+{
+   int id;
+   int status;
+
+   /* Handled by wrappers */
+   BUG_ON(adap->nr == -1);
+
+   if (adap->nr & ~MAX_IDR_MASK)
+   return -EINVAL;
+
+retry:
+   if (idr_pre_get(&i2c_adapter_idr, GFP_KERNEL) == 0)
+   return -ENOMEM;
+
+   mutex_lock(&core_lock);
+   /* "above" here means "above or equal to", sigh;
+* we need the "equal to" result to force the result
+*/
+   status = idr_get_new_above(&i2c_adapter_idr, adap, adap->nr, &id);
+   if (status == 0 && id != adap->nr) {
+   status = -EBUSY;
+   idr_remove(&i2c_adapter_idr, id);
+   }
+   mutex_unlock(&core_lock);
+   if (status == -EAGAIN)
+   goto retry;
+
+   if (status == 0)
+   status = i2c_register_adapter(adap);
+   return status;
+}
+
+/**
  * i2c_add_adapter - declare i2c adapter, use dynamic bus number
  * @adapter: the adapter to add
  * Context: can sleep
  *
  * This routine is used to declare an I2C adapter when its bus number
- * doesn't matter.  Examples: for I2C adapters dynamically added by
- * USB links or PCI plugin cards.
+ * doesn't matter or when its bus number is specified by a dt alias.
+ * Examples of cases when the bus number doesn't matter: I2C adapters
+ * dynamically added by USB links or PCI plugin cards.
  *
  * When this returns zero, a new bus number was allocated and stored
  * in adap->nr, and the specified adapter became available for clients.
@@ -931,6 +999,12 @@ int i2c_add_adapter(struct i2c_adapter *adapter)
 {
int id, res = 0;
 
+   id = i2c_get_number_from_dt(adapter);
+   if (id >= 0) {
+   adapter->nr = id;
+   return _i2c_add_numbered_adapter(adapter);
+   }
+
 retry:
if (idr_pre_get(&i2c_adapter_idr, GFP_KERNEL) == 0)
return -ENOMEM;
@@ -977,34 +1051,9 @@ EXPORT_SYMBOL(i2c_add_adapter);
  */
 int i2c_add_numbered_adapter(struct i2c_adapter *adap)
 {
-   int id;
-   int status;
-
if (adap->nr == -1) /* -1 means dynamically assign bus id */
return i2c_add_adapter(adap);
-   if (adap->nr & ~MAX_IDR_MASK)
-   return -EINVAL;
-
-retry:
-   if (idr_pre_get(&i2c_adapter_idr, GFP_KERNEL) == 0)
-   return -ENOMEM;
-
-   mutex_lock(&core_lock);
-   /* "above" here means "above or equal to", sigh;
-* we need the "equal to" result to force the result
-*/
-   status = idr_get_new_above(&i2c_adapter_idr, adap, adap->nr, &id);
-   if (status == 0 && id != adap->nr) {
-   status = -EBUSY;
-   idr_remove(&i2c_adapter_idr, id);
-   }
-   mutex_unlock(&core_lock);
-   if (status == -EAGAIN)
-   goto retry;
-
-   if (status == 0)
-   status = i2c_register_adapter(adap);
-   return status;
+   return _i2c_add_numbered_adapter(adap);
 }
 EXPORT_SYMBOL_GPL(i2c_add_numbered_adapter);
 
-- 
1.7.7.3
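
A rough user-space model of the behaviour this patch gives i2c_add_adapter(): derive a number from an "i2cN" alias if one exists, claim exactly that number (failing with -EBUSY if it is taken, as _i2c_add_numbered_adapter() does), and otherwise fall back to dynamic numbering. Everything here is hypothetical: alias_id_for_stem() only imitates what of_alias_get_id() returns, the fixed table replaces i2c_adapter_idr, and the real dynamic path starts searching at __i2c_first_dynamic_bus_num rather than 0.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* "i2c3" with stem "i2c" -> 3; -1 if the stem differs or no number follows. */
static int alias_id_for_stem(const char *alias, const char *stem)
{
	size_t n = strlen(stem);
	char *end;
	long id;

	if (strncmp(alias, stem, n) != 0 || !isdigit((unsigned char)alias[n]))
		return -1;
	errno = 0;
	id = strtol(alias + n, &end, 10);
	if (errno || *end != '\0' || id > INT_MAX)
		return -1;
	return (int)id;
}

#define MAX_BUSSES 8			/* hypothetical table size */
static const void *busses[MAX_BUSSES];

/* Claim exactly nr, or fail with -EBUSY if another adapter holds it. */
static int claim_numbered(int nr, const void *adap)
{
	assert(adap != NULL);
	if (nr < 0 || nr >= MAX_BUSSES)
		return -EINVAL;
	if (busses[nr])
		return -EBUSY;
	busses[nr] = adap;
	return nr;
}

/* Dynamic numbering: lowest free slot in this model. */
static int claim_dynamic(const void *adap)
{
	for (int nr = 0; nr < MAX_BUSSES; nr++)
		if (!busses[nr])
			return claim_numbered(nr, adap);
	return -ENOSPC;
}

/* Prefer the alias-derived number, as the patched i2c_add_adapter() does. */
static int add_adapter(const char *alias, const void *adap)
{
	int id = alias ? alias_id_for_stem(alias, "i2c") : -1;

	return id >= 0 ? claim_numbered(id, adap) : claim_dynamic(adap);
}
```

In this model a second adapter claiming the same alias number gets -EBUSY instead of silently taking another slot, which mirrors the "equal to" check after idr_get_new_above().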


Re: [PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread H. Peter Anvin

On 11/21/2012 10:22 AM, Sasha Levin wrote:
> Until now, CPU hotplug has been ignored for mptable implementations that
> support it by marking the hotpluggable CPUs as disabled during boot.
> 
> The current kernel code detects that behaviour and actually deals with it
> properly:
> 
>   [0.00] Intel MultiProcessor Specification v1.4
>   [0.00] MPTABLE: OEM ID: KVMCPU00
>   [0.00] MPTABLE: Product ID: 0.1
>   [0.00] MPTABLE: APIC at: 0xFEE00000
>   [0.00] Processor #0 (Bootup-CPU)
>   [0.00] Processor #1
>   [0.00] Processor #2
>   [0.00] IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
>   [0.00] Processors: 3
>   [0.00] smpboot: Allowing 3 CPUs, 1 hotplug CPUs
> 
> The problem begins when a user actually wants to online such a CPU; there
> is no interface for him to tell the kernel that the CPU is now present and
> can be used.
> 
> Luckily, the kernel provides a generic interface in the form of 'probe' and
> 'release' sysfs files which are used on different architectures exactly for
> that - to probe and release CPUs. On x86 however this was unimplemented
> until now.
> 
> This patch adds code into the x86 implementation of probe and release to allow
> adding and removing CPUs. This allows machines that use mptable to hotplug
> CPUs:
> 

Reading between the lines, this sounds like it would cause a user-visible
difference between mptable platforms and ACPI platforms?  If so, that is
totally unacceptable.  If not, the description is confusing.

-hpa



Re: [PATCH 36/46] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships

2012-11-21 Thread Ingo Molnar


* Mel Gorman  wrote:

> While it is desirable that all threads in a process run on its home
> node, this is not always possible or necessary. There may be more
> threads than there are CPUs on the node, or the node might be
> over-subscribed with unrelated processes.
> 
> This can cause a situation whereby a page gets migrated off its home
> node because the threads clearing pte_numa were running off-node. This
> patch uses page->last_nid to build a two-stage filter before pages get
> migrated to avoid problems with short or unlikely task<->node
> relationships.
> 
> Signed-off-by: Mel Gorman 
> ---
>  mm/mempolicy.c |   30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4c1c8d8..fd20e28 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2317,9 +2317,37 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>   }
>  
>   /* Migrate the page towards the node whose CPU is referencing it */
> - if (pol->flags & MPOL_F_MORON)
> + if (pol->flags & MPOL_F_MORON) {
> + int last_nid;
> +
>   polnid = numa_node_id();
>  
> + /*
> +  * Multi-stage node selection is used in conjunction
> +  * with a periodic migration fault to build a temporal
> +  * task<->page relation. By using a two-stage filter we
> +  * remove short/unlikely relations.
> +  *
> +  * Using P(p) ~ n_p / n_t as per frequentist
> +  * probability, we can equate a task's usage of a
> +  * particular page (n_p) per total usage of this
> +  * page (n_t) (in a given time-span) to a probability.
> +  *
> +  * Our periodic faults will sample this probability and
> +  * getting the same result twice in a row, given these
> +  * samples are fully independent, is then given by
> +  * P(p)^2, provided our sample period is sufficiently
> +  * short compared to the usage pattern.
> +  *
> +  * This quadratic squishes small probabilities, making
> +  * it less likely we act on an unlikely task<->page
> +  * relation.
> +  */
> + last_nid = page_xchg_last_nid(page, polnid);
> + if (last_nid != polnid)
> + goto out;
> + }
> +
>   if (curnid != polnid)
>   ret = polnid;
>  out:

As mentioned in my other mail, this patch of yours looks very 
similar to the numa/core commit attached below, mostly written 
by Peter:

  30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery

Thanks,

Ingo

->
>From 30f93abc6cb3fd387a134d6b94ff5ac396be1c88 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra 
Date: Tue, 13 Nov 2012 12:58:32 +0100
Subject: [PATCH] sched, numa, mm: Add the scanning page fault machinery

Add the NUMA working set scanning/hinting page fault machinery,
with no policy yet.

[ The earliest versions had the mpol_misplaced() function from
  Lee Schermerhorn - this was heavily modified later on. ]

Also-written-by: Lee Schermerhorn 
Signed-off-by: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Thomas Gleixner 
Cc: Hugh Dickins 
[ split it out of the main policy patch - as suggested by Mel Gorman ]
Signed-off-by: Ingo Molnar 
---
 include/linux/init_task.h |   8 +++
 include/linux/mempolicy.h |   6 +-
 include/linux/mm_types.h  |   4 ++
 include/linux/sched.h |  41 --
 init/Kconfig  |  73 +++-
 kernel/sched/core.c   |  15 +
 kernel/sysctl.c   |  31 ++-
 mm/huge_memory.c  |   1 +
 mm/mempolicy.c| 137 ++
 9 files changed, 294 insertions(+), 22 deletions(-)

[...]

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..318043a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2175,6 +2175,143 @@ static void sp_free(struct sp_node *n)
kmem_cache_free(sn_cache, n);
 }
 
+/*
+ * Multi-stage node selection is used in conjunction with a periodic
+ * migration fault to build a temporal task<->page relation. By
+ * using a two-stage filter we remove short/unlikely relations.
+ *
+ * Using P(p) ~ n_p / n_t as per frequentist probability, we can
+ * equate a task's usage of a particular page (n_p) per total usage
+ * of this page (n_t) (in a given time-span) to a probability.
+ *
+ * Our periodic faults will then sample this probability and getting
+ * the same result twice in a row, given these samples are fully
+ * independent, is then given by P(p)^2, provided our sample period
+ * is sufficiently short compared to the usage pattern.
+ *
+ * This quadratic squishes small probabilities, making it less likely
+ * we act on an unlikely task<->page relation.
+

[PATCH] of: use platform_device_add

2012-11-21 Thread Grant Likely

This allows platform_device_add a chance to call insert_resource on all
of the resources from OF. At a minimum this fills in /proc/iomem and
presumably makes resource tracking and conflict detection work better.
However, it has the side effect of moving all OF generated platform
devices from /sys/devices to /sys/devices/platform/. It /shouldn't/
break userspace because userspace is not supposed to depend on the full
path (because userspace always does what it is supposed to, right?).

It also has a backup call to of_device_add() when running on PowerPC to
catch any devices that have overlapping regions. It will complain about
them, but it will not fail to register the device.

Cc: Jason Gunthorpe 
Cc: Benjamin Herrenschmidt 
Cc: Rob Herring 
Cc: Greg Kroah-Hartman 
Signed-off-by: Grant Likely 
---

Greg, do you mind taking a look at this? The reason the OF code hasn't been
calling platform_device_add() directly to this point is:
a) there are some trees with resource overlays
b) I want the devices in /sys/devices not /sys/devices/platform.

I could easily add exceptions to platform_device_add() for both those cases, but
I don't like adding DT exceptions to the common code. However, I still need to
support the platforms that unfortunately have overlapping resources. This patch
does that by still calling the old path if platform_device_add() fails, but it
isn't nice either because of_device_add() has to duplicate
platform_device_add(). Blech. Plus the exception only applies for PowerPC.

So, how do you feel about having a 'relaxed' mode for platform_device_add()
which means it won't fail if resources overlap and maybe won't do the silly
platform_bus parent thing. Thoughts?

g.

 drivers/of/platform.c |   28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index b80891b..3d7ba40 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -203,6 +203,7 @@ struct platform_device *of_platform_device_create_pdata(
struct device *parent)
 {
struct platform_device *dev;
+   int rc;
 
if (!of_device_is_available(np))
return NULL;
@@ -214,16 +215,39 @@ struct platform_device *of_platform_device_create_pdata(
 #if defined(CONFIG_MICROBLAZE)
dev->archdata.dma_mask = 0xffffffffUL;
 #endif
+   dev->name = dev_name(&dev->dev);
dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
-   dev->dev.bus = &platform_bus_type;
dev->dev.platform_data = platform_data;
+   dev->dev.id = PLATFORM_DEVID_NONE;
+   /* device_add will assume that this device is on the same node as
+* the parent. If there is no parent defined, set the node
+* explicitly */
+   if (!parent)
+   set_dev_node(&dev->dev, of_node_to_nid(np));
 
/* We do not fill the DMA ops for platform devices by default.
 * This is currently the responsibility of the platform code
 * to do such, possibly using a device notifier
 */
 
-   if (of_device_add(dev) != 0) {
+   rc = platform_device_add(dev);
+#ifdef CONFIG_POWERPC
+   /*
+* This POWERPC block isn't pretty, but the commit that adds it is a
+* little risky. There are possibly some powerpc platforms that have
+* overlapping resources in the device tree. If so, then I want to find
+* them, but I don't want to break support in the process. So, if
+* platform_device_add() fails, then register the device anyway, but
+* complain about it. Hopefully we can find and fix any problem
+* platforms before removing this code.
+*/
+   if (rc == -EBUSY) {
+   dev_warn(&dev->dev, "WARNING: resource overlap in DT node %s\n",
+   np->full_name);
+   rc = of_device_add(dev);
+   }
+#endif
+   if (rc) {
platform_device_put(dev);
return NULL;
}
-- 
1.7.10.4
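
A small user-space model of the registration logic this patch adds: a strict path that refuses a device whose address range overlaps one already claimed (the -EBUSY that a failed insert_resource() makes platform_device_add() return), and a "relaxed" flag mirroring the CONFIG_POWERPC fallback that warns and registers anyway. struct range, claim_strict() and register_node() are hypothetical names; this sketches the idea only, not the kernel's resource tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical inclusive address range, modelled on struct resource. */
struct range {
	unsigned long start, end;
};

static bool ranges_overlap(const struct range *a, const struct range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

#define MAX_CLAIMED 8
static struct range claimed[MAX_CLAIMED];
static int nclaimed;

/* Strict path: overlapping ranges are refused with -EBUSY. */
static int claim_strict(const struct range *r)
{
	assert(r->start <= r->end);
	for (int i = 0; i < nclaimed; i++)
		if (ranges_overlap(&claimed[i], r))
			return -EBUSY;
	if (nclaimed == MAX_CLAIMED)
		return -ENOMEM;
	claimed[nclaimed++] = *r;
	return 0;
}

/* relaxed: warn on overlap and register without claiming the range. */
static int register_node(const char *full_name, const struct range *r,
			 bool relaxed)
{
	int rc = claim_strict(r);

	if (rc == -EBUSY && relaxed) {
		fprintf(stderr, "WARNING: resource overlap in DT node %s\n",
			full_name);
		rc = 0;
	}
	return rc;
}
```

A 'relaxed' mode inside platform_device_add(), as proposed above, would amount to the same branch living in common code instead of being duplicated through of_device_add().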


[PATCH] x86, smpboot: allow manual hotplug of CPUs

2012-11-21 Thread Sasha Levin

Until now, CPU hotplug has been ignored for mptable implementations that
support it by marking the hotpluggable CPUs as disabled during boot.

The current kernel code detects that behaviour and actually deals with it
properly:

[0.00] Intel MultiProcessor Specification v1.4
[0.00] MPTABLE: OEM ID: KVMCPU00
[0.00] MPTABLE: Product ID: 0.1
[0.00] MPTABLE: APIC at: 0xFEE00000
[0.00] Processor #0 (Bootup-CPU)
[0.00] Processor #1
[0.00] Processor #2
[0.00] IOAPIC[0]: apic_id 4, version 17, address 0xfec0, 
GSI 0-23
[0.00] Processors: 3
[0.00] smpboot: Allowing 3 CPUs, 1 hotplug CPUs

The problem arises when a user actually wants to online such a CPU: there
is no interface to tell the kernel that the CPU is now present and can be
used.

Luckily, the kernel provides a generic interface in the form of 'probe' and
'release' sysfs files which are used on different architectures exactly for
that: to probe and release CPUs. On x86, however, this was unimplemented
until now.

This patch adds code into the x86 implementation of probe and release to allow
adding and removing CPUs. This allows machines that use mptable to hotplug
CPUs:

sh-4.2# cd /sys/devices/system/cpu/
sh-4.2# cat possible present online
0-3
0-2
0-2
sh-4.2# echo "3 0x14" > probe
sh-4.2# cat possible present online
0-3
0-3
0-2
sh-4.2# echo 1 > cpu3/online
[   29.854133] smpboot: Booting Node 0 Processor 3 APIC 0x3
[0.001000] kvm-clock: cpu 3, msr 0:1bd929c1, secondary cpu clock
[   29.872438] KVM setup async PF for cpu 3
[   29.872790] kvm-stealtime: cpu 3, msr 1bd8d100
[   29.873276] microcode: CPU3 sig=0x206a7, pf=0x1, revision=0x1
sh-4.2# cat possible present online
0-3
0-3
0-3
sh-4.2# echo 0 > cpu3/online
[  116.146352] Unregister pv shared memory for cpu 3
[  116.16] Cannot set affinity for irq 0
[  116.163068] smpboot: CPU 3 is now offline
sh-4.2# cat possible present online
0-3
0-3
0-2
sh-4.2# echo 3 > release
sh-4.2# cat possible present online
0-3
0-2
0-2

Signed-off-by: Sasha Levin 
---
 arch/x86/kernel/smpboot.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 732bf5c..78b3197 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -97,8 +97,43 @@ void cpu_hotplug_driver_unlock(void)
 	mutex_unlock(&x86_cpu_hotplug_driver_mutex);
 }
 
-ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
-ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+   int cpu, version, r;
+
+	r = sscanf(buf, "%d %x", &cpu, &version);
+   if (r != 2)
+   return -EINVAL;
+
+   if (!cpu_possible(cpu) || cpu_present(cpu))
+   return -EINVAL;
+
+   arch_register_cpu(cpu);
+   generic_processor_info(cpu, version);
+
+   return count;
+}
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+   int cpu, r;
+
+	r = kstrtoint(buf, 10, &cpu);
+   if (r < 0)
+   return r;
+
+   if (!cpu_present(cpu))
+   return -EINVAL;
+
+   if (cpu_online(cpu))
+   return -EBUSY;
+
+   arch_unregister_cpu(cpu);
+   set_cpu_present(cpu, false);
+
+   return count;
+}
+
 #endif
 
 /* Number of siblings per CPU package */
-- 
1.8.0
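The input handling of the new probe and release files can be exercised outside the kernel with a small sketch. The sk_* names and the fixed 4-CPU table are hypothetical stand-ins for the kernel's possible/present/online CPU masks, and success returns 0 here rather than count. The second probe field is read with %x as in the patch, so "3 0x14" parses as 3 and 0x14. The sketch also adds an explicit range check before indexing its arrays, because the CPU number comes straight from user space.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical 4-CPU machine matching the session: possible 0-3, present 0-2. */
#define SK_NR_CPUS 4
static bool sk_possible[SK_NR_CPUS] = { true, true, true, true };
static bool sk_present[SK_NR_CPUS] = { true, true, true, false };
static bool sk_online[SK_NR_CPUS] = { true, true, true, false };

/* "probe" input: "<cpu> <hex value>", e.g. "3 0x14". Returns 0 or -errno. */
static int sk_cpu_probe(const char *buf)
{
	int cpu;
	unsigned int version;

	if (sscanf(buf, "%d %x", &cpu, &version) != 2)
		return -EINVAL;
	(void)version;	/* the patch passes it on to generic_processor_info() */
	if (cpu < 0 || cpu >= SK_NR_CPUS)
		return -EINVAL;
	if (!sk_possible[cpu] || sk_present[cpu])
		return -EINVAL;
	/* The patch calls arch_register_cpu() and generic_processor_info() here. */
	sk_present[cpu] = true;
	return 0;
}

/* "release" input: a decimal CPU number, optionally newline-terminated. */
static int sk_cpu_release(const char *buf)
{
	char *end;
	long cpu = strtol(buf, &end, 10);

	if (end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (cpu < 0 || cpu >= SK_NR_CPUS || !sk_present[cpu])
		return -EINVAL;
	if (sk_online[cpu])
		return -EBUSY;	/* must be offlined via cpuN/online first */
	/* The patch calls arch_unregister_cpu() and set_cpu_present(cpu, false). */
	sk_present[cpu] = false;
	return 0;
}
```

The ordering of the checks follows the patch: release refuses an online CPU with -EBUSY, so the user has to write 0 to cpuN/online before writing N to release, as in the session shown in the commit message.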


Re: [PATCH 00/46] Automatic NUMA Balancing V4

2012-11-21 Thread Ingo Molnar

* Mel Gorman  wrote:

> On Wed, Nov 21, 2012 at 06:33:16PM +0100, Ingo Molnar wrote:
> > 
> > * Mel Gorman  wrote:
> > 
> > > On Wed, Nov 21, 2012 at 06:03:06PM +0100, Ingo Molnar wrote:
> > > > 
> > > > * Mel Gorman  wrote:
> > > > 
> > > > > On Wed, Nov 21, 2012 at 10:21:06AM +0000, Mel Gorman wrote:
> > > > > > 
> > > > > > I am not including a benchmark report in this but will be posting one
> > > > > > shortly in the "Latest numa/core release, v16" thread along with the latest
> > > > > > schednuma figures I have available.
> > > > > > 
> > > > > 
> > > > > Report is linked here https://lkml.org/lkml/2012/11/21/202
> > > > > 
> > > > > I ended up cancelling the remaining tests and restarted with
> > > > > 
> > > > > 1. schednuma + patches posted since so that works out as
> > > > 
> > > > Mel, I'd like to ask you to refer to our tree as numa/core or 
> > > > 'numacore' in the future. Would such a courtesy to use the 
> > > > current name of our tree be possible?
> > > > 
> > > 
> > > Sure, no problem.
> > 
> > Thanks!
> > 
> > I ran a quick test with your 'balancenuma v4' tree and while 
> > numa02 and numa01-THREAD-ALLOC performance is looking good, 
> > numa01 performance does not look very good:
> > 
> >                   mainline    numa/core    balancenuma-v4
> >  numa01:            340.3        139.4          276 secs
> > 
> > 97% slower than numa/core.
> > 
> 
> It would be. numa01 is an adverse workload where all threads 
> are hammering the same memory.  The two-stage filter in 
> balancenuma restricts the amount of migration it does so it 
> ends up in a situation where it cannot balance properly. [...]

Do you mean this "balancenuma v4" patch attributed to you:

 Subject: mm: Numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
 From: Mel Gorman 
 Date: Wed, 21 Nov 2012 10:21:42 +0000

 ...

 Signed-off-by: Mel Gorman 

which has:

/*
 * Multi-stage node selection is used in conjunction
 * with a periodic migration fault to build a temporal
 * task<->page relation. By using a two-stage filter we
 * remove short/unlikely relations.
 *
 * Using P(p) ~ n_p / n_t as per frequentist
 * probability, we can equate a task's usage of a
 * particular page (n_p) per total usage of this
 * page (n_t) (in a given time-span) to a probability.
 *
 * Our periodic faults will sample this probability and
 * getting the same result twice in a row, given these
 * samples are fully independent, is then given by
 * P(n)^2, provided our sample period is sufficiently
 * short compared to the usage pattern.
 *
 * This quadric squishes small probabilities, making
 * it less likely we act on an unlikely task<->page
 * relation.

This looks very similar to the code and text that Peter wrote 
for numa/core:

/*
 * Multi-stage node selection is used in conjunction with a periodic
 * migration fault to build a temporal task<->page relation. By
 * using a two-stage filter we remove short/unlikely relations.
 *
 * Using P(p) ~ n_p / n_t as per frequentist probability, we can
 * equate a task's usage of a particular page (n_p) per total usage
 * of this page (n_t) (in a given time-span) to a probability.
 *
 * Our periodic faults will then sample this probability and getting
 * the same result twice in a row, given these samples are fully
 * independent, is then given by P(n)^2, provided our sample period
 * is sufficiently short compared to the usage pattern.
 *
 * This quadric squishes small probabilities, making it less likely
 * we act on an unlikely task<->page relation.
 *
 * Return the best node ID this page should be on, or -1 if it should
 * stay where it is.
 */

see commit:

  30f93abc6cb3 sched, numa, mm: Add the scanning page fault machinery

?

I think it's the very same concept - yours is taken from an 
older sched/numa commit and attributed to yourself? [If so then 
please fix the attribution.]

We have the same filter in numa/core - because we wrote it (FYI, 
I wrote bits of the last_cpu variant in numa/core), yet our 
numa01 performance is much better than balancenuma's.

Thanks,

Ingo
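The two-stage filter discussed in this thread can be illustrated with a small simulation. This is a hypothetical, simplified model (struct sk_page and the sk_* functions are invented for the sketch): the trees being compared keep per-page history (a last node, or in numa/core's variant a last CPU) and act only when consecutive samples agree, but they differ in details such as which node a sample is compared against. Here the rule is reduced to "the same node faulted twice in a row".

```c
#include <stdbool.h>

#define SK_NID_NONE (-1)

/* Hypothetical per-page state: node of the most recent sampled fault. */
struct sk_page {
	int last_nid;
};

/* Returns true only when this fault's node matches the previous one's. */
static bool sk_two_stage_filter(struct sk_page *page, int this_nid)
{
	int last_nid = page->last_nid;

	page->last_nid = this_nid;	/* record this sample for the next fault */
	return last_nid == this_nid;
}

/* Feeds a sequence of faulting nodes through the filter and counts passes. */
static int sk_count_passes(const int *nids, int n)
{
	struct sk_page page = { SK_NID_NONE };
	int passes = 0;

	for (int i = 0; i < n; i++)
		if (sk_two_stage_filter(&page, nids[i]))
			passes++;
	return passes;
}

/* With independent samples, a node owning fraction p of accesses passes ~p^2. */
static double sk_pass_probability(double p)
{
	return p * p;
}
```

A strictly alternating (ping-pong) sequence never passes, and a page whose accesses are spread across nodes passes only rarely, which matches the explanation Mel gives in this thread for numa01, where all threads hammer the same memory.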

[PATCH 2/2] regulator: add a regulator driver for the AS3711 PMIC

2012-11-21 Thread Guennadi Liakhovetski

This driver supports the 4 DCDC and 8 LDO regulators on the AS3711 PMIC.

Signed-off-by: Guennadi Liakhovetski 
---
 drivers/regulator/as3711-regulator.c |  381 ++
 1 files changed, 381 insertions(+), 0 deletions(-)
 create mode 100644 drivers/regulator/as3711-regulator.c

diff --git a/drivers/regulator/as3711-regulator.c b/drivers/regulator/as3711-regulator.c
new file mode 100644
index 0000000..d097572
--- /dev/null
+++ b/drivers/regulator/as3711-regulator.c
@@ -0,0 +1,381 @@
+/*
+ * AS3711 PMIC regulator driver, using DCDC Step Down and LDO supplies
+ *
+ * Copyright (C) 2012 Renesas Electronics Corporation
+ * Author: Guennadi Liakhovetski, 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License as
+ * published by the Free Software Foundation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct as3711_regulator_info {
+   struct regulator_desc   desc;
+	unsigned int		max_uV;
+};
+
+struct as3711_regulator {
+   struct as3711_regulator_info *reg_info;
+   struct regulator_dev *rdev;
+};
+
+static int as3711_list_voltage_sd(struct regulator_dev *rdev,
+ unsigned int selector)
+{
+   if (selector >= rdev->desc->n_voltages)
+   return -EINVAL;
+
+   if (!selector)
+   return 0;
+	if (selector < 0x41)
+		return 600000 + selector * 12500;
+	if (selector < 0x71)
+		return 1400000 + (selector - 0x40) * 25000;
+	return 2600000 + (selector - 0x70) * 50000;
+}
+
+static int as3711_list_voltage_aldo(struct regulator_dev *rdev,
+   unsigned int selector)
+{
+   if (selector >= rdev->desc->n_voltages)
+   return -EINVAL;
+
+   if (selector < 0x10)
+		return 1200000 + selector * 50000;
+	return 1800000 + (selector - 0x10) * 100000;
+}
+
+static int as3711_list_voltage_dldo(struct regulator_dev *rdev,
+   unsigned int selector)
+{
+   if (selector >= rdev->desc->n_voltages ||
+   (selector > 0x10 && selector < 0x20))
+   return -EINVAL;
+
+   if (selector < 0x11)
+		return 900000 + selector * 50000;
+	return 1750000 + (selector - 0x20) * 50000;
+}
+
+static int as3711_bound_check(struct regulator_dev *rdev,
+ int *min_uV, int *max_uV)
+{
+   struct as3711_regulator_info *info = container_of(rdev->desc,
+   struct as3711_regulator_info, desc);
+   struct as3711_regulator *reg = rdev->reg_data;
+
+   WARN_ON(reg->reg_info != info);
+
+	dev_dbg(&rdev->dev, "%s(), %d, %d, %d\n", __func__,
+   *min_uV, rdev->desc->min_uV, info->max_uV);
+
+   if (*max_uV < *min_uV ||
+   *min_uV >= info->max_uV || rdev->desc->min_uV >= *max_uV)
+   return -EINVAL;
+
+   if (rdev->desc->n_voltages == 1)
+   return 0;
+
+   if (*max_uV > info->max_uV)
+   *max_uV = info->max_uV;
+
+   if (*min_uV < rdev->desc->min_uV)
+   *min_uV = rdev->desc->min_uV;
+
+   return *min_uV;
+}
+
+static int as3711_sel_check(int min, int max, int bottom, int step)
+{
+   int ret, voltage;
+
+   /* Round up min, when dividing: keeps us within the range */
+   ret = (min - bottom + step - 1) / step;
+   voltage = ret * step + bottom;
+   pr_debug("%s(): select %d..%d in %d+N*%d: %d\n", __func__,
+  min, max, bottom, step, ret);
+   if (voltage > max) {
+   /*
+* Try 1 down. It will take us below min, but as long we stay
+* above bottom, we're fine.
+*/
+   ret--;
+   voltage = ret * step + bottom;
+   if (voltage < bottom)
+   return -EINVAL;
+   }
+   return ret;
+}
+
+static int as3711_map_voltage_sd(struct regulator_dev *rdev,
+int min_uV, int max_uV)
+{
+   int ret;
+
+	ret = as3711_bound_check(rdev, &min_uV, &max_uV);
+   if (ret <= 0)
+   return ret;
+
+	if (min_uV <= 1400000)
+		return as3711_sel_check(min_uV, max_uV, 600000, 12500);
+
+	if (min_uV <= 2600000)
+		return as3711_sel_check(min_uV, max_uV, 1400000, 25000) + 0x40;
+
+	return as3711_sel_check(min_uV, max_uV, 2600000, 50000) + 0x70;
+}
+
+/*
+ * The regulator API supports 4 modes of operation: FAST, NORMAL, IDLE and
+ * STANDBY. We map them in the following way to AS3711 SD1-4 DCDC modes:
+ * FAST:   sdX_fast=1
+ * NORMAL: low_noise=1
+ * IDLE:   low_noise=0
+ */
+
+static int as3711_set_mode_sd(struct regulator_dev *rdev, unsigned int mode)
+{
+   unsigned int fast_bit = rdev->desc->enable_mask,
+
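As a cross-check of the SD voltage table in as3711_list_voltage_sd() and the round-up in as3711_sel_check(), here is a user-space sketch. Some numeric constants in the archived copy of this patch appear to have lost runs of zeros; the values used below (a 600000 uV base with 12500 uV steps up to selector 0x40, 25000 uV steps up to 0x70, and 50000 uV steps after that) are the ones that make the table continuous at those breakpoints. SK_SD_N_VOLTAGES = 0x80 is an assumption, and the sk_* names are hypothetical. One deliberate difference from the patch: when rounding up overshoots max_uV, this sketch returns -EINVAL instead of stepping one selector below min_uV.

```c
#include <errno.h>

/* Assumption: 7-bit selector (0x00-0x7f); not taken from the patch. */
#define SK_SD_N_VOLTAGES 0x80

/* Selector -> microvolts, same piecewise table as as3711_list_voltage_sd(). */
static int sk_sd_list_voltage(unsigned int sel)
{
	if (sel >= SK_SD_N_VOLTAGES)
		return -EINVAL;
	if (sel == 0)
		return 0;
	if (sel < 0x41)
		return 600000 + sel * 12500;
	if (sel < 0x71)
		return 1400000 + (sel - 0x40) * 25000;
	return 2600000 + (sel - 0x70) * 50000;
}

/* Smallest k with bottom + k * step >= min_uV (caller ensures min_uV >= bottom). */
static int sk_round_up(int min_uV, int bottom, int step)
{
	return (min_uV - bottom + step - 1) / step;
}

/* [min_uV, max_uV] -> lowest selector inside the window, or -EINVAL. */
static int sk_sd_map_voltage(int min_uV, int max_uV)
{
	int sel;

	if (max_uV < min_uV)
		return -EINVAL;
	if (min_uV < 612500)
		min_uV = 612500;	/* selector 1; selector 0 lists as 0 uV */

	if (min_uV <= 1400000)
		sel = sk_round_up(min_uV, 600000, 12500);
	else if (min_uV <= 2600000)
		sel = sk_round_up(min_uV, 1400000, 25000) + 0x40;
	else
		sel = sk_round_up(min_uV, 2600000, 50000) + 0x70;

	/* Reject instead of stepping below min_uV. */
	if (sel >= SK_SD_N_VOLTAGES || sk_sd_list_voltage(sel) > max_uV)
		return -EINVAL;
	return sel;
}
```

Mapping every listed voltage back through sk_sd_map_voltage() with min_uV == max_uV returns the selector it came from, which is a cheap consistency check for the three ranges.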

[PATCH 1/2] mfd: add an AS3711 PMIC MFD driver

2012-11-21 Thread Guennadi Liakhovetski

AS3711 is a PMIC with multiple DCDC and LDO power supplies, GPIOs, an RTC,
a battery charger and a general purpose ADC. This patch adds the core MFD
driver, with cells for a regulator driver and a backlight driver.

Signed-off-by: Guennadi Liakhovetski 
---

An AS3711 backlight driver is still under development and is expected soon.

 drivers/mfd/as3711.c   |  202 
 include/linux/mfd/as3711.h |  126 +++
 2 files changed, 328 insertions(+), 0 deletions(-)
 create mode 100644 drivers/mfd/as3711.c
 create mode 100644 include/linux/mfd/as3711.h

diff --git a/drivers/mfd/as3711.c b/drivers/mfd/as3711.c
new file mode 100644
index 0000000..649020a
--- /dev/null
+++ b/drivers/mfd/as3711.c
@@ -0,0 +1,202 @@
+/*
+ * AS3711 PMIC MFD driver
+ *
+ * Copyright (C) 2012 Renesas Electronics Corporation
+ * Author: Guennadi Liakhovetski, 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License as
+ * published by the Free Software Foundation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum {
+   AS3711_REGULATOR,
+   AS3711_BACKLIGHT,
+};
+
+static struct mfd_cell as3711_subdevs[] = {
+   [AS3711_REGULATOR] = {.name = "as3711-regulator",},
+   [AS3711_BACKLIGHT] = {.name = "as3711-backlight",},
+};
+
+static bool as3711_volatile_reg(struct device *dev, unsigned int reg)
+{
+   switch (reg) {
+   case AS3711_GPIO_SIGNAL_IN:
+   case AS3711_INTERRUPT_STATUS_1:
+   case AS3711_INTERRUPT_STATUS_2:
+   case AS3711_INTERRUPT_STATUS_3:
+   case AS3711_CHARGER_STATUS_1:
+   case AS3711_CHARGER_STATUS_2:
+   case AS3711_REG_STATUS:
+   return true;
+   }
+   return false;
+}
+
+static bool as3711_precious_reg(struct device *dev, unsigned int reg)
+{
+   switch (reg) {
+   case AS3711_INTERRUPT_STATUS_1:
+   case AS3711_INTERRUPT_STATUS_2:
+   case AS3711_INTERRUPT_STATUS_3:
+   return true;
+   }
+   return false;
+}
+
+static bool as3711_readable_reg(struct device *dev, unsigned int reg)
+{
+   switch (reg) {
+   case AS3711_SD_1_VOLTAGE:
+   case AS3711_SD_2_VOLTAGE:
+   case AS3711_SD_3_VOLTAGE:
+   case AS3711_SD_4_VOLTAGE:
+   case AS3711_LDO_1_VOLTAGE:
+   case AS3711_LDO_2_VOLTAGE:
+   case AS3711_LDO_3_VOLTAGE:
+   case AS3711_LDO_4_VOLTAGE:
+   case AS3711_LDO_5_VOLTAGE:
+   case AS3711_LDO_6_VOLTAGE:
+   case AS3711_LDO_7_VOLTAGE:
+   case AS3711_LDO_8_VOLTAGE:
+   case AS3711_SD_CONTROL:
+   case AS3711_GPIO_SIGNAL_OUT:
+   case AS3711_GPIO_SIGNAL_IN:
+   case AS3711_SD_CONTROL_1:
+   case AS3711_SD_CONTROL_2:
+   case AS3711_CURR_CONTROL:
+   case AS3711_CURR1_VALUE:
+   case AS3711_CURR2_VALUE:
+   case AS3711_CURR3_VALUE:
+   case AS3711_STEPUP_CONTROL_1:
+   case AS3711_STEPUP_CONTROL_2:
+   case AS3711_STEPUP_CONTROL_4:
+   case AS3711_STEPUP_CONTROL_5:
+   case AS3711_REG_STATUS:
+   case AS3711_INTERRUPT_STATUS_1:
+   case AS3711_INTERRUPT_STATUS_2:
+   case AS3711_INTERRUPT_STATUS_3:
+   case AS3711_CHARGER_STATUS_1:
+   case AS3711_CHARGER_STATUS_2:
+   case AS3711_ASIC_ID_1:
+   case AS3711_ASIC_ID_2:
+   return true;
+   }
+   return false;
+}
+
+static const struct regmap_config as3711_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+   .volatile_reg = as3711_volatile_reg,
+   .readable_reg = as3711_readable_reg,
+   .precious_reg = as3711_precious_reg,
+   .max_register = AS3711_MAX_REGS,
+   .num_reg_defaults_raw = AS3711_MAX_REGS,
+   .cache_type = REGCACHE_RBTREE,
+};
+
+static int as3711_i2c_probe(struct i2c_client *client,
+   const struct i2c_device_id *id)
+{
+   struct as3711 *as3711;
+   struct as3711_platform_data *pdata = client->dev.platform_data;
+   unsigned int id1, id2;
+   int ret;
+
+	if (!pdata) {
+		dev_err(&client->dev, "Platform data not found\n");
+		return -ENODEV;
+	}
+
+	as3711 = devm_kzalloc(&client->dev, sizeof(struct as3711), GFP_KERNEL);
+	if (!as3711) {
+		dev_err(&client->dev, "Memory allocation failed\n");
+		return -ENOMEM;
+	}
+
+	as3711->dev = &client->dev;
+	i2c_set_clientdata(client, as3711);
+
+	if (client->irq)
+		dev_notice(&client->dev, "IRQ not supported yet\n");
+
+	as3711->regmap = devm_regmap_init_i2c(client, &as3711_regmap_config);
+	if (IS_ERR(as3711->regmap)) {
+		ret = PTR_ERR(as3711->regmap);
+		dev_err(&client->dev, "regmap initialization failed: %d\n", ret);
+		return ret;
+	}
+
+	regmap_read(as3711->regmap, AS3711_ASIC_ID_1, &id1);
+   regmap_read(as3711->regmap,
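The volatile_reg and precious_reg callbacks in as3711_regmap_config decide how regmap treats each register: volatile registers are always read from the chip rather than from the cache, and precious registers (the regmap documentation's example is a clear-on-read interrupt status register, which is what the AS3711_INTERRUPT_STATUS_* entries look like) should only be read when the driver explicitly asks. The following user-space model illustrates that split. It is a simplified sketch, not regmap's implementation; the register numbers, struct sk_map and the sk_* functions are made up.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical 16-register chip; register numbers are made up. */
enum { SK_REG_VOLTAGE = 0x01, SK_REG_IRQ_STATUS = 0x02, SK_NREGS = 0x10 };

struct sk_map {
	unsigned char hw[SK_NREGS];	/* simulated chip registers */
	unsigned char cache[SK_NREGS];
	bool cached[SK_NREGS];
	int bus_reads;			/* counts simulated I2C reads */
};

static void sk_init(struct sk_map *m) { memset(m, 0, sizeof(*m)); }
static bool sk_volatile(unsigned int reg) { return reg == SK_REG_IRQ_STATUS; }
static bool sk_precious(unsigned int reg) { return reg == SK_REG_IRQ_STATUS; }

static unsigned char sk_bus_read(struct sk_map *m, unsigned int reg)
{
	unsigned char val = m->hw[reg];

	m->bus_reads++;
	if (reg == SK_REG_IRQ_STATUS)
		m->hw[reg] = 0;		/* clear-on-read, like an IRQ status register */
	return val;
}

/* Driver read: non-volatile registers are served from the cache once filled. */
static unsigned char sk_read(struct sk_map *m, unsigned int reg)
{
	unsigned char val;

	if (!sk_volatile(reg) && m->cached[reg])
		return m->cache[reg];
	val = sk_bus_read(m, reg);
	if (!sk_volatile(reg)) {
		m->cache[reg] = val;
		m->cached[reg] = true;
	}
	return val;
}

/* Debug dump: skips precious registers so that dumping cannot ack IRQs. */
static int sk_dump(struct sk_map *m, unsigned char *out)
{
	int skipped = 0;

	for (unsigned int reg = 0; reg < SK_NREGS; reg++) {
		if (sk_precious(reg)) {
			out[reg] = 0;
			skipped++;
			continue;
		}
		out[reg] = sk_read(m, reg);
	}
	return skipped;
}
```

Marking the interrupt status registers both volatile and precious, as the patch does, gives the two properties shown here: the value is never stale, and nothing except the driver's own read clears it.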

[PATCH] brcmsmac: Add __printf verification to logging prototypes

2012-11-21 Thread Joe Perches

Adding __printf helps spot format and argument mismatches.

Signed-off-by: Joe Perches 
---
 drivers/net/wireless/brcm80211/brcmsmac/debug.h |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/debug.h b/drivers/net/wireless/brcm80211/brcmsmac/debug.h
index c0d2cf7..f77066b 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/debug.h
+++ b/drivers/net/wireless/brcm80211/brcmsmac/debug.h
@@ -8,17 +8,23 @@
 #include "main.h"
 #include "mac80211_if.h"
 
+__printf(2, 3)
 void __brcms_info(struct device *dev, const char *fmt, ...);
+__printf(2, 3)
 void __brcms_warn(struct device *dev, const char *fmt, ...);
+__printf(2, 3)
 void __brcms_err(struct device *dev, const char *fmt, ...);
+__printf(2, 3)
 void __brcms_crit(struct device *dev, const char *fmt, ...);
 
 #if defined(CONFIG_BRCMDBG) || defined(CONFIG_BRCM_TRACING)
+__printf(4, 5)
 void __brcms_dbg(struct device *dev, u32 level, const char *func,
 const char *fmt, ...);
 #else
-static inline void __brcms_dbg(struct device *dev, u32 level,
-  const char *func, const char *fmt, ...)
+static inline __printf(4, 5)
+void __brcms_dbg(struct device *dev, u32 level, const char *func,
+const char *fmt, ...)
 {
 }
 #endif
-- 
1.7.8.112.g3fd21
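__printf(a, b) tells the compiler that parameter a is a printf-style format string and that the variadic arguments start at parameter b, so -Wformat can check every caller. A minimal user-space sketch follows. SK_PRINTF is a local stand-in for the kernel macro, which expands to the same GCC format attribute; sk_info() and struct sk_device are hypothetical and only mimic the shape of __brcms_info().

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Same attribute the kernel's __printf(a, b) expands to, under a local name. */
#define SK_PRINTF(a, b) __attribute__((format(printf, a, b)))

struct sk_device {
	const char *name;
};

static char sk_last_msg[128];

/* Like __brcms_info(dev, fmt, ...): format is parameter 2, varargs from 3. */
static SK_PRINTF(2, 3) int sk_info(const struct sk_device *dev, const char *fmt, ...)
{
	va_list args;
	size_t used;

	snprintf(sk_last_msg, sizeof(sk_last_msg), "%s: ", dev->name);
	used = strlen(sk_last_msg);	/* prefix length, truncated if needed */
	va_start(args, fmt);
	vsnprintf(sk_last_msg + used, sizeof(sk_last_msg) - used, fmt, args);
	va_end(args);
	return (int)strlen(sk_last_msg);
}
```

With -Wall (which enables -Wformat), a call such as sk_info(&dev, "chan %d", "eleven") draws a format warning at compile time. For __brcms_dbg() the format string is the fourth parameter, which is why the patch uses __printf(4, 5) there.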


Re: [PATCH] of: Have of_device_add call platform_device_add rather than device_add

2012-11-21 Thread Jason Gunthorpe

On Wed, Nov 21, 2012 at 06:07:46PM +0000, Grant Likely wrote:

> > Which is nesting the generic gpio driver under a larger region..
> 
> Try two sibling nodes with overlapping addresses. There are powerpc
> device trees doing that even though it isn't legal by the ofw and
> epapr specs.

Both my examples were using sibling nodes in the OF tree.

pex@e000 {
device_type = "pci";
ranges = <0x0200 0x 0x  0xe000  0x0 
0x800>;
bus-range = <0x0 0xFF>;
chip@0 {
ranges = <0x0200 0x 0x  0x0200 
0x 0x  0x0 0x800>;
chip_control@0 {
compatible = "orc,chip,control";
assigned-addresses = <0x0200 0x0 0x0  0x0 
4096>;
};

gpio3: chip_gpio@8 {
#gpio-cells = <2>;
compatible = "linux,basic-mmio-gpio";
gpio-controller;
reg-names = "dat", "set", "dirin";
assigned-addresses = <0x0200 0x0 0x8  0x0 
4>,
 <0x0200 0x0 0xc  0x0 
4>,
 <0x0200 0x0 0x10  0x0 
4>;
};

Non-conformant, yes, but it is the simplest way to get Linux to bind
two drivers to the same memory space.

Jason

Re: [PATCH] of: Have of_device_add call platform_device_add rather than device_add

2012-11-21 Thread Grant Likely

On Wed, Nov 21, 2012 at 5:44 PM, Jason Gunthorpe
 wrote:
> On Wed, Nov 21, 2012 at 03:51:04PM +0000, Grant Likely wrote:
>> On Wed, 21 Nov 2012 00:24:48 -0700, Jason Gunthorpe 
>>  wrote:
>> > This gives platform_device_add a chance to call insert_resource
>> > on all of the resources from OF. At a minimum this fills in /proc/iomem
>> > and presumably makes resource tracking and conflict detection work
>> > better.
>> >
>> > Signed-off-by: Jason Gunthorpe 
>> >  drivers/of/device.c |2 +-
>> >  1 files changed, 1 insertions(+), 1 deletions(-)
>> >
>> > Tested on PPC32 and ARM32 embedded kernels.
>> >
>> > diff --git a/drivers/of/device.c b/drivers/of/device.c
>> > index 4c74e4f..a5b67dc 100644
>> > +++ b/drivers/of/device.c
>> > @@ -62,7 +62,7 @@ int of_device_add(struct platform_device *ofdev)
>> > if (!ofdev->dev.parent)
>> > 		set_dev_node(&ofdev->dev, of_node_to_nid(ofdev->dev.of_node));
>> >
>> > -	return device_add(&ofdev->dev);
>> > +   return platform_device_add(ofdev);
>> >  }
>> >
>> >  int of_device_register(struct platform_device *pdev)
>>
>> This has the side effect of moving all devices at the root of the tree
>> from /sys/devices/ to /sys/devices/platform. It also has the possibility
>> of breaking if any devices get registered with overlapping regions. I
>> think there are some powerpc 5200 boards that do this, and I'm not sure
>> about the larger Power boxen.
>
> Okay, I'll try to test your patch.
>
> I know sensible overlapping seems to work:
>
> e000-e7ff : PCIe 0 MEM
>   e000-e000 : :00:01.0
> e000-efff : /pex@e000/chip@0/chip_control@0
>   e008-e00b : dat
> e008-e00b : dat
>   e00c-e00f : set
> e00c-e00f : set
>   e010-e013 : dirin
> e010-e013 : dirin
>
> Which is nesting the generic gpio driver under a larger region..

Try two sibling nodes with overlapping addresses. There are powerpc
device trees doing that even though it isn't legal by the ofw and
epapr specs.

g.

Re: Lockdep complain for zram

2012-11-21 Thread Nitin Gupta


On 11/21/2012 12:37 AM, Minchan Kim wrote:

> Hi all,
>
> Today, I saw the lockdep complaint below.
> As a matter of fact, I knew about it a long time ago but forgot.
> The reason lockdep complains is that zram now uses GFP_KERNEL
> in the reclaim path (e.g. __zram_make_request) :(
> I can fix it by replacing GFP_KERNEL with GFP_NOIO.
> But a bigger problem is vzalloc in zram_init_device, which uses GFP_KERNEL.
> Of course, I can replace it with __vmalloc, which accepts a gfp_t.
> But we still have a problem: although __vmalloc accepts a gfp_t, it still
> does some internal allocations with GFP_KERNEL. That's why I sent the patch.
> https://lkml.org/lkml/2012/4/23/77
> Since then I had forgotten it; I saw the bug today and am raising the
> question again.
>
> Yes, the fundamental problem is the vmalloc API, which is utter crap.
> If we could fix it, everyone would be happy. But life isn't that simple,
> as the thread on my patch shows.
>
> So the next option is to move zram_init_device to disksize-setting time.
> But that wastes memory on metadata until zram is actually used (that's why
> Nitin moved zram_init_device from disksize-setting time to make_request),
> and it requires users to set the disksize before use, which is a behavior
> change.
>
> I would like to clean up this issue before promoting zram, because it
> might change usage behavior.
>
> Do you have any ideas?



Maybe we can alloc_vm_area() right on device creation in create_device() 
assuming the default disksize. If user explicitly sets the disksize, 
this vm area is deallocated and a new one is allocated based on the new 
disksize.  When the device is reset, we should only free physical pages 
allocated for the table and the virtual area should be set back as if 
disksize is set to the default.


At the device init time, all the pages can be allocated with GFP_NOIO | 
__GFP_HIGHMEM and since the VM area is preallocated, map_vm_area() will 
not hit any of those page-table allocations with GFP_KERNEL.


Other allocations made directly from zram, for instance in the partial 
I/O case, should also be changed to GFP_NOIO | __GFP_HIGHMEM.
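The proposal above can be sketched as an allocation lifecycle. This is a user-space model only: calloc() stands in for reserving the vm area (alloc_vm_area() in the proposal), one bool per table page stands in for a physical page, and struct sk_zram, the sk_* functions and SK_DEFAULT_PAGES are hypothetical. What it shows is where the allocations happen: area reservation only at device creation, disksize changes and reset (all process context), while the I/O path merely fills pages inside an area that already exists.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SK_DEFAULT_PAGES 16	/* hypothetical default disksize, in table pages */

/*
 * "area" stands in for the preallocated vm area that holds the table;
 * area[i] says whether table page i has a physical page behind it.
 */
struct sk_zram {
	size_t npages;
	bool *area;
};

/* Process context only (device creation, disksize store): may allocate freely. */
static int sk_reserve(struct sk_zram *z, size_t npages)
{
	bool *a = calloc(npages, sizeof(*a));

	if (!a)
		return -ENOMEM;
	free(z->area);		/* drops the old area and any pages behind it */
	z->area = a;
	z->npages = npages;
	return 0;
}

static int sk_create(struct sk_zram *z)
{
	z->area = NULL;
	z->npages = 0;
	return sk_reserve(z, SK_DEFAULT_PAGES);
}

/* I/O path: only backs pages inside the existing area (the GFP_NOIO part). */
static int sk_populate(struct sk_zram *z, size_t idx)
{
	if (idx >= z->npages)
		return -EINVAL;
	z->area[idx] = true;
	return 0;
}

/* Reset: physical pages go away; the area returns to the default size. */
static int sk_reset(struct sk_zram *z)
{
	return sk_reserve(z, SK_DEFAULT_PAGES);
}
```

Setting a disksize corresponds to calling sk_reserve() with the new size, so the reclaim-reachable path (sk_populate()) never has to create the area itself.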


Thanks,
Nitin


 8< ==


[  335.772277] =================================
[  335.772615] [ INFO: inconsistent lock state ]
[  335.772955] 3.7.0-rc6 #162 Tainted: G C
[  335.773320] ---------------------------------
[  335.773663] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[  335.774170] kswapd0/23 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  335.774564]  (&zram->init_lock){+-}, at: [] zram_make_request+0x4a/0x260 [zram]
[  335.775321] {RECLAIM_FS-ON-W} state was registered at:
[  335.775716]   [] mark_held_locks+0x82/0x130
[  335.776009]   [] lockdep_trace_alloc+0x67/0xc0
[  335.776009]   [] __alloc_pages_nodemask+0x94/0xa00
[  335.776009]   [] alloc_pages_current+0xb6/0x120
[  335.776009]   [] __get_free_pages+0x14/0x50
[  335.776009]   [] kmalloc_order_trace+0x3f/0xf0
[  335.776009]   [] zram_init_device+0x7b/0x220 [zram]
[  335.776009]   [] zram_make_request+0x24a/0x260 [zram]
[  335.776009]   [] generic_make_request+0xca/0x100
[  335.776009]   [] submit_bio+0x7b/0x160
[  335.776009]   [] submit_bh+0xf2/0x120
[  335.776009]   [] block_read_full_page+0x235/0x3a0
[  335.776009]   [] blkdev_readpage+0x18/0x20
[  335.776009]   [] __do_page_cache_readahead+0x2c7/0x2d0
[  335.776009]   [] force_page_cache_readahead+0x79/0xb0
[  335.776009]   [] page_cache_sync_readahead+0x43/0x50
[  335.776009]   [] generic_file_aio_read+0x4f0/0x760
[  335.776009]   [] blkdev_aio_read+0xbb/0xf0
[  335.776009]   [] do_sync_read+0xa3/0xe0
[  335.776009]   [] vfs_read+0xb0/0x180
[  335.776009]   [] sys_read+0x52/0xa0
[  335.776009]   [] system_call_fastpath+0x16/0x1b
[  335.776009] irq event stamp: 97589
[  335.776009] hardirqs last  enabled at (97589): [] throtl_update_dispatch_stats+0x94/0xf0
[  335.776009] hardirqs last disabled at (97588): [] throtl_update_dispatch_stats+0x4d/0xf0
[  335.776009] softirqs last  enabled at (67416): [] __do_softirq+0x139/0x280
[  335.776009] softirqs last disabled at (67395): [] call_softirq+0x1c/0x30
[  335.776009]
[  335.776009] other info that might help us debug this:
[  335.776009]  Possible unsafe locking scenario:
[  335.776009]
[  335.776009]        CPU0
[  335.776009]        ----
[  335.776009]   lock(&zram->init_lock);
[  335.776009]   <Interrupt>
[  335.776009]     lock(&zram->init_lock);
[  335.776009]
[  335.776009]  *** DEADLOCK ***
[  335.776009]
[  335.776009] no locks held by kswapd0/23.
[  335.776009]
[  335.776009] stack backtrace:
[  335.776009] Pid: 23, comm: kswapd0 Tainted: G C   3.7.0-rc6 #162
[  335.776009] Call Trace:
[  335.776009]  [] print_usage_bug+0x1f5/0x206
[  335.776009]  [] ? save_stack_trace+0x2f/0x50
[  335.776009]  [] mark_lock+0x295/0x2f0
[  335.776009]  [] ? print_irq_inversion_bug.part.37+0x1f0/0x1f0
[  335.776009]  [] ? blk_throtl_bio+0x88/0x630
[  335.776009]  [] __lock_acquire+0x564/0x1c00
[  335.776009]  [] ? trace_hardirqs_on_caller+0x105/0x190
[  335.776009]  [] ? blk_throtl_bio+0x3c2/0x630
[  335.776009]  [] ? blk_throtl_bio+0x88/0x630
[
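The report's headline, inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage, comes from per-lock-class usage states: the lock was held for write while an allocation that may enter filesystem reclaim was made (the trace shows zram_init_device() allocating under init_lock), and it was later taken for read inside reclaim (kswapd0 in zram_make_request()). A simplified sketch of that consistency rule follows. The SK_* bits are named after lockdep's usage states but are the sketch's own, and lockdep's dependency-chain checks are ignored.

```c
#include <stdbool.h>

/* Simplified per-lock-class usage bits for the RECLAIM_FS state. */
enum {
	SK_USED_IN_RECLAIM_FS      = 1 << 0,	/* taken for write inside fs reclaim */
	SK_USED_IN_RECLAIM_FS_READ = 1 << 1,	/* taken for read inside fs reclaim */
	SK_ENABLED_RECLAIM_FS      = 1 << 2,	/* held for write across a reclaim-capable allocation */
	SK_ENABLED_RECLAIM_FS_READ = 1 << 3,	/* held for read across a reclaim-capable allocation */
};

/*
 * Any writer involvement on either side is inconsistent; a pure read/read
 * pairing is allowed, as lockdep does by default.
 */
static bool sk_usage_inconsistent(unsigned int mask)
{
	bool in_w = mask & SK_USED_IN_RECLAIM_FS;
	bool in_r = mask & SK_USED_IN_RECLAIM_FS_READ;
	bool on_w = mask & SK_ENABLED_RECLAIM_FS;
	bool on_r = mask & SK_ENABLED_RECLAIM_FS_READ;

	return (in_w && (on_w || on_r)) || (in_r && on_w);
}
```

Under this rule, switching the allocations made while init_lock is held to GFP_NOIO (so the ENABLED bit is never set) removes the inconsistency, which is the direction both Minchan and Nitin discuss.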

Re: [PATCH 00/27] Latest numa/core release, v16

2012-11-21 Thread Ingo Molnar


* Linus Torvalds  wrote:

> [...] And not look at vsyscalls or anything, but look at what 
> schednuma does wrong!

I have started 4 independent lines of inquiry to figure out 
what's wrong on David's system, and all four are in the category 
of 'what does our tree do to cause a regression':

  - suboptimal (== regressive) 4K fault handling by numa/core

  - suboptimal (== regressive) placement by numa/core on David's 
    asymmetric-topology system

  - vsyscalls escalating numa/core page fault overhead
    non-linearly

  - TLB flushes escalating numacore page fault overhead
    non-linearly

I have sent patches for 3 of them; the fourth is still a work in
progress, because it's non-trivial.

I'm absolutely open to every possibility and obviously any 
regression is numa/core's fault, full stop.

What would you have done differently to handle this particular 
regression?

Thanks,

Ingo

Re: [PATCH 00/46] Automatic NUMA Balancing V4

2012-11-21 Thread Mel Gorman

On Wed, Nov 21, 2012 at 06:33:16PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman  wrote:
> 
> > On Wed, Nov 21, 2012 at 06:03:06PM +0100, Ingo Molnar wrote:
> > > 
> > > * Mel Gorman  wrote:
> > > 
> > > > On Wed, Nov 21, 2012 at 10:21:06AM +0000, Mel Gorman wrote:
> > > > > 
> > > > > I am not including a benchmark report in this but will be posting one
> > > > > shortly in the "Latest numa/core release, v16" thread along with the latest
> > > > > schednuma figures I have available.
> > > > > 
> > > > 
> > > > Report is linked here https://lkml.org/lkml/2012/11/21/202
> > > > 
> > > > I ended up cancelling the remaining tests and restarted with
> > > > 
> > > > 1. schednuma + patches posted since so that works out as
> > > 
> > > Mel, I'd like to ask you to refer to our tree as numa/core or 
> > > 'numacore' in the future. Would such a courtesy to use the 
> > > current name of our tree be possible?
> > > 
> > 
> > Sure, no problem.
> 
> Thanks!
> 
> I ran a quick test with your 'balancenuma v4' tree and while 
> numa02 and numa01-THREAD-ALLOC performance is looking good, 
> numa01 performance does not look very good:
> 
>                  mainline    numa/core    balancenuma-v4
>  numa01:            340.3        139.4          276 secs
> 
> 97% slower than numa/core.
> 

It would be. numa01 is an adverse workload where all threads are hammering
the same memory.  The two-stage filter in balancenuma restricts the amount
of migration it does so it ends up in a situation where it cannot balance
properly. It'll do some migration if the PTE updates happen fast enough but
that's about it.  It needs a proper policy on top to detect this situation
and interleave the memory between nodes to at least maximise the available
memory bandwidth. This would replace the two-stage filter which is there
to mitigate a ping-pong effect.

> I did a quick SPECjbb 32-warehouses run as well:
> 
>                     numa/core    balancenuma-v4
>   SPECjbb  +THP:    655 k/sec         607 k/sec
> 

Cool. Let's see what we have here. I have some questions:

You say you ran with 32 warehouses. Was this a single run with just 32
warehouses, or did you do a specjbb run up to 32 warehouses and use the
figure specjbb spits out? If it ran for multiple warehouses, how did each
number of warehouses do? I ask because sometimes we do worse for low numbers
of warehouses and better at high numbers, particularly around where the
workload peaks.

Was this a single JVM configuration?

What is the comparison with a baseline kernel?

You say you ran with balancenuma-v4. Was that the full series including
the broken placement policy or did you test with just patches 1-37 as I
asked in the patch leader?

> Here it's 7.9% slower.
> 

And in comparison to a vanilla kernel?

Bear in mind that my objective was to have a foundation that did noticeably
better than mainline, on top of which a proper placement and scheduling
policy could be built.

Thanks!

-- 
Mel Gorman
SUSE Labs
