Re: Why is the deferred initcall patch not mainline?

2014-10-21 Thread Dirk Behme

On 21.10.2014 21:37, Bird, Tim wrote:

I'm going to respond to several comments in this one message (sorry for the
likely confusion)

On Tuesday, October 21, 2014 9:31 AM, Nicolas Pitre [n...@fluxnic.net] wrote:


On Tue, 21 Oct 2014, Grant Likely wrote:


On Sat, Oct 18, 2014 at 9:11 AM, Bird, Tim tim.b...@sonymobile.com wrote:

The answer is pretty easy, I think. I tried to mainline it once but failed,
and didn't really try again. If it is being found useful, we should try to
mainline it again, this time with more persistence. The reason it got
rejected before, IIRC, was that you can accomplish a similar thing with
modules, with no changes to the kernel. But that doesn't cover the case where
the loadable modules feature of the kernel is turned off, which is common in
very small systems.


It is a rather clumsy approach though since it requires changes to
modules and it makes the configuration static per build. Could it
instead be done by the kernel accepting a list of initcalls that
should be deferred? It would depend I suppose on the cost of finding
the initcalls to defer at boot time.


Yeah, I'm not a big fan of having to change kernel code in order to
use the feature.  I am quite intrigued by Geert Uytterhoeven's idea
to add a 'D' option to the config system, so that the record of which
modules to defer could be stored there.  This is much better than
hand-altering code.  I don't know how difficult this would be to add
to the kbuild system, but the mechanism for altering the macro would
be, IMHO, very straightforward.

I should say that it's been quite some time since I worked on this,
so some of my recollections may be fuzzy.

With regards to doing it dynamically, I'd have to think about how
to do that.  Having text-based lists of things to do at runtime seems
to fit with how we're using device tree these days, but I'm not sure
how that would work.

The code as it stands now is quite simple, just creating a new linker section
to hold the list of deferred function pointers, re-using all existing
routines for processing such lists, doing a few code changes to handle
actually deferring the initialization and memory free-ing, and finally
creating a /proc entry to trigger the whole thing.
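
The mechanism described above can be sketched in userspace as follows. This is
only a minimal illustration, not the patch itself: the macro body, the section
name deferred_initcalls, run_deferred_initcalls() and the two example init
functions are all made up for the sketch. It assumes GCC or Clang with a
GNU-style ELF linker, which provides __start_ and __stop_ symbols for sections
whose names are valid C identifiers.

```c
#include <stddef.h>

/* Userspace stand-in for an initcall: a function taking nothing, returning int. */
typedef int (*initcall_t)(void);

/*
 * Hypothetical macro: store a pointer to fn in an ELF section named
 * "deferred_initcalls". The linker aggregates all such pointers into
 * one contiguous array, so no registration code runs at startup.
 */
#define deferred_initcall(fn) \
    static initcall_t deferred_entry_##fn \
    __attribute__((used, section("deferred_initcalls"))) = fn

/* Section bounds synthesized by the GNU linker (assumption, see above). */
extern initcall_t __start_deferred_initcalls[];
extern initcall_t __stop_deferred_initcalls[];

static int usb_ready;
static int debug_ready;

/* Example init functions that we do not want on the critical boot path. */
static int usb_init(void)   { usb_ready = 1; return 0; }
static int debug_init(void) { debug_ready = 1; return 0; }

deferred_initcall(usb_init);
deferred_initcall(debug_init);

/*
 * Stand-in for the /proc trigger: walk the section once and call every
 * entry. Returns the number of initcalls run, or 0 if already triggered.
 */
int run_deferred_initcalls(void)
{
    static int done;
    int count = 0;
    initcall_t *fn;

    if (done)
        return 0;
    done = 1;

    for (fn = __start_deferred_initcalls; fn < __stop_deferred_initcalls; fn++) {
        (*fn)();
        count++;
    }
    return count;
}
```

Calling run_deferred_initcalls() plays the role of writing to the trigger file;
a second call does nothing. The memory freeing done by the patch is not
modelled here.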

In a modern kernel, the /proc trigger should definitely be moved to
/sys.  Other than this, though, if you move to some other system of
processing the list, you will have to create new infrastructure for
working through the deferred module list, or make a change in the
way the items are handled in the generic init function pointer processing.
A simple solution would be to just compare each item from each ...initcall.init
section with a list of deferred functions, and not process them, until doing
the deferred init.
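
A minimal sketch of that comparison approach, in plain C. It assumes each
initcall is recorded together with its name, which the real initcall sections
do not do (they hold bare function pointers), and that the deferred list
arrives as strings. All names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* Assumption: a name is available for every initcall entry. */
struct initcall_entry {
    const char *name;
    initcall_t fn;
};

/* Records the order in which the example initcalls ran, one char each. */
static char boot_log[16];

static int record(char c)
{
    size_t n = strlen(boot_log);

    if (n + 1 < sizeof(boot_log)) {
        boot_log[n] = c;
        boot_log[n + 1] = '\0';
    }
    return 0;
}

static int clk_init(void) { return record('c'); }
static int usb_init(void) { return record('u'); }
static int fs_init(void)  { return record('f'); }

static const struct initcall_entry initcalls[] = {
    { "clk_init", clk_init },
    { "usb_init", usb_init },
    { "fs_init",  fs_init  },
};

/* Hypothetical runtime list, e.g. taken from the command line. */
static const char *const deferred_names[] = { "usb_init" };

static int is_deferred(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(deferred_names) / sizeof(deferred_names[0]); i++)
        if (strcmp(deferred_names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * deferred_pass == 0: run everything that is not on the deferred list.
 * deferred_pass != 0: run only the entries that were skipped earlier.
 * Returns the number of initcalls executed in this pass.
 */
int do_initcalls(int deferred_pass)
{
    int count = 0;
    size_t i;

    for (i = 0; i < sizeof(initcalls) / sizeof(initcalls[0]); i++) {
        if (is_deferred(initcalls[i].name) == !!deferred_pass) {
            initcalls[i].fn();
            count++;
        }
    }
    return count;
}
```

The lookup here costs one string comparison per (initcall, deferred name)
pair, which is the boot-time cost Grant asks about above.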

Note that the current technique uses the compiler and linker to do some of
the work for list aggregation and processing, so that would have to be replaced
with something else if you do it differently.



I missed the session unfortunately, are there some measurements
available that I could look at? Which subsystems are typically the
problem?


I, too, would like to know more about the problem.  Any pointers?


Here is the elinux wiki page with some historical measurements:
http://elinux.org/Deferred_Initcalls

The example on the wiki page defers 2 USB modules, and it
saved 530 milliseconds on an x86 system.

This is consistent with what we saw on cameras at Sony.
This patch predated Arjan van de Ven's fastboot work.  I don't
know if some of his parallelization (asynchronous module loading) and
optimizations for USB loading made things substantially better than this.
The USB spec makes it impossible to avoid a certain amount of delay
in probing the USB buses.

USB was the main culprit, but we sometimes deferred other modules, if they
were not in the fastpath for taking a picture. Sony cameras had a goal of
booting in .5 seconds, but I think the best we ever achieved was about 1.1
seconds, using deferred initcalls and a variety of other techniques.



To extend the list of usage examples, e.g.

-late_initcall(clk_debug_init);
+deferred_initcall(clk_debug_init);

I.e. you might want to have some debug features enabled, but you don't 
want to spend the time needed for initializing them in the time critical 
boot phase.


Best regards

Dirk

--
To unsubscribe from this list: send the line unsubscribe linux-embedded in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Why is the deferred initcall patch not mainline?

2014-10-19 Thread Dirk Behme

On 18.10.2014 10:11, Bird, Tim wrote:

The answer is pretty easy, I think.  I tried to mainline it once but failed, 
and didn't really try again. If it is being found useful,  we should try to 
mainline it again,  this time with more persistence.  The reason it got 
rejected before IIRC was that you can accomplish a similar thing with modules, 
with no changes to the kernel. But that doesn't cover the case where the 
loadable modules feature of the kernel is turned off, which is common in very 
small systems.


Just some other use cases: You may want to avoid the overhead of ELF
module loading even if module loading is enabled. We've seen a lot of
cases where the overall boot time is a lot faster with the driver built
into the kernel than with it loaded as a module, even though the kernel
size, and therefore its load time, increases.


Another case: you want the driver available quite early, earlier than
user space would load the modules, but you want that driver's delay/wait
time to run _after_ the rootfs has been mounted.


Thanks

Dirk

Btw.: Does anybody have the correct mail address of Chris? Maybe he 
has some opinions on this, too, as his talk is the starting point of 
this discussion ;)




 Dirk Behme wrote 

Hi,

During Chris Hallinan's talk [1] at ELCE 2014 in Duesseldorf, the question
of why the deferred initcall patch [2] isn't mainline yet went unanswered.

Does anybody remember?

Best regards

Dirk


[1] http://sched.co/1yG5fmY

[2] http://elinux.org/Deferred_Initcalls





Why is the deferred initcall patch not mainline?

2014-10-17 Thread Dirk Behme

Hi,

During Chris Hallinan's talk [1] at ELCE 2014 in Duesseldorf, the question
of why the deferred initcall patch [2] isn't mainline yet went unanswered.


Does anybody remember?

Best regards

Dirk


[1] http://sched.co/1yG5fmY

[2] http://elinux.org/Deferred_Initcalls


Boot time: Initial main memory initialization optimizations?

2014-05-05 Thread Dirk Behme

Hi,

regarding boot time optimization, on an embedded ARM Cortex-A9 based
system with 512MB or 1GB main memory, we found that initializing this
main memory takes a fairly large amount of time.


Initializing 512MB takes about 100ms, and the additional 512MB of a 1GB
system takes about another 100ms. So in sum about 200ms for 1GB.


From a short look at this, it seems that most of the time is spent
in arch/arm/mm/init.c in
bootmem_init()/arm_bootmem_init()/arm_bootmem_free().


Has anybody already looked into whether any optimizations are possible
here? Maybe even some hacks, if the main memory size (512MB/1GB)
is always known? Any pointers?


I'm looking to reduce (a) the overall init time and maybe (b) its
dependency on the memory size.


Thanks

Dirk






Boot time: Optimize CPU bring up?

2013-06-06 Thread Dirk Behme

Hi,

on an ARMv7 Freescale i.MX6 based system we are looking at optimizing the
kernel boot time. Booting a 3.5.7 kernel with SMP=y and the kernel
option 'nosmp' (the i.MX6 has single, dual and quad CPU versions) we get



[0.255927] hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 7 counters available
[0.256033] Setting up static identity map for 0x10426a28 - 0x10426a80
[0.260204] initcall spawn_ksoftirqd+0x0/0x58 returned 0 after 9765 usecs
[0.270363] initcall init_workqueues+0x0/0x39c returned 0 after 9765 usecs
[0.290265] initcall cpu_stop_init+0x0/0xd0 returned 0 after 19531 usecs
[0.310449] initcall rcu_spawn_kthreads+0x0/0xc0 returned 0 after 19531 usecs
[0.310699] Brought up 1 CPUs
[0.310712] SMP: Total of 1 processors activated (1581.05 BogoMIPS).


I.e. ~55ms just for bringing up the 1 CPU.

Looking into some details, e.g. cpu_stop_init(), the ~19531 usecs are
there because the system 'hangs' for 2 jiffies (CONFIG_HZ=100) in
cpu_v7_do_idle().


For testing purposes, switching to CONFIG_HZ=1000 reduces the above ~55ms to
just ~4ms. But we are reluctant to switch the whole system to
CONFIG_HZ=1000 just to optimize this part of the boot process.
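
The reported durations line up with whole jiffies. Assuming the initcall_debug
code computes its 'usecs' value as nanoseconds shifted right by 10 (a divide
by 1024 rather than by 1000), one 10 ms jiffy at CONFIG_HZ=100 prints as 9765
and two jiffies print as 19531. A small sketch of that arithmetic, with
made-up helper names:

```c
#include <stdint.h>

/* Length of `ticks` jiffies in nanoseconds for a given CONFIG_HZ. */
uint64_t jiffies_to_ns(uint64_t ticks, unsigned int hz)
{
    return ticks * (UINT64_C(1000000000) / hz);
}

/*
 * Assumed approximation used for the "returned 0 after N usecs" message:
 * ns >> 10 instead of ns / 1000.
 */
uint64_t initcall_debug_usecs(uint64_t ns)
{
    return ns >> 10;
}
```

With these helpers, the four initcalls above account for 1 + 1 + 2 + 2 = 6
jiffies, i.e. 60 ms at CONFIG_HZ=100 but only 6 ms at CONFIG_HZ=1000, which is
the same order of magnitude as the measured ~55ms and ~4ms.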


Does anybody know why all the above parts are idling for some jiffies?
Is any optimization other than CONFIG_HZ=1000 possible?


In case there are any patches floating around or this was already 
discussed, any link would be nice.


Many thanks and best regards

Dirk


[PATCH 2/2] mmc: block: remove unused name_idx

2012-08-06 Thread Dirk Behme
With the previous patch "mmc: block: mmcblkN: use slot index instead of
dynamic name index", name_idx is not needed any more.

Signed-off-by: Dirk Behme <dirk.be...@de.bosch.com>
CC: Jassi Brar <jaswinder.si...@linaro.org>
CC: Chris Ball <c...@laptop.org>
---
 drivers/mmc/card/block.c |   16 ----------------
 1 files changed, 0 insertions(+), 16 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index a01d306..555d840 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -74,7 +74,6 @@ static int max_devices;
 
 /* 256 minors, so at most 256 separate devices */
 static DECLARE_BITMAP(dev_use, 256);
-static DECLARE_BITMAP(name_use, 256);
 
 /*
  * There is one mmc_blk_data per slot.
@@ -92,7 +91,6 @@ struct mmc_blk_data {
 	unsigned int	usage;
 	unsigned int	read_only;
 	unsigned int	part_type;
-	unsigned int	name_idx;
 	unsigned int	reset_done;
 #define MMC_BLK_READ		BIT(0)
 #define MMC_BLK_WRITE		BIT(1)
@@ -1458,19 +1456,6 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 		goto out;
 	}
 
-	/*
-	 * !subname implies we are creating main mmc_blk_data that will be
-	 * associated with mmc_card with mmc_set_drvdata. Due to device
-	 * partitions, devidx will not coincide with a per-physical card
-	 * index anymore so we keep track of a name index.
-	 */
-	if (!subname) {
-		md->name_idx = find_first_zero_bit(name_use, max_devices);
-		__set_bit(md->name_idx, name_use);
-	} else
-		md->name_idx = ((struct mmc_blk_data *)
-				dev_to_disk(parent)->private_data)->name_idx;
-
 	md->area_type = area_type;
 
 	/*
@@ -1660,7 +1645,6 @@ static void mmc_blk_remove_parts(struct mmc_card *card,
 	struct list_head *pos, *q;
 	struct mmc_blk_data *part_md;
 
-	__clear_bit(md->name_idx, name_use);
 	list_for_each_safe(pos, q, &md->part) {
 		part_md = list_entry(pos, struct mmc_blk_data, part);
 		list_del(pos);
-- 
1.7.0.4



Re: [PATCH] module: Use binary search in lookup_symbol()

2011-05-18 Thread Dirk Behme

On 17.05.2011 22:56, Alessio Igor Bogani wrote:

This work was supported by a hardware donation from the CE Linux Forum.

Signed-off-by: Alessio Igor Bogani <abog...@kernel.org>
---
  kernel/module.c |    7 ++-----
  1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1e2b657..795bdc7 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2055,11 +2055,8 @@ static const struct kernel_symbol *lookup_symbol(const char *name,
  	const struct kernel_symbol *start,
  	const struct kernel_symbol *stop)
  {
-	const struct kernel_symbol *ks = start;
-	for (; ks < stop; ks++)
-		if (strcmp(ks->name, name) == 0)
-			return ks;
-	return NULL;
+	return bsearch(name, start, stop - start,
+			sizeof(struct kernel_symbol), cmp_name);
  }

  static int is_exported(const char *name, unsigned long value,


The old version with the warning is in linux-next now

http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=903996de9b35213aaa4162c24351a2cb2931d9ac
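
For readers without the rest of the series at hand, here is a self-contained
userspace sketch of what the new lookup_symbol() does, using the C library's
bsearch() in place of the kernel's lib/bsearch.c. The cmp_name() body and the
sample table are assumptions for illustration. The table must already be
sorted by name, which the series arranges at link time.

```c
#include <stdlib.h>
#include <string.h>

/* Same two fields as the kernel structure of that time. */
struct kernel_symbol {
    unsigned long value;
    const char *name;
};

/* bsearch() comparator: first argument is the key, second an array element. */
static int cmp_name(const void *key, const void *elem)
{
    const struct kernel_symbol *ks = elem;

    return strcmp((const char *)key, ks->name);
}

/* Binary search over [start, stop); returns NULL if name is not present. */
const struct kernel_symbol *lookup_symbol(const char *name,
    const struct kernel_symbol *start,
    const struct kernel_symbol *stop)
{
    return bsearch(name, start, (size_t)(stop - start),
                   sizeof(struct kernel_symbol), cmp_name);
}

/* Hypothetical symbol table, sorted in strcmp() order. */
const struct kernel_symbol symtab[] = {
    { 0x1000, "bsearch" },
    { 0x2000, "kfree"   },
    { 0x3000, "kmalloc" },
    { 0x4000, "printk"  },
};
```

If the table were not sorted, bsearch() could miss symbols that are present,
which is why the sorting patch has to come before this one in the series.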

Best regards

Dirk


Re: [PATCH 0/4] Speed up the symbols' resolution process V4

2011-04-27 Thread Dirk Behme

On 16.04.2011 15:26, Alessio Igor Bogani wrote:

The intent of this patch is to speed up the symbols resolution process.

This objective is achieved by sorting all ksymtab* and kcrctab* symbols
(those which reside both in the kernel and in the modules) and thus use the
fast binary search.

To avoid adding lots of code for symbols sorting I rely on the linker which can
easily do the job thanks to a little trick. The trick isn't really beautiful to
see but permits minimal changes to the code and build process. Indeed the patch
is very simple and short.

In the first place I changed the code to place every symbol in a different
section (for example: ___ksymtab sec __ #sym) at compile time (this is the
above mentioned trick!). Then I ask the linker to sort and merge all
these sections into the appropriate ones (for example: __ksymtab) at link
time using the linker scripts. Once all symbols are sorted we can use binary
search instead of the linear one.

I'm fairly sure that this is a good speed improvement even though I haven't
done any comprehensive benchmarking (but a simple one follows). In any case
I would be very happy to receive suggestions about how to do it. As a side
effect, the boot time should also be reduced (in proportion to the number of
modules and symbols involved at boot stage).

I hope that you find that interesting!

This work was supported by a hardware donation from the CE Linux Forum.

Thanks to Ian Lance Taylor for help about how the linker works.


Changes since V3:
*) Please ignore this version completely

Changes since V2:
*) Fix a bug in each_symbol() semantics by Anders Kaseorg
*) Split the work in three patches as requested by Rusty Russell
*) Add a generic binary search implementation made by Tim Abbott
*) Remove CONFIG_SYMBOLS_BSEARCH kernel option

Changes since V1:
*) Merge all patches into only one
*) Remove few useless things
*) Introduce CONFIG_SYMBOLS_BSEARCH kernel option


Alessio Igor Bogani (3):
   module: Restructure each_symbol() code
   module: Sort exported symbols
   module: Use the binary search for symbols resolution

Tim Abbott (1):
   lib: Add generic binary search function to the kernel.

  include/asm-generic/vmlinux.lds.h |   20
  include/linux/bsearch.h           |    9
  include/linux/module.h            |    4 +-
  kernel/module.c                   |   84 -
  lib/Makefile                      |    3 +-
  lib/bsearch.c                     |   53 +++
  scripts/module-common.lds         |   11 +
  7 files changed, 151 insertions(+), 33 deletions(-)
  create mode 100644 include/linux/bsearch.h
  create mode 100644 lib/bsearch.c


Tested-by: Dirk Behme <dirk.be...@googlemail.com>

On an embedded ARM system that insmods a large number of modules, the
overall module load time improves by up to ~1s. Great! :)


It would be nice to get these patches into mainline asap.

Many thanks

Dirk





LinuxCon Europe 2011 == ELCE 2011?

2011-01-11 Thread Dirk Behme


Is LinuxCon Europe (October 26 - 28, 2011, Prague)

http://events.linuxfoundation.org/events/linuxcon-europe

now the same as the Embedded Linux Conference Europe (ELCE) 2011?

There is some rumor that these were merged. But the above page doesn't 
even mention the string 'embedded'.


Thanks

Dirk




Re: New fast(?)-boot results on ARM

2009-08-18 Thread Dirk Behme

Sascha Hauer wrote:

On Fri, Aug 14, 2009 at 07:02:28PM +0200, Robert Schwebel wrote:

Hi,

On Thu, Aug 13, 2009 at 05:33:26PM +0200, Robert Schwebel wrote:

On Thu, Aug 13, 2009 at 08:28:26AM -0700, Arjan van de Ven wrote:

That's bad :-) So there is no room for improvement any more in our
ARM boot sequences ...

on x86 we're doing pretty well ;-)

On i.MX27 (400 MHz ARM926EJ-S) we currently need 7 s, measured from
power-on through the kernel up to starting init. This is with

- no delay in u-boot-v2
- rootfs on NAND (UBIFS)
- quiet
- precalculated loops-per-jiffy
- zImage kernel instead of uImage

Here's a little video of our demo system booting:
http://www.youtube.com/watch?v=xDbUnNsj0cI

As you can see there, it needs about 15 s from the release of the reset button
up to the moment where the application shows its Qt 4.5.2 based GUI (which is
when we fade over from the initial framebuffer to the final one, in order to
hide the Qt application startup noise).

And below is the boot log (after turning quiet off again). The numbers are
the timestamp and the delta to the last timestamp, measured on the controlling
PC by looking at the serial console output. The ptx_ts script starts when the
regexp was found, so the numbers start basically in the moment when u-boot-v2
has initialized the system up to the point where we can see something.
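
The timestamp/delta format can be reproduced with a few lines of C. This is
not the ptx_ts script itself, only a hypothetical sketch of the formatting it
appears to do, with times kept as integer microseconds and the function name
made up:

```c
#include <stdio.h>

/*
 * Format one console line as "[seconds.micros] delta text", where delta
 * is the time since the previous line. Times are in microseconds.
 * Returns what snprintf() returns.
 */
int format_ts_line(char *out, size_t len,
                   unsigned long long now_us,
                   unsigned long long prev_us,
                   const char *text)
{
    unsigned long long delta_us = now_us - prev_us;

    return snprintf(out, len, "[%3llu.%06llu] %3llu.%06llu %s",
                    now_us / 1000000, now_us % 1000000,
                    delta_us / 1000000, delta_us % 1000000,
                    text);
}
```

Because the stamps are taken on the controlling PC, each delta also includes
the serial transfer time of the line, so very small deltas mostly reflect
console throughput rather than work done on the target.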

Result:

- 2.4 s up from u-boot to the end of Uncompressing Linux
- 300 ms until ubifs initialization starts
- 3.7 s for ubifs, until mounted root

So we basically have 7 s for the kernel. The rest is userspace, which hasn't
seen much optimization yet, other than trying to start the GUI application as
early as possible, while doing all other init stuff in parallel. Adding quiet
brings us another 300 ms.

That's a factor of 70 away from the 110 ms boot time Tim talked about some days
ago (and he measured on an ARM CPU which had almost half the speed of this
one), and I'm wondering what we can do to improve the boot time.

Robert

r...@thebe:~$ microcom | ptx_ts U-Boot 2.0.0-rc9
[ 13.522625]   0.043189
[ 13.546627]   0.024002 OSELAS(R)-phyCORE-trunk (PTXdist-1.99.svn/2009-08-06T08:37:25+0200)
[ 13.558613]   0.011986
[ 13.690643]   0.132030_ ___    _
[ 13.690731]   0.000088  _ __ | |__  _   _ / ___/ _ \|  _ \| |
[ 13.698595]   0.007864 | '_ \| '_ \| | | | |  | | | | |_) |  _|
[ 13.698654]   0.000059 | |_) | | | | |_| | |__| |_| |  _ | |___
[ 13.702581]   0.003927 | .__/|_| |_|\__, |\\___/|_| \_\_|
[ 13.706573]   0.003992 |_|  |___/
[ 13.706622]   0.000049
[ 13.725043]   0.018421
[ 14.742608]   1.017565


I made some changes suggested in this thread:

- enable MMU in the bootloader
- use assembler optimized memcpy/memset in the bootloader
- start an uncompressed image
- disable IP autoconfiguration in the Kernel
- use lpj= command line parameter
- use static device nodes instead of udev
- skip some init scripts
- made the kernel smaller (I do not have both configs handy, so I do not
  know what exactly I changed)
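
On the lpj= item: the BogoMIPS figure the kernel prints is derived from
loops_per_jiffy, and passing lpj= presets that value so the calibration loop
can be skipped. Assuming the usual integer arithmetic of the calibration
message, the relation can be sketched as below. The helper names and the
sample value are hypothetical and were not measured on this board.

```c
/* Integer part of the BogoMIPS value printed for a given loops_per_jiffy. */
unsigned long bogomips_whole(unsigned long lpj, unsigned long hz)
{
    return lpj / (500000UL / hz);
}

/* Two-digit fractional part of the same value. */
unsigned long bogomips_frac(unsigned long lpj, unsigned long hz)
{
    return (lpj / (5000UL / hz)) % 100;
}
```

For example, a hypothetical lpj=995328 at HZ=100 would be reported as 199.06
BogoMIPS. The value to pass is the one shown in the "(lpj=...)" part of the
calibration message from a normal boot of the same hardware and HZ setting.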

Already looks much better:

[  0.000005]   0.000005 U-Boot 2.0.0-rc10-00241-g3f10fe9-dirty (Aug 18 2009 - 13:29:25)
[  0.000026]   0.000021
[  0.000041]   0.000015 Board: Phytec phyCORE-i.MX27
[  0.000054]   0.000013 cfi_probe: cfi_flash base: 0xc0000000 size: 0x02000000
[  0.000067]   0.000013 NAND device: Manufacturer ID: 0x20, Chip ID: 0x36 (ST Micro NAND 64MiB 1,8V 8-bit)
[  0.000080]   0.000013 im...@imxfb0: i.MX Framebuffer driver
[  0.000092]   0.000012 dma_alloc: 0xa6f56e40 0x1000
[  0.000105]   0.000013 dma_alloc: 0xa6f57088 0x1000
[  0.000118]   0.000013 dev_protect: currently broken
[  0.000129]   0.000011 Using environment in NOR Flash
[  0.000141]   0.000012 initialising PLLs
[  0.128972]   0.128831 Malloc space: 0xa6f00000 - 0xa7f00000 (size 16 MB)
[  0.128995]   0.000023 Stack space : 0xa6ef8000 - 0xa6f00000 (size 32 kB)
[  0.129008]   0.000013 running /env/bin/init...
[  0.224963]   0.095955
[  0.224984]   0.000021 Hit any key to stop autoboot:  0
[  0.224999]   0.000015 copy
[  0.592964]   0.367965 done
[  0.652010]   0.059046 Linux version 2.6.31-rc4-4-g05786f8-dirty (s...@octopus) (gcc version 4.3.2 (OSELAS.Toolchain-1.99.3) ) #206 PREEMPT Tue Aug 18 14:08:51 CEST 2009


So this is ~0.6 s in the boot loader and kernel copy until the kernel
starts, correct?


What's the size of the uncompressed kernel copied here?

Best regards

Dirk

Btw.: I tried to summarize some hints given in this thread in

http://elinux.org/Boot_Time#Boot_time_check_list

Please feel free to add and correct stuff!


[  0.652030]   0.000020 CPU: ARM926EJ-S [41069264] revision 4 (ARMv5TEJ), cr=00053177
[  0.652044]   0.000014 CPU: VIVT data cache, VIVT instruction cache
[  0.652057]   0.000013 Machine: phyCORE-i.MX27
[  0.652069]   0.000012 Memory policy: ECC disabled, Data cache writeback
[  0.652082]   0.000013 Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
[  0.706012]   0.053930