I don't think this will gain us much.
After all, Ben and I get better times when running it on our own systems, so
it's not that this is an "always slow" job;
it's just slow on the Jenkins infrastructure.
It does not have to be SQLite, but there is also no reason to use anything
more complicated for this.
Also, Timestamper will not be of much help, because we already know that both
parts are slow: parsing the raw data and creating the new JSON files from the
data in the DB.
/Domi


On 14.10.2013, at 15:27, Vojtech Juranek <[email protected]> wrote:

> Hi,
> I would propose to use e.g. Timestamper plugin [1] to easily see what is the 
> most time consuming operation. 
> 
> It seems to me that the current job [2] has been querying plugin names for
> several hours (it is stuck after "fetching plugin names...", so I guess it is
> running "SELECT name FROM plugin WHERE name NOT LIKE 'privateplugin%' GROUP BY
> name ;"). IMHO this means either that the index on plugin.name (from pull #1)
> wasn't created for some reason, or that sqlite has some serious performance
> issues on a db of that size.
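One way to test the "index wasn't created" hypothesis is to ask SQLite directly which indexes exist and how it plans the query. The following is only a minimal sketch: the in-memory database and its three rows are made up for illustration, and only the table, column, and index names (plugin, name, month, plugin_name) and the query itself are taken from this thread. On the real job one would open the actual database file instead.

```python
import sqlite3

# Hypothetical in-memory stand-in for the stats DB; only the table, column
# and index names come from this thread, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plugin (name TEXT, month TEXT)")
conn.executemany(
    "INSERT INTO plugin VALUES (?, ?)",
    [("git", "201309"), ("privateplugin-x", "201309"), ("ant", "201310")],
)

QUERY = ("SELECT name FROM plugin "
         "WHERE name NOT LIKE 'privateplugin%' GROUP BY name")


def index_names(conn, table):
    """Return the names of all indexes SQLite has recorded for `table`."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return {row[0] for row in rows}


def query_plan(conn, sql):
    """Return the human-readable last column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = index_names(conn, "plugin")          # no index yet
conn.execute("CREATE INDEX plugin_name ON plugin (name)")
after = index_names(conn, "plugin")           # plugin_name should be listed
plan = query_plan(conn, QUERY)                # shows whether an index is used
names = [row[0] for row in conn.execute(QUERY)]

print("indexes before:", before)
print("indexes after: ", after)
print("plan:", plan)
```

If plugin_name does not show up in sqlite_master on the CI machine, the index was never created there. If it does show up and the plan still reports a plain table scan, the problem lies elsewhere. The exact wording of the plan differs between SQLite versions, so it is best read by eye rather than parsed.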
> 
> Btw: Is sqlite a must (to be able to run it anywhere out of the box, or for a
> similar reason), or can we consider changing the db backend? (E.g. I run my
> own stats on top of mongodb. It's hard to compare, as I don't do any special
> parsing during import, I store data from given months in separate
> collections, and most of the queries run against a single collection, but all
> operations take on the order of seconds (queries) or tens of seconds (data
> import).)
> 
> [1] https://wiki.jenkins-ci.org/display/JENKINS/Timestamper
> [2] https://ci.jenkins-ci.org/view/Infrastructure/job/infra_statistics/96/console
> 
> On Sunday 13 October 2013 19:32:43 Benjamin Lau wrote:
>> I ported over that memory tester I found for Java to groovy. Here are
>> the results of that:
>> ##### Heap utilization statistics [MB] #####
>> Used Memory:10.348419189453125
>> Free Memory:70.06784820556640625
>> Total Memory:81.0625
>> Max Memory:123.9375
>> 
>> Those mostly look the same as the results from the Java one, except
>> that Groovy seems to take up a bit more memory as a baseline. But
>> that's not a surprising result.
>> 
>> I noticed that the job on ci.jenkins-ci.org finally got to actually
>> outputting the json data... that only took most of the day. :-/
>> 
>> Is there a copy of JDK 6 available on ci.jenkins-ci.org that could be
>> used in place of JDK 7? Or could we get the JDK 7u40 installed to try
>> that out?
>> 
>> Ben
>> 
>> On Sun, Oct 13, 2013 at 1:51 PM, Benjamin Lau <[email protected]> wrote:
>>> Yeah that might be an issue... think that would also affect sqlite?
>>> Since once the JSON parsing is completed everything is moved to
>>> working from that and it's also slow. Also... that post mentions that
>>> it's not yet fixed in u17 but I noticed that u40 is the latest...
>>> 
>>> What version of Java7 is running on the ci.jenkins-ci.org Jenkins? And
>>> also interesting... way back at the top of the current run there's
>>> some sort of error related to auto-installing the JDK:
>>> 
>>> Unable to auto-install JDK until the license is accepted.
>>> 
>>> I wonder what that's about?
>>> 
>>> Ben
>>> 
>>> On Sun, Oct 13, 2013 at 1:34 PM, Domi <[email protected]> wrote:
>>>> This might be a shot in the dark, but I have also only used Java 6 and
>>>> get much better performance... Maybe we hit this:
>>>> http://java-performance.info/changes-to-string-java-1-7-0_06/
>>>> ... we really are parsing a lot of JSON, so this could be an issue with
>>>> Java 7...
>>>> 
>>>> On 13.10.2013, at 22:04, Benjamin Lau <[email protected]> wrote:
>>>> 
>>>> Groovy Version: 2.1.6 JVM: 1.6.0_51 Vendor: Apple Inc. OS: Mac OS X
>>>> 
>>>> Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509)
>>>> 
>>>> ##### Heap utilization statistics [MB] ##### [1]
>>>> Used Memory:1
>>>> Free Memory:79
>>>> Total Memory:81
>>>> Max Memory:123
>>>> 
>>>> [1] http://viralpatel.net/blogs/getting-jvm-heap-size-used-memory-total-memory-using-java-runtime/
>>>> 
>>>> I'm not a groovy expert... is there a way to query it about what it
>>>> thinks its heap size is?
>>>> 
>>>> Ben
>>>> 
>>>> On Sun, Oct 13, 2013 at 8:35 AM, domi <[email protected]> wrote:
>>>> 
>>>> Merged, and started another build:
>>>> https://ci.jenkins-ci.org/view/Infrastructure/job/infra_statistics/95/
>>>> 
>>>> some other ideas/questions:
>>>> 
>>>> - groovy version?
>>>> 
>>>> - java version?
>>>> 
>>>> - memory (heap)?
>>>> 
>>>> regards Domi
>>>> 
>>>> 
>>>> 
>>>> On 11.10.2013, at 00:13, Benjamin Lau <[email protected]> wrote:
>>>> 
>>>> 
>>>> Some performance stats:
>>>> 
>>>> With no indexes (didn't even finish running; killed when it got to
>>>> envfile):
>>>> 
>>>> real    632m21.430s user    48m28.973s sys    84m20.053s
>>>> 
>>>> 
>>>> + CREATE INDEX plugin_name on plugin (name);
>>>> 
>>>> real    102m17.946s user    7m19.054s sys    15m54.729s
>>>> 
>>>> 
>>>> + CREATE INDEX jenkins_version on jenkins (version);
>>>> 
>>>> real    24m53.078s user    3m38.248s sys    3m38.441s
>>>> 
>>>> 
>>>> + CREATE INDEX plugin_month on plugin (month);
>>>> 
>>>> real    76m37.949s user    4m54.953s sys    10m2.820s
>>>> 
>>>> Note: not exactly sure how adding an additional index winds up with the
>>>> 
>>>> script running slower...
>>>> 
>>>> 
>>>> + CREATE INDEX plugin_namemonth on plugin (name,month);
>>>> 
>>>> real    1m13.944s user    0m44.397s sys    0m7.887s
>>>> 
>>>> 
>>>> + DROP INDEX plugin_month;
>>>> 
>>>> real    3m37.779s user    0m49.016s sys    0m28.856s
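A quick way to read the timings above is to compare CPU time (user + sys) with wall-clock time (real). The sketch below is only illustrative arithmetic on two of the `time` lines quoted in this message; the helper names are made up, and on a multi-core machine user + sys can exceed real, so the ratio is a rough indicator rather than a measurement of I/O wait.

```python
import re

# Timings copied verbatim from the benchmark in this message.
RUNS = {
    "no indexes": "real    632m21.430s user    48m28.973s sys    84m20.053s",
    "plugin_namemonth": "real    1m13.944s user    0m44.397s sys    0m7.887s",
}

# Matches fields such as "real    632m21.430s".
_FIELD = re.compile(r"(real|user|sys)\s+(\d+)m([\d.]+)s")


def parse_time(line):
    """Turn a `time` line into {'real': s, 'user': s, 'sys': s} in seconds."""
    return {
        key: int(minutes) * 60 + float(seconds)
        for key, minutes, seconds in _FIELD.findall(line)
    }


def cpu_share(line):
    """Fraction of wall-clock time accounted for by CPU: (user + sys) / real."""
    t = parse_time(line)
    return (t["user"] + t["sys"]) / t["real"]


for label, line in RUNS.items():
    print(f"{label}: {cpu_share(line):.0%} of wall-clock time on CPU")
```

For these two runs this works out to about 21% without indexes and about 71% with the (name, month) index. That would be consistent with the unindexed run spending most of its time waiting rather than computing, most likely on disk reads, though the numbers alone do not prove it.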
>>>> 
>>>> 
>>>> So based on these results I've sent another pull request[1] which adds
>>>> 
>>>> 3 additional indexes on top of the plugin.name one from before:
>>>> 
>>>> CREATE INDEX jenkins_version on jenkins (version);
>>>> 
>>>> CREATE INDEX plugin_month on plugin (month);
>>>> 
>>>> CREATE INDEX plugin_namemonth on plugin (name,month);
>>>> 
>>>> 
>>>> I ran collectNumbers.groovy after making the changes. Here's the
>>>> 
>>>> performance results from that:
>>>> 
>>>> mkdir -p target && time groovy collectNumbers.groovy
>>>> 
>>>> ../jenkins-ci.org/census/20*.gz
>>>> 
>>>> real    66m34.673s user    48m30.316s sys    19m25.083s
>>>> 
>>>> 
>>>> time groovy createJson.groovy
>>>> 
>>>> real    3m33.182s user    0m48.383s sys    0m24.900s
>>>> 
>>>> 
>>>> Ben
>>>> 
>>>> [1] https://github.com/jenkinsci/infra-statistics/pull/2
>>>> 
>>>> 
>>>> On Wed, Oct 9, 2013 at 4:05 PM, Benjamin Lau <[email protected]> wrote:
>>>> 
>>>> I'll see if I can run a benchmark of the old version of the code vs
>>>> 
>>>> the indexed version... I was getting the same runs forever behavior on
>>>> 
>>>> my computer before I added the indexing and could run the whole job in
>>>> 
>>>> a "reasonable"[1] amount of time after the change... I'm completely
>>>> 
>>>> stumped as to why this remained broken.
>>>> 
>>>> 
>>>> Ben
>>>> 
>>>> 
>>>> On Wed, Oct 9, 2013 at 3:55 PM, Kohsuke Kawaguchi <[email protected]> wrote:
>>>> 
>>>> On 10/08/2013 09:42 PM, Benjamin Lau wrote:
>>>> 
>>>> 
>>>> It seems to at least be running... but the performance characteristics
>>>> 
>>>> are nothing like what I'm seeing on the machine I have at home.
>>>> 
>>>> 
>>>> Do you know anything about what kind of hardware ci.jenkins-ci.org is?
>>>> 
>>>> 
>>>> 
>>>> I doubt that a slowdown of this degree can be explained by hardware, but
>>>> 
>>>> here it is:
>>>> 
>>>> 
>>>> kohsuke@cucumber:~$ cat /proc/cpuinfo
>>>> 
>>>> processor       : 0
>>>> 
>>>> vendor_id       : AuthenticAMD
>>>> 
>>>> cpu family      : 16
>>>> 
>>>> model           : 2
>>>> 
>>>> model name      : AMD Athlon(tm) 7850 Dual-Core Processor
>>>> 
>>>> stepping        : 3
>>>> 
>>>> cpu MHz         : 1400.000
>>>> 
>>>> cache size      : 512 KB
>>>> 
>>>> physical id     : 0
>>>> 
>>>> siblings        : 2
>>>> 
>>>> core id         : 0
>>>> 
>>>> cpu cores       : 2
>>>> 
>>>> apicid          : 0
>>>> 
>>>> initial apicid  : 0
>>>> 
>>>> fpu             : yes
>>>> 
>>>> fpu_exception   : yes
>>>> 
>>>> cpuid level     : 5
>>>> 
>>>> wp              : yes
>>>> 
>>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>>> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>>>> pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc
>>>> extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic
>>>> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
>>>> 
>>>> bogomips        : 5599.89
>>>> 
>>>> TLB size        : 1024 4K pages
>>>> 
>>>> clflush size    : 64
>>>> 
>>>> cache_alignment : 64
>>>> 
>>>> address sizes   : 48 bits physical, 48 bits virtual
>>>> 
>>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>> 
>>>> 
>>>> processor       : 1
>>>> 
>>>> vendor_id       : AuthenticAMD
>>>> 
>>>> cpu family      : 16
>>>> 
>>>> model           : 2
>>>> 
>>>> model name      : AMD Athlon(tm) 7850 Dual-Core Processor
>>>> 
>>>> stepping        : 3
>>>> 
>>>> cpu MHz         : 1400.000
>>>> 
>>>> cache size      : 512 KB
>>>> 
>>>> physical id     : 0
>>>> 
>>>> siblings        : 2
>>>> 
>>>> core id         : 1
>>>> 
>>>> cpu cores       : 2
>>>> 
>>>> apicid          : 1
>>>> 
>>>> initial apicid  : 1
>>>> 
>>>> fpu             : yes
>>>> 
>>>> fpu_exception   : yes
>>>> 
>>>> cpuid level     : 5
>>>> 
>>>> wp              : yes
>>>> 
>>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>>> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>>>> pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc
>>>> extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic
>>>> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
>>>> 
>>>> bogomips        : 5600.08
>>>> 
>>>> TLB size        : 1024 4K pages
>>>> 
>>>> clflush size    : 64
>>>> 
>>>> cache_alignment : 64
>>>> 
>>>> address sizes   : 48 bits physical, 48 bits virtual
>>>> 
>>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>> 
>>>> 
>>>> kohsuke@cucumber:~$ cat /proc/meminfo
>>>> 
>>>> MemTotal:        8065552 kB
>>>> 
>>>> MemFree:          982308 kB
>>>> 
>>>> Buffers:          668732 kB
>>>> 
>>>> Cached:          3066860 kB
>>>> 
>>>> SwapCached:        39548 kB
>>>> 
>>>> Active:          4339220 kB
>>>> 
>>>> Inactive:        1716388 kB
>>>> 
>>>> Active(anon):    1818532 kB
>>>> 
>>>> Inactive(anon):   571804 kB
>>>> 
>>>> Active(file):    2520688 kB
>>>> 
>>>> Inactive(file):  1144584 kB
>>>> 
>>>> Unevictable:           0 kB
>>>> 
>>>> Mlocked:               0 kB
>>>> 
>>>> SwapTotal:      23631572 kB
>>>> 
>>>> SwapFree:       23049072 kB
>>>> 
>>>> Dirty:              1412 kB
>>>> 
>>>> Writeback:             0 kB
>>>> 
>>>> AnonPages:       2299860 kB
>>>> 
>>>> Mapped:            62588 kB
>>>> 
>>>> Shmem:             70320 kB
>>>> 
>>>> Slab:             754624 kB
>>>> 
>>>> SReclaimable:     700616 kB
>>>> 
>>>> SUnreclaim:        54008 kB
>>>> 
>>>> KernelStack:        3848 kB
>>>> 
>>>> PageTables:        48692 kB
>>>> 
>>>> NFS_Unstable:          0 kB
>>>> 
>>>> Bounce:                0 kB
>>>> 
>>>> WritebackTmp:          0 kB
>>>> 
>>>> CommitLimit:    27664348 kB
>>>> 
>>>> Committed_AS:    3852228 kB
>>>> 
>>>> VmallocTotal:   34359738367 kB
>>>> 
>>>> VmallocUsed:      374416 kB
>>>> 
>>>> VmallocChunk:   34359341476 kB
>>>> 
>>>> HardwareCorrupted:     0 kB
>>>> 
>>>> HugePages_Total:       0
>>>> 
>>>> HugePages_Free:        0
>>>> 
>>>> HugePages_Rsvd:        0
>>>> 
>>>> HugePages_Surp:        0
>>>> 
>>>> Hugepagesize:       2048 kB
>>>> 
>>>> DirectMap4k:        9792 kB
>>>> 
>>>> DirectMap2M:     1955840 kB
>>>> 
>>>> DirectMap1G:     6291456 kB
>>>> 
>>>> 
>>>> 
>>>> 
>>>> kohsuke@cucumber:~$ df -h
>>>> 
>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>> 
>>>> /dev/sda1             895G  438G  412G  52% /
>>>> 
>>>> none                  3.9G  208K  3.9G   1% /dev
>>>> 
>>>> none                  3.9G     0  3.9G   0% /dev/shm
>>>> 
>>>> none                  3.9G   67M  3.8G   2% /var/run
>>>> 
>>>> none                  3.9G     0  3.9G   0% /var/lock
>>>> 
>>>> none                  3.9G     0  3.9G   0% /lib/init/rw
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/
>>>> 
>>>> Try Jenkins Enterprise, our professional version of Jenkins
>>>> 
>>>> 
>>>> --
>>>> 
>>>> You received this message because you are subscribed to the Google Groups
>>>> 
>>>> "Jenkins Developers" group.
>>>> 
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> 
>>>> email to [email protected].
>>>> 
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>> 
