I like the idea.

Questions:
Is SQLite the best choice, given that it is platform-dependent? Anyone who runs Jenkins on a platform that is not supported would not benefit.
Would H2 perhaps be the better choice (also from a concurrency standpoint)?

Plugin compatibility: can we be sure that all plugins still work? For example, ssh-slaves-plugin would need changes here: https://github.com/jenkinsci/ssh-slaves-plugin/blob/80a9538ba9b0bb6caa049cab9eb4d1ee26d51434/src/main/java/hudson/plugins/sshslaves/verifiers/HostKeyHelper.java#L66-L79

At work we configure our Jenkins via XML files that we generate before Jenkins starts (a practice dating from before the CASC plugin existed). Others may well be doing the same, so the change would have a bigger impact than it might seem.

You mentioned full-text search of build logs. Do you intend to store the build logs in the database as well? I guess many plugins rely on the logs being in the file system. Also, at work we upload the build logs to Splunk, which would not work if they were stored in a database.


On 03.04.2022 at 12:08, 'Herve Le Meur' via Jenkins Developers wrote:
I'm impressed by how few modifications your prototype needed, and I really like the idea!

On Sun, Apr 3, 2022 at 3:59 AM Basil Crow <[email protected]> wrote:

    In the past we have talked about our vision and goals for Jenkins 3.0
    on this list. Here is one of mine.

    Has anyone besides me been highly dissatisfied with the way Jenkins
    does object persistence? I think we are leaving a lot of functionality
    and performance on the table by using flat files rather than a
    relational database. Just run syncsnoop.bt on any Jenkins controller
    and observe that a standard installation writes out dozens of tiny
    files per second while running a Pipeline job and calls fsync(2) on
    every single one of them (!). This architectural choice is
    constraining our ability to implement new features at reasonable
    performance, especially with regard to test results and static
    analysis checks.

    I think SQLite is the ideal choice for a relational database for
    Jenkins. SQLite directly competes with flat files, which is what we
    are using today. Furthermore, it is serverless, so it would not
    introduce any new installation or upgrade requirements. The migration
    could be handled transparently on upgrade to the new version.

    True, SQLite allows at most one writer at a time. But do we really
    need to support more than one concurrent writer for most metadata,
    like the Configure System page? Obviously we need to support
    concurrent builds of jobs. This can be handled by defining a set of
    namespaces as concurrency domains, each one backed by its own SQLite
    database. For example, we can have one SQLite database for global
    configuration, one SQLite database for the build queue, one SQLite
    database for each job (or even build), etc. In this way we can in fact
    support multiple writers interacting with different parts of the
    system concurrently. The point is that by grouping these into
    high-level buckets we can take advantage of the economies of scale
    provided by the database and OS page cache.
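A minimal Python sketch of these concurrency domains, using the standard-library sqlite3 module rather than Jenkins' Java stack. It borrows the file layout of the prototype (a primary sqlite.db plus one per job); the helpers db_path and record_builds and the build table's columns are hypothetical. Because every job writes to a separate file, each with its own write lock, the three writer threads do not serialize behind one another.

```python
import os
import sqlite3
import tempfile
import threading

# Throwaway stand-in for ${JENKINS_HOME}.
HOME = tempfile.mkdtemp(prefix="jenkins-home-")


def db_path(namespace: str) -> str:
    """Map a concurrency domain to its own database file (hypothetical helper)."""
    if namespace == "global":
        return os.path.join(HOME, "sqlite.db")  # global configuration, queue, ...
    directory = os.path.join(HOME, namespace)   # e.g. jobs/<name>
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, "sqlite.db")


def record_builds(job: str, count: int) -> None:
    """Append `count` build rows to one job's database from a single thread."""
    conn = sqlite3.connect(db_path(f"jobs/{job}"))  # connection owned by this thread
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS build (number INTEGER PRIMARY KEY, json TEXT)")
        for number in range(1, count + 1):
            with conn:  # commit each build as its own short write transaction
                conn.execute("INSERT INTO build VALUES (?, ?)", (number, '{"result":"SUCCESS"}'))
    finally:
        conn.close()


# Each job has its own file and therefore its own write lock, so these
# writers never queue up behind one another.
threads = [threading.Thread(target=record_builds, args=(job, 50)) for job in ("a", "b", "c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Builds of the same job would still share one lock, which matches the trade-off described for the workflow table further down.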

    I put together a quick prototype today at
    https://github.com/basil/jenkins/tree/sqlite. My Jenkins home looks
    like this:

    ${JENKINS_HOME}/sqlite.db (one primary SQLite database)
    ${JENKINS_HOME}/jobs/test/sqlite.db (one SQLite database per job in
    this prototype)

    The primary SQLite database has these tables:

    $ sqlite3 sqlite.db .tables
    config
    hudson.model.UpdateCenter
    hudson.plugins.git.GitTool
    jenkins.security.QueueItemAuthenticatorConfiguration
    jenkins.security.UpdateSiteWarningsConfiguration
    jenkins.security.apitoken.ApiTokenPropertyConfiguration
    jenkins.telemetry.Correlator
    nodeMonitors
    org.jenkinsci.plugins.workflow.flow.FlowExecutionList
    queue
    users/admin_12464527240177267930/config
    users/users

    Each table represents an old XML file. In this prototype I am just
    serializing the object with XStream and Jettison as JSON rather than
    XML and storing it in one JSON column. Why JSON, you ask? Because
    SQLite has a fully featured JSON extension. So here is what config.xml
    looks like:

    $ sqlite3 sqlite.db 'select json from config'
    
{"hudson":{"disabledAdministrativeMonitors":[""],"version":"2.342-SNAPSHOT","numExecutors":2,"mode":"NORMAL","useSecurity":true,"authorizationStrategy":{"@class":"hudson.security.AuthorizationStrategy$Unsecured"},"securityRealm":{"@class":"hudson.security.HudsonPrivateSecurityRealm","disableSignup":true,"enableCaptcha":false},"disableRememberMe":false,"projectNamingStrategy":{"@class":"jenkins.model.ProjectNamingStrategy$DefaultProjectNamingStrategy"},"workspaceDir":"${JENKINS_HOME}\/workspace\/${ITEM_FULL_NAME}","buildsDir":"${ITEM_ROOTDIR}\/builds","markupFormatter":{"@class":"hudson.markup.EscapedMarkupFormatter"},"jdks":[""],"viewsTabBar":{"@class":"hudson.views.DefaultViewsTabBar"},"myViewsTabBar":{"@class":"hudson.views.DefaultMyViewsTabBar"},"clouds":[""],"scmCheckoutRetryCount":0,"primaryView":"all","slaveAgentPort":-1,"label":"","crumbIssuer":{"@class":"hudson.security.csrf.DefaultCrumbIssuer","excludeClientIPFromCrumb":false},"nodeProperties":[""],"globalNodeProperties":[""],"nodeRenameMigrationNeeded":false}}
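To illustrate what the JSON extension makes possible, here is a small Python sketch using the standard-library sqlite3 module. It assumes the bundled SQLite has the JSON functions available (they are built in by default since SQLite 3.38). The document is a trimmed, hypothetical stand-in for the config row above, not the prototype's code; it shows reading and updating single fields without deserializing the whole object.

```python
import json
import sqlite3

# Table shaped like the prototype's: one row, one JSON column named "json".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (json TEXT NOT NULL)")

# Trimmed-down stand-in for the serialized hudson object.
doc = {"hudson": {"version": "2.342-SNAPSHOT", "numExecutors": 2, "mode": "NORMAL"}}
conn.execute("INSERT INTO config (json) VALUES (?)", (json.dumps(doc),))

# json_extract() reads individual fields straight out of the stored document.
executors, mode = conn.execute(
    "SELECT json_extract(json, '$.hudson.numExecutors'),"
    "       json_extract(json, '$.hudson.mode') FROM config"
).fetchone()
print(executors, mode)  # 2 NORMAL

# json_set() rewrites a single field in place.
conn.execute("UPDATE config SET json = json_set(json, '$.hudson.numExecutors', 4)")
```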

    The job SQLite database has these tables:

    $ sqlite3 jobs/test/sqlite.db .tables
    build        config       junitResult  workflow

    These correspond to the old XML files as well. So builds/1/build.xml
    is row 1 in the build table with a JSON column for its content,
    builds/1/junitResult.xml is row 1 in the junitResult table with a JSON
    column for its content, builds/1/workflow/2.xml is a row in the
    workflow table with a composite key of workflow 2 and build 1 and a
    JSON column for its content, etc. I have not yet attempted to deal
    with things like SCM changelogs, permalinks, nextBuildNumber, and the
    like, but these could all be moved into the SQLite database as well.
    Halfway through this prototype I realized I was building an ORM from
    scratch, so it might be worth exploring an existing solution like
    Hibernate. But I was able to get quite far just stuffing JSON from
    XStream into a primitive table layout in SQLite.
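A sketch of the per-job layout just described, in Python with the standard-library sqlite3 module. The column names number and node are hypothetical (the prototype's actual columns are not given); what matters is the composite primary key on the workflow table, which makes (build 1, node 2) the equivalent of builds/1/workflow/2.xml.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE build (
    number INTEGER PRIMARY KEY,    -- builds/<number>/build.xml
    json   TEXT NOT NULL
);
CREATE TABLE workflow (
    build  INTEGER NOT NULL REFERENCES build(number),  -- enforced only with PRAGMA foreign_keys = ON
    node   INTEGER NOT NULL,       -- builds/<build>/workflow/<node>.xml
    json   TEXT NOT NULL,
    PRIMARY KEY (build, node)
);
""")

with conn:  # commit both rows together
    conn.execute("INSERT INTO build VALUES (?, ?)", (1, json.dumps({"result": "SUCCESS"})))
    conn.execute("INSERT INTO workflow VALUES (?, ?, ?)", (1, 2, json.dumps({"step": "echo"})))

# Look up what used to be builds/1/workflow/2.xml.
row = conn.execute("SELECT json FROM workflow WHERE build = ? AND node = ?", (1, 2)).fetchone()
```

A second row for the same (build, node) pair is rejected with sqlite3.IntegrityError, mirroring the fact that only one 2.xml can exist per build directory.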

    How does this all stack up? Well, Freestyle and Pipeline jobs work
    just fine, and performance seems quite fast. True, multiple concurrent
    builds of the same Pipeline job will contend with each other to write
    new Pipeline steps to the workflow table, but there are also
    economies of scale to be gained by letting the database manage the
    layout of the data within a single file rather than laying out data
    ourselves in multiple files and fsync(2)'ing each one. SQLite offers
    "extra", "full", "normal", and "off" settings for its "synchronous"
    option, which we can map to the existing Pipeline durability levels.
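One possible shape for that mapping, sketched in Python with the standard-library sqlite3 module. The three keys are the existing Pipeline durability hint names; which synchronous level each one gets is purely an assumption for illustration (EXTRA is left unused), as is the helper name open_job_db.

```python
import os
import sqlite3
import tempfile

# Hypothetical mapping from Pipeline durability hints to SQLite's
# "synchronous" setting; EXTRA is left unused here.
SYNC_FOR_DURABILITY = {
    "MAX_SURVIVABILITY": "FULL",       # sync at every critical moment
    "SURVIVABLE_NONATOMIC": "NORMAL",  # sync at the most critical moments only
    "PERFORMANCE_OPTIMIZED": "OFF",    # never sync; leave flushing to the OS
}


def open_job_db(path: str, durability: str) -> sqlite3.Connection:
    """Open a job database with a synchronous level chosen from the durability hint."""
    level = SYNC_FOR_DURABILITY[durability]  # KeyError for an unknown hint
    conn = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as parameters, so only the
    # whitelisted strings above are ever interpolated.
    conn.execute(f"PRAGMA synchronous = {level}")
    return conn


path = os.path.join(tempfile.mkdtemp(), "sqlite.db")
conn = open_job_db(path, "PERFORMANCE_OPTIMIZED")
print(conn.execute("PRAGMA synchronous").fetchone()[0])  # 0, i.e. OFF
```

The setting is per connection, so builds with different durability hints could in principle write to the same job database at different safety levels.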

    Obviously this code is a rough prototype, but I was surprised at how
    much just worked out of the box after a few hours of hacking. I think
    there could be a future for Jenkins where everything is managed by
    SQLite databases and where we leave XStream behind in favor of an ORM
    like Hibernate. On upgrade, we can read in all the data with XStream
    and write it out to SQLite with the ORM. From then on, serialization
    and deserialization would work through an ORM against the relevant
    SQLite database(s). And this would be on by default for everyone on
    upgrade, not some opt-in plugin.
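The one-time migration step could look roughly like the following Python sketch. The prototype uses XStream and Jettison; this sketch swaps in the standard-library xml.etree.ElementTree with a deliberately naive converter (it ignores attributes and keeps leaf values as strings), so its JSON will not match XStream's output. The names migrate and element_to_obj are hypothetical. The point is the shape: one table per top-level XML file, named as in the table listing above, and the whole migration in one transaction so that a failure leaves no half-migrated database.

```python
import json
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET


def element_to_obj(elem: ET.Element):
    """Naive XML-to-JSON conversion (stand-in for XStream + Jettison): ignores
    attributes, keeps leaf text as strings, turns repeated tags into lists."""
    children = list(elem)
    if not children:
        return elem.text or ""
    out = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag not in out:
            out[child.tag] = value
        elif isinstance(out[child.tag], list):
            out[child.tag].append(value)
        else:
            out[child.tag] = [out[child.tag], value]
    return out


def migrate(home: str) -> None:
    """Copy each top-level *.xml file in `home` into its own table in home/sqlite.db."""
    # isolation_level=None: transactions are managed explicitly below.
    conn = sqlite3.connect(os.path.join(home, "sqlite.db"), isolation_level=None)
    try:
        conn.execute("BEGIN")  # SQLite DDL is transactional, so this is all-or-nothing
        for name in sorted(os.listdir(home)):
            if not name.endswith(".xml"):
                continue
            root = ET.parse(os.path.join(home, name)).getroot()
            table = name[: -len(".xml")]  # e.g. hudson.model.UpdateCenter
            conn.execute(f'CREATE TABLE "{table}" (json TEXT NOT NULL)')
            conn.execute(f'INSERT INTO "{table}" (json) VALUES (?)',
                         (json.dumps({root.tag: element_to_obj(root)}),))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


# Usage: migrate a throwaway home containing one tiny config.xml.
demo_home = tempfile.mkdtemp()
with open(os.path.join(demo_home, "config.xml"), "w", encoding="utf-8") as f:
    f.write("<hudson><numExecutors>2</numExecutors><mode>NORMAL</mode></hudson>")
migrate(demo_home)
```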

    I think the functionality and performance we could get out of such a
    system would be better than what we have today. The real benefit would
    come after the migration when we can optimize slow operations, like
    loading builds or displaying test results and static analysis results,
    with hand-rolled SQL queries. We could also allow people to do
    full-text search of build console logs.
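As a hedged illustration of the kind of hand-rolled query meant here, this Python sketch (standard-library sqlite3, assuming the JSON functions are available) finds which tests failed most often across builds. The result documents are hypothetical and heavily simplified; they are not the real junitResult format. json_each() expands each build's array of test cases into rows, so the aggregation happens inside SQLite instead of loading every build's object.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE junitResult (build INTEGER PRIMARY KEY, json TEXT NOT NULL)")

# Hypothetical, heavily simplified result documents.
results = {
    1: {"cases": [{"name": "testA", "status": "PASSED"}, {"name": "testB", "status": "FAILED"}]},
    2: {"cases": [{"name": "testA", "status": "FAILED"}, {"name": "testB", "status": "FAILED"}]},
}
with conn:
    conn.executemany("INSERT INTO junitResult VALUES (?, ?)",
                     [(build, json.dumps(doc)) for build, doc in results.items()])

# Count failures per test across all builds, most frequent first.
failures_by_test = conn.execute("""
    SELECT json_extract(c.value, '$.name') AS test, COUNT(*) AS failures
    FROM junitResult AS r, json_each(r.json, '$.cases') AS c
    WHERE json_extract(c.value, '$.status') = 'FAILED'
    GROUP BY test
    ORDER BY failures DESC, test
""").fetchall()
print(failures_by_test)  # [('testB', 2), ('testA', 1)]
```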

    --
    You received this message because you are subscribed to the Google
    Groups "Jenkins Developers" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    To view this discussion on the web visit
    https://groups.google.com/d/msgid/jenkinsci-dev/CAFwNDjrq8OAZs%3DNL3-B7rYD2jqvSWsXs5iY8UyJxPCWEUHk6WA%40mail.gmail.com.
