Re: Jenkins on SQLite

'Herve Le Meur' via Jenkins Developers Sun, 03 Apr 2022 03:09:16 -0700

I'm impressed by how few modifications your prototype needed, and I
really like the idea!


On Sun, Apr 3, 2022 at 3:59 AM Basil Crow <[email protected]> wrote:

> In the past we have talked about our vision and goals for Jenkins 3.0
> on this list. Here is one of mine.
>
> Has anyone besides me been highly dissatisfied with the way Jenkins
> does object persistence? I think we are leaving a lot of functionality
> and performance on the table by using flat files rather than a
> relational database. Just run syncsnoop.bt on any Jenkins controller
> and observe that a standard installation writes out dozens of tiny
> files per second while running a Pipeline job and calls fsync(2) on
> every single one of them (!). This architectural choice is
> constraining our ability to implement new features at reasonable
> performance, especially with regard to test results and static
> analysis checks.
>
> I think SQLite is the ideal choice for a relational database for
> Jenkins. SQLite directly competes with flat files, which is what we
> are using today. Furthermore, it is serverless, so it would not
> introduce any new installation or upgrade requirements. The migration
> could be handled transparently on upgrade to the new version.
>
> True, SQLite allows at most one writer at a time per database. But do
> we really need to support more than one concurrent writer for most
> metadata, like the Configure System page? Obviously we need to support
> concurrent builds of jobs. This can be handled by defining a set of
> namespaces as concurrency domains, each one backed by its own SQLite
> database. For example, we can have one SQLite database for global
> configuration, one SQLite database for the build queue, one SQLite
> database for each job (or even build), etc. In this way we can in fact
> support multiple writers interacting with different parts of the
> system concurrently. The point is that by grouping these into
> high-level buckets we can take advantage of the economies of scale
> provided by the database and OS page cache.
>
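
A minimal sketch of the concurrency-domain idea described above, written in Python for brevity rather than in the prototype's Java. The file names `global.db` and `queue.db` and the `kv` table are hypothetical. The sketch only illustrates that a write transaction on one database file does not block a writer on a different file, while a second writer on the same file has to wait.

```python
import os
import sqlite3
import tempfile


def open_domain(path):
    """Open one concurrency domain, i.e. one SQLite database file."""
    # isolation_level=None: transactions are managed explicitly below.
    # timeout=0.1: give up quickly instead of waiting on a busy lock.
    conn = sqlite3.connect(path, timeout=0.1, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    return conn


def demo():
    home = tempfile.mkdtemp()
    config_a = open_domain(os.path.join(home, "global.db"))
    config_b = open_domain(os.path.join(home, "global.db"))
    queue = open_domain(os.path.join(home, "queue.db"))

    # Writer 1 takes the write lock on the global configuration domain.
    config_a.execute("BEGIN IMMEDIATE")
    config_a.execute("INSERT INTO kv VALUES ('numExecutors', '2')")

    # A writer in a different domain is unaffected by that lock.
    queue.execute("BEGIN IMMEDIATE")
    queue.execute("INSERT INTO kv VALUES ('item-1', 'test')")
    queue.execute("COMMIT")

    # A second writer in the same domain cannot start until writer 1
    # commits; with the short timeout it fails with "database is locked".
    try:
        config_b.execute("BEGIN IMMEDIATE")
        config_b.execute("ROLLBACK")
        same_domain_blocked = False
    except sqlite3.OperationalError:
        same_domain_blocked = True

    config_a.execute("COMMIT")
    queue_rows = queue.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
    for conn in (config_a, config_b, queue):
        conn.close()
    return same_domain_blocked, queue_rows
```

In a real system the second writer would use a longer busy timeout and simply wait its turn; the short timeout here is only to make the contention visible.
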
> I put together a quick prototype today at
> https://github.com/basil/jenkins/tree/sqlite. My Jenkins home looks
> like this:
>
> ${JENKINS_HOME}/sqlite.db (one primary SQLite database)
> ${JENKINS_HOME}/jobs/test/sqlite.db (one SQLite database per job in this prototype)
>
> The primary SQLite database has these tables:
>
> $ sqlite3 sqlite.db .tables
> config
> hudson.model.UpdateCenter
> hudson.plugins.git.GitTool
> jenkins.security.QueueItemAuthenticatorConfiguration
> jenkins.security.UpdateSiteWarningsConfiguration
> jenkins.security.apitoken.ApiTokenPropertyConfiguration
> jenkins.telemetry.Correlator
> nodeMonitors
> org.jenkinsci.plugins.workflow.flow.FlowExecutionList
> queue
> users/admin_12464527240177267930/config
> users/users
>
> Each table represents an old XML file. In this prototype I am just
> serializing the object with XStream and Jettison as JSON rather than
> XML and storing it in one JSON column. Why JSON, you ask? Because
> SQLite has a fully featured JSON extension. So here is how config.xml
> looks:
>
> $ sqlite3 sqlite.db 'select json from config'
>
> {"hudson":{"disabledAdministrativeMonitors":[""],"version":"2.342-SNAPSHOT","numExecutors":2,"mode":"NORMAL","useSecurity":true,"authorizationStrategy":{"@class":"hudson.security.AuthorizationStrategy$Unsecured"},"securityRealm":{"@class":"hudson.security.HudsonPrivateSecurityRealm","disableSignup":true,"enableCaptcha":false},"disableRememberMe":false,"projectNamingStrategy":{"@class":"jenkins.model.ProjectNamingStrategy$DefaultProjectNamingStrategy"},"workspaceDir":"${JENKINS_HOME}\/workspace\/${ITEM_FULL_NAME}","buildsDir":"${ITEM_ROOTDIR}\/builds","markupFormatter":{"@class":"hudson.markup.EscapedMarkupFormatter"},"jdks":[""],"viewsTabBar":{"@class":"hudson.views.DefaultViewsTabBar"},"myViewsTabBar":{"@class":"hudson.views.DefaultMyViewsTabBar"},"clouds":[""],"scmCheckoutRetryCount":0,"primaryView":"all","slaveAgentPort":-1,"label":"","crumbIssuer":{"@class":"hudson.security.csrf.DefaultCrumbIssuer","excludeClientIPFromCrumb":false},"nodeProperties":[""],"globalNodeProperties":[""],"nodeRenameMigrationNeeded":false}}
>
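
To illustrate why the JSON extension matters here, a small sketch of reading individual settings straight out of the stored document. It assumes a `config` table with a single `json` column as in the output above, uses a shortened copy of that JSON, and assumes the SQLite build in use has the JSON functions available. The helper name `read_setting` is hypothetical.

```python
import sqlite3

# Shortened copy of the document shown above.
SAMPLE = (
    '{"hudson":{"version":"2.342-SNAPSHOT","numExecutors":2,'
    '"mode":"NORMAL","useSecurity":true}}'
)


def make_config_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE config (json TEXT NOT NULL)")
    conn.execute("INSERT INTO config VALUES (?)", (SAMPLE,))
    return conn


def read_setting(conn, key):
    """Return one top-level field of the "hudson" object."""
    # json_extract takes the JSON text and a path such as $.hudson.mode
    path = "$.hudson." + key
    row = conn.execute(
        "SELECT json_extract(json, ?) FROM config", (path,)
    ).fetchone()
    return row[0]
```

For example, `read_setting(make_config_db(), "numExecutors")` reads the executor count without deserializing the whole object in application code.
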
> The job SQLite database has these tables:
>
> $ sqlite3 jobs/test/sqlite.db .tables
> build        config       junitResult  workflow
>
> These correspond to the old XML files as well. So builds/1/build.xml
> is row 1 in the build table with a JSON column for its content,
> builds/1/junitResult.xml is row 1 in the junitResult table with a JSON
> column for its content, builds/1/workflow/2.xml is a row in the
> workflow table with a composite key of workflow 2 and build 1 and a
> JSON column for its content, etc. I have not yet attempted to deal
> with things like SCM changelogs, permalinks, nextBuildNumber, and the
> like, but these could all be moved into the SQLite database as well.
> Halfway through this prototype I realized I was building an ORM from
> scratch, so it might be worth exploring an existing solution like
> Hibernate. But I was able to get quite far just stuffing JSON from
> XStream into a primitive table layout in SQLite.
>
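
One possible shape for the per-job tables described above, as a hedged sketch. The column names (`build`, `node`, `json`) and the helpers are assumptions for illustration and are not taken from the prototype; only the idea of one JSON column per row and a composite key for the workflow table comes from the description.

```python
import sqlite3

# Assumed layout: one row per old XML file, content stored as JSON text.
SCHEMA = """
CREATE TABLE build (
    build INTEGER PRIMARY KEY,      -- builds/<build>/build.xml
    json  TEXT NOT NULL
);
CREATE TABLE workflow (
    build INTEGER NOT NULL,         -- builds/<build>/workflow/<node>.xml
    node  INTEGER NOT NULL,
    json  TEXT NOT NULL,
    PRIMARY KEY (build, node)       -- composite key
);
"""


def open_job_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_node(conn, build, node, json_text):
    # Overwrite the row if this (build, node) pair was saved before,
    # mirroring how the XML file would have been rewritten.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO workflow (build, node, json) VALUES (?, ?, ?)",
            (build, node, json_text),
        )


def load_nodes(conn, build):
    """Return (node, json) pairs for one build, ordered by node id."""
    return conn.execute(
        "SELECT node, json FROM workflow WHERE build = ? ORDER BY node",
        (build,),
    ).fetchall()
```
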
> How does this all stack up? Well, Freestyle and Pipeline jobs work
> just fine, and performance seems quite good. True, multiple concurrent
> builds of the same Pipeline job will contend with each other to
> write new Pipeline steps out to the workflow table, but there are also
> economies of scale to be gained in letting the database manage the
> layout of the data within a single file rather than laying out data
> ourselves in multiple files and fsync(2)'ing each one. SQLite offers
> "extra", "full", "normal", and "off" settings for its "synchronous"
> option, which we can map to the existing Pipeline durability levels.
>
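
A sketch of how the "synchronous" setting could be tied to Pipeline durability. The three hint names are the existing Pipeline durability levels; which SQLite level each one maps to is purely an assumption made for illustration, and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Assumed mapping, not taken from the prototype.
DURABILITY_TO_SYNCHRONOUS = {
    "PERFORMANCE_OPTIMIZED": "OFF",
    "SURVIVABLE_NONATOMIC": "NORMAL",
    "MAX_SURVIVABILITY": "FULL",
}


def open_with_durability(path, hint):
    level = DURABILITY_TO_SYNCHRONOUS[hint]
    conn = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as parameters; `level` comes only
    # from the fixed table above, so formatting it in is safe here.
    conn.execute(f"PRAGMA synchronous = {level}")
    return conn


def current_synchronous(conn):
    # SQLite reports the level as an integer: 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA.
    return conn.execute("PRAGMA synchronous").fetchone()[0]


def demo():
    home = tempfile.mkdtemp()
    levels = {}
    for hint in DURABILITY_TO_SYNCHRONOUS:
        conn = open_with_durability(os.path.join(home, hint + ".db"), hint)
        levels[hint] = current_synchronous(conn)
        conn.close()
    return levels
```

The setting applies per connection, so it would have to be reapplied each time a database is opened.
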
> Obviously this code is a rough prototype, but I was surprised at how
> much just worked out of the box after a few hours of hacking. I think
> there could be a future for Jenkins where everything is managed by
> SQLite databases and where we leave XStream behind in favor of an ORM
> like Hibernate. On upgrade, we can read in all the data with XStream
> and write it out to SQLite with the ORM. From then on, serialization
> and deserialization would work through an ORM against the relevant
> SQLite database(s). And this would be on by default for everyone on
> upgrade, not some opt-in plugin.
>
> I think the functionality and performance we could get out of such a
> system would be better than what we have today. The real benefit would
> come after the migration when we can optimize slow operations, like
> loading builds or displaying test results and static analysis results,
> with hand-rolled SQL queries. We could also allow people to do
> full-text search of build console logs.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/CAFwNDjrq8OAZs%3DNL3-B7rYD2jqvSWsXs5iY8UyJxPCWEUHk6WA%40mail.gmail.com
> .
>
