In the past we have talked about our vision and goals for Jenkins 3.0 on this list. Here is one of mine.

Has anyone besides me been highly dissatisfied with the way Jenkins does object persistence? I think we are leaving a lot of functionality and performance on the table by using flat files rather than a relational database. Just run syncsnoop.bt on any Jenkins controller and observe that a standard installation writes out dozens of tiny files per second while running a Pipeline job and calls fsync(2) on every single one of them (!). This architectural choice is constraining our ability to implement new features at reasonable performance, especially with regard to test results and static analysis checks.

I think SQLite is the ideal choice for a relational database for Jenkins. SQLite directly competes with flat files, which is what we are using today. Furthermore, it is serverless, so it would not introduce any new installation or upgrade requirements. The migration could be handled transparently on upgrade to the new version.

True, SQLite allows at most one writer to proceed concurrently. But do we really need to support more than one concurrent writer for most metadata, like the Configure System page? Obviously we need to support concurrent builds of jobs. This can be handled by defining a set of namespaces as concurrency domains, each one backed by its own SQLite database. For example, we can have one SQLite database for global configuration, one SQLite database for the build queue, one SQLite database for each job (or even build), etc. In this way we can in fact support multiple writers interacting with different parts of the system concurrently. The point is that by grouping these into high-level buckets we can take advantage of the economies of scale provided by the database and OS page cache.

I put together a quick prototype today at https://github.com/basil/jenkins/tree/sqlite.
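
To make the "concurrency domain" idea above concrete, here is a minimal sketch. It is written in Python rather than Java only because Python ships SQLite bindings in its standard library; it is not taken from the prototype. The DomainStore class, its method names, and the one-JSON-row-per-table layout are all hypothetical and chosen purely for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


class DomainStore:
    """Hypothetical helper: one SQLite file per concurrency domain."""

    def __init__(self, home: Path):
        self.home = home
        self._connections = {}

    def _connect(self, domain: str) -> sqlite3.Connection:
        # "" stands for the primary database; any other value is a
        # subdirectory such as "jobs/test" with its own sqlite.db.
        if domain not in self._connections:
            directory = self.home / domain
            directory.mkdir(parents=True, exist_ok=True)
            self._connections[domain] = sqlite3.connect(directory / "sqlite.db")
        return self._connections[domain]

    def save(self, domain: str, table: str, obj: dict) -> None:
        conn = self._connect(domain)
        # Table names are interpolated, so they must come from trusted code.
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (json TEXT NOT NULL)')
            conn.execute(f'DELETE FROM "{table}"')
            conn.execute(f'INSERT INTO "{table}" (json) VALUES (?)', (json.dumps(obj),))

    def load(self, domain: str, table: str) -> dict:
        row = self._connect(domain).execute(f'SELECT json FROM "{table}"').fetchone()
        return json.loads(row[0])

    def close(self) -> None:
        for conn in self._connections.values():
            conn.close()
        self._connections.clear()


# Writers to different domains touch different files, so SQLite's
# one-writer-per-database rule applies to each domain separately.
home = Path(tempfile.mkdtemp())
store = DomainStore(home)
store.save("", "config", {"numExecutors": 2})
store.save("jobs/test", "config", {"disabled": False})
print(store.load("", "config"))
print(sorted(str(p.relative_to(home)) for p in home.rglob("sqlite.db")))
```

How coarse or fine the domains should be (per job versus per build, for example) is the real design question; the sketch only shows that the routing itself is cheap.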

My Jenkins home looks like this:

    ${JENKINS_HOME}/sqlite.db (one primary SQLite database)
    ${JENKINS_HOME}/jobs/test/sqlite.db (one SQLite database per job in this prototype)

The primary SQLite database has these tables:

    $ sqlite3 sqlite.db .tables
    config
    hudson.model.UpdateCenter
    hudson.plugins.git.GitTool
    jenkins.security.QueueItemAuthenticatorConfiguration
    jenkins.security.UpdateSiteWarningsConfiguration
    jenkins.security.apitoken.ApiTokenPropertyConfiguration
    jenkins.telemetry.Correlator
    nodeMonitors
    org.jenkinsci.plugins.workflow.flow.FlowExecutionList
    queue
    users/admin_12464527240177267930/config
    users/users

Each table represents an old XML file. In this prototype I am just serializing the object with XStream and Jettison as JSON rather than XML and storing it in one JSON column. Why JSON, you ask? Because SQLite has a fully featured JSON extension. So here is how config.xml looks:

    $ sqlite3 sqlite.db 'select json from config'
    {"hudson":{"disabledAdministrativeMonitors":[""],"version":"2.342-SNAPSHOT","numExecutors":2,"mode":"NORMAL","useSecurity":true,"authorizationStrategy":{"@class":"hudson.security.AuthorizationStrategy$Unsecured"},"securityRealm":{"@class":"hudson.security.HudsonPrivateSecurityRealm","disableSignup":true,"enableCaptcha":false},"disableRememberMe":false,"projectNamingStrategy":{"@class":"jenkins.model.ProjectNamingStrategy$DefaultProjectNamingStrategy"},"workspaceDir":"${JENKINS_HOME}\/workspace\/${ITEM_FULL_NAME}","buildsDir":"${ITEM_ROOTDIR}\/builds","markupFormatter":{"@class":"hudson.markup.EscapedMarkupFormatter"},"jdks":[""],"viewsTabBar":{"@class":"hudson.views.DefaultViewsTabBar"},"myViewsTabBar":{"@class":"hudson.views.DefaultMyViewsTabBar"},"clouds":[""],"scmCheckoutRetryCount":0,"primaryView":"all","slaveAgentPort":-1,"label":"","crumbIssuer":{"@class":"hudson.security.csrf.DefaultCrumbIssuer","excludeClientIPFromCrumb":false},"nodeProperties":[""],"globalNodeProperties":[""],"nodeRenameMigrationNeeded":false}}

The job SQLite database has these tables:

    $ sqlite3 jobs/test/sqlite.db .tables
    build
    config
    junitResult
    workflow

These correspond to the old XML files as well. So builds/1/build.xml is row 1 in the build table with a JSON column for its content, builds/1/junitResult.xml is row 1 in the junitResult table with a JSON column for its content, builds/1/workflow/2.xml is a row in the workflow table with a composite key of workflow 2 and build 1 and a JSON column for its content, etc. I have not yet attempted to deal with things like SCM changelogs, permalinks, nextBuildNumber, and the like, but these could all be moved into the SQLite database as well.

Halfway through this prototype I realized I was building an ORM from scratch, so it might be worth exploring an existing solution like Hibernate. But I was able to get quite far just stuffing JSON from XStream into a primitive table layout in SQLite.

How does this all stack up? Well, Freestyle and Pipeline jobs work just fine, and performance seems quite fast. True, multiple concurrent builds of the same Pipeline job will be contending with each other to write new Pipeline steps out to the workflow table, yet also there are economies of scale to be gained in letting the database manage the layout of the data within a single file rather than laying out data ourselves in multiple files and fsync(2)'ing each one. SQLite offers "extra", "full", "normal", and "off" settings for its "synchronous" option, which we can map to the existing Pipeline durability levels.

Obviously this code is a rough prototype, but I was surprised at how much just worked out of the box after a few hours of hacking. I think there could be a future for Jenkins where everything is managed by SQLite databases and where we leave XStream behind in favor of an ORM like Hibernate. On upgrade, we can read in all the data with XStream and write it out to SQLite with the ORM.
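
As a rough illustration of the per-job layout and the "synchronous" idea described above, here is a small Python sketch (again Python only for the stdlib SQLite bindings, not the prototype's actual code). The column names build_number and node_id, the open_job_db helper, and in particular the mapping from durability levels to PRAGMA values are my assumptions for the example; the right mapping would need discussion.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# ASSUMED mapping from Pipeline durability hints to PRAGMA synchronous
# values; this is one plausible choice, not something the prototype defines.
SYNCHRONOUS_FOR_DURABILITY = {
    "PERFORMANCE_OPTIMIZED": "OFF",
    "SURVIVABLE_NONATOMIC": "NORMAL",
    "MAX_SURVIVABILITY": "FULL",
}


def open_job_db(path, durability="MAX_SURVIVABILITY"):
    """Hypothetical helper: open a job database and create its tables."""
    conn = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as parameters; the value comes
    # from the fixed dictionary above, never from user input.
    conn.execute(f"PRAGMA synchronous = {SYNCHRONOUS_FOR_DURABILITY[durability]}")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS build (
            build_number INTEGER PRIMARY KEY,
            json TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS workflow (
            build_number INTEGER NOT NULL,
            node_id INTEGER NOT NULL,
            json TEXT NOT NULL,
            PRIMARY KEY (build_number, node_id)
        );
        """
    )
    return conn


db_path = Path(tempfile.mkdtemp()) / "sqlite.db"
conn = open_job_db(db_path, durability="SURVIVABLE_NONATOMIC")

# Stand-ins for builds/1/build.xml and builds/1/workflow/{2,3}.xml,
# written in a single transaction instead of three separate files.
with conn:
    conn.execute("INSERT INTO build VALUES (?, ?)", (1, json.dumps({"result": "SUCCESS"})))
    conn.executemany(
        "INSERT INTO workflow VALUES (?, ?, ?)",
        [(1, 2, json.dumps({"step": "node"})), (1, 3, json.dumps({"step": "sh"}))],
    )

sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1 means NORMAL
steps = conn.execute(
    "SELECT node_id FROM workflow WHERE build_number = ? ORDER BY node_id", (1,)
).fetchall()
print(sync_level, steps)
```

Grouping the build record and its steps into one transaction is where the savings over per-file fsync(2) calls would come from, at whatever durability level is selected.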
From then on, serialization and deserialization would work through an ORM against the relevant SQLite database(s). And this would be on by default for everyone on upgrade, not some opt-in plugin. I think the functionality and performance we could get out of such a system would be better than what we have today. The real benefit would come after the migration when we can optimize slow operations, like loading builds or displaying test results and static analysis results, with hand-rolled SQL queries. We could also allow people to do full-text search of build console logs.
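
To give a flavor of the kind of hand-rolled query mentioned above, here is a hedged Python sketch that asks "which tests failed in which builds" directly in SQL using SQLite's JSON functions. The JSON shape (a "cases" array with "name" and "failed" fields) is invented for the example and is not the real junitResult format; it also assumes the SQLite build in use has the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE junitResult (build_number INTEGER PRIMARY KEY, json TEXT NOT NULL)"
)

# HYPOTHETICAL result documents, one per build.
results = {
    1: {"cases": [{"name": "testA", "failed": False}, {"name": "testB", "failed": True}]},
    2: {"cases": [{"name": "testA", "failed": True}, {"name": "testB", "failed": True}]},
}
with conn:
    conn.executemany(
        "INSERT INTO junitResult VALUES (?, ?)",
        [(number, json.dumps(doc)) for number, doc in results.items()],
    )

# json_each expands the "cases" array into one row per test case, so the
# filtering happens inside SQLite rather than after deserializing every
# build's results in application code.
failures = conn.execute(
    """
    SELECT r.build_number, json_extract(c.value, '$.name') AS test_name
    FROM junitResult AS r, json_each(r.json, '$.cases') AS c
    WHERE json_extract(c.value, '$.failed')
    ORDER BY r.build_number, test_name
    """
).fetchall()
print(failures)
```

With a proper ORM-managed schema the test cases would more likely be real rows and columns, which would make such queries indexable as well.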
