I'm impressed by how few modifications your prototype needed, and I really like the idea!

On Sun, Apr 3, 2022 at 3:59 AM Basil Crow <[email protected]> wrote:
> In the past we have talked about our vision and goals for Jenkins 3.0
> on this list. Here is one of mine.
>
> Has anyone besides me been highly dissatisfied with the way Jenkins
> does object persistence? I think we are leaving a lot of functionality
> and performance on the table by using flat files rather than a
> relational database. Just run syncsnoop.bt on any Jenkins controller
> and observe that a standard installation writes out dozens of tiny
> files per second while running a Pipeline job and calls fsync(2) on
> every single one of them (!). This architectural choice is
> constraining our ability to implement new features at reasonable
> performance, especially with regard to test results and static
> analysis checks.
>
> I think SQLite is the ideal choice for a relational database for
> Jenkins. SQLite directly competes with flat files, which is what we
> are using today. Furthermore, it is serverless, so it would not
> introduce any new installation or upgrade requirements. The migration
> could be handled transparently on upgrade to the new version.
>
> True, SQLite allows at most one writer to proceed concurrently. But do
> we really need to support more than one concurrent writer for most
> metadata, like the Configure System page? Obviously we need to support
> concurrent builds of jobs. This can be handled by defining a set of
> namespaces as concurrency domains, each one backed by its own SQLite
> database. For example, we can have one SQLite database for global
> configuration, one SQLite database for the build queue, one SQLite
> database for each job (or even build), etc. In this way we can in fact
> support multiple writers interacting with different parts of the
> system concurrently. The point is that by grouping these into
> high-level buckets we can take advantage of the economies of scale
> provided by the database and OS page cache.
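
To make the "namespaces as concurrency domains" idea concrete, here is a minimal Python sketch. It is not taken from the prototype: the `records` table, the helper names (`db_path`, `save`, `load`, `demo`) and the namespace strings are all hypothetical. The only point it illustrates is that writers in different namespaces touch different SQLite files, so they do not wait on each other's write lock.

```python
import sqlite3
import tempfile
import threading
from pathlib import Path


def db_path(home, namespace):
    # Hypothetical layout: "" is the global namespace,
    # "jobs/test" maps to <home>/jobs/test/sqlite.db.
    return Path(home) / namespace / "sqlite.db"


def save(home, namespace, key, value_json):
    """Write one record; only writers in the same namespace share a file."""
    path = db_path(home, namespace)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(path))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS records "
                "(key TEXT PRIMARY KEY, json TEXT NOT NULL)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO records (key, json) VALUES (?, ?)",
                (key, value_json),
            )
    finally:
        conn.close()


def load(home, namespace, key):
    """Read one record back, or None if the key is absent."""
    conn = sqlite3.connect(str(db_path(home, namespace)))
    try:
        row = conn.execute(
            "SELECT json FROM records WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


def demo():
    # Three writers, three namespaces, three separate database files.
    with tempfile.TemporaryDirectory() as home:
        namespaces = ["", "jobs/a", "jobs/b"]
        threads = [
            threading.Thread(
                target=save, args=(home, ns, "config", '{"ns": "%s"}' % ns)
            )
            for ns in namespaces
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return {ns: load(home, ns, "config") for ns in namespaces}


if __name__ == "__main__":
    print(demo())
```

Two writers in the same namespace would still serialize on that file's lock, which is the trade-off the quoted paragraph accepts.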
>
> I put together a quick prototype today at
> https://github.com/basil/jenkins/tree/sqlite. My Jenkins home looks
> like this:
>
> ${JENKINS_HOME}/sqlite.db (one primary SQLite database)
> ${JENKINS_HOME}/jobs/test/sqlite.db (one SQLite database per job in
> this prototype)
>
> The primary SQLite database has these tables:
>
> $ sqlite3 sqlite.db .tables
> config
> hudson.model.UpdateCenter
> hudson.plugins.git.GitTool
> jenkins.security.QueueItemAuthenticatorConfiguration
> jenkins.security.UpdateSiteWarningsConfiguration
> jenkins.security.apitoken.ApiTokenPropertyConfiguration
> jenkins.telemetry.Correlator
> nodeMonitors
> org.jenkinsci.plugins.workflow.flow.FlowExecutionList
> queue
> users/admin_12464527240177267930/config
> users/users
>
> Each table represents an old XML file. In this prototype I am just
> serializing the object with XStream and Jettison as JSON rather than
> XML and storing it in one JSON column. Why JSON, you ask? Because
> SQLite has a fully featured JSON extension.
> So here is how config.xml looks:
>
> $ sqlite3 sqlite.db 'select json from config'
>
> {"hudson":{"disabledAdministrativeMonitors":[""],"version":"2.342-SNAPSHOT","numExecutors":2,"mode":"NORMAL","useSecurity":true,"authorizationStrategy":{"@class":"hudson.security.AuthorizationStrategy$Unsecured"},"securityRealm":{"@class":"hudson.security.HudsonPrivateSecurityRealm","disableSignup":true,"enableCaptcha":false},"disableRememberMe":false,"projectNamingStrategy":{"@class":"jenkins.model.ProjectNamingStrategy$DefaultProjectNamingStrategy"},"workspaceDir":"${JENKINS_HOME}\/workspace\/${ITEM_FULL_NAME}","buildsDir":"${ITEM_ROOTDIR}\/builds","markupFormatter":{"@class":"hudson.markup.EscapedMarkupFormatter"},"jdks":[""],"viewsTabBar":{"@class":"hudson.views.DefaultViewsTabBar"},"myViewsTabBar":{"@class":"hudson.views.DefaultMyViewsTabBar"},"clouds":[""],"scmCheckoutRetryCount":0,"primaryView":"all","slaveAgentPort":-1,"label":"","crumbIssuer":{"@class":"hudson.security.csrf.DefaultCrumbIssuer","excludeClientIPFromCrumb":false},"nodeProperties":[""],"globalNodeProperties":[""],"nodeRenameMigrationNeeded":false}}
>
> The job SQLite database has these tables:
>
> $ sqlite3 jobs/test/sqlite.db .tables
> build config junitResult workflow
>
> These correspond to the old XML files as well. So builds/1/build.xml
> is row 1 in the build table with a JSON column for its content,
> builds/1/junitResult.xml is row 1 in the junitResult table with a JSON
> column for its content, builds/1/workflow/2.xml is a row in the
> workflow table with a composite key of workflow 2 and build 1 and a
> JSON column for its content, etc. I have not yet attempted to deal
> with things like SCM changelogs, permalinks, nextBuildNumber, and the
> like, but these could all be moved into the SQLite database as well.
> Halfway through this prototype I realized I was building an ORM from
> scratch, so it might be worth exploring an existing solution like
> Hibernate.
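
For readers who want to picture the per-job layout described above, here is a rough Python sketch of a `build` table keyed by build number and a `workflow` table with a composite key, each holding one JSON column. The column names (`number`, `node`) and the JSON fields (`result`, `duration`, `step`) are assumptions made for illustration and are not taken from the prototype. It also assumes the bundled SQLite was built with the JSON functions, which is the case for most recent builds.

```python
import json
import sqlite3


def open_job_db(path=":memory:"):
    """Create the two hypothetical tables for one job's database."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS build (
            number INTEGER PRIMARY KEY,   -- builds/<number>/build.xml
            json   TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS workflow (
            build  INTEGER NOT NULL,      -- builds/<build>/workflow/<node>.xml
            node   INTEGER NOT NULL,
            json   TEXT NOT NULL,
            PRIMARY KEY (build, node)
        );
        """
    )
    return conn


def demo():
    conn = open_job_db()
    with conn:  # one transaction for all three inserts
        conn.execute(
            "INSERT INTO build (number, json) VALUES (?, ?)",
            (1, json.dumps({"result": "SUCCESS", "duration": 1200})),
        )
        conn.execute(
            "INSERT INTO build (number, json) VALUES (?, ?)",
            (2, json.dumps({"result": "FAILURE", "duration": 800})),
        )
        conn.execute(
            "INSERT INTO workflow (build, node, json) VALUES (?, ?, ?)",
            (1, 2, json.dumps({"step": "sh"})),
        )
    # Filter on a field inside the JSON column without deserializing in Python.
    failed = [
        row[0]
        for row in conn.execute(
            "SELECT number FROM build "
            "WHERE json_extract(json, '$.result') = 'FAILURE' "
            "ORDER BY number"
        )
    ]
    # Look up a single workflow node by its composite key.
    step = conn.execute(
        "SELECT json_extract(json, '$.step') FROM workflow "
        "WHERE build = ? AND node = ?",
        (1, 2),
    ).fetchone()[0]
    conn.close()
    return failed, step


if __name__ == "__main__":
    print(demo())
```

Queries like the `json_extract` filter are the kind of hand-rolled SQL that a flat-file layout cannot offer without loading every build.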
> But I was able to get quite far just stuffing JSON from
> XStream into a primitive table layout in SQLite.
>
> How does this all stack up? Well, Freestyle and Pipeline jobs work
> just fine, and performance seems quite fast. True, multiple concurrent
> builds of the same Pipeline job will be contending with each other to
> write new Pipeline steps out to the workflow table, yet also there are
> economies of scale to be gained in letting the database manage the
> layout of the data within a single file rather than laying out data
> ourselves in multiple files and fsync(2)'ing each one. SQLite offers
> "extra", "full", "normal", and "off" settings for its "synchronous"
> option, which we can map to the existing Pipeline durability levels.
>
> Obviously this code is a rough prototype, but I was surprised at how
> much just worked out of the box after a few hours of hacking. I think
> there could be a future for Jenkins where everything is managed by
> SQLite databases and where we leave XStream behind in favor of an ORM
> like Hibernate. On upgrade, we can read in all the data with XStream
> and write it out to SQLite with the ORM. From then on, serialization
> and deserialization would work through an ORM against the relevant
> SQLite database(s). And this would be on by default for everyone on
> upgrade, not some opt-in plugin.
>
> I think the functionality and performance we could get out of such a
> system would be better than what we have today. The real benefit would
> come after the migration when we can optimize slow operations, like
> loading builds or displaying test results and static analysis results,
> with hand-rolled SQL queries. We could also allow people to do
> full-text search of build console logs.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/CAFwNDjrq8OAZs%3DNL3-B7rYD2jqvSWsXs5iY8UyJxPCWEUHk6WA%40mail.gmail.com
> .
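
On the point in the quoted message about mapping SQLite's "synchronous" settings to Pipeline durability levels, here is one possible pairing as a small Python sketch. The quoted message does not spell out a mapping, so the table below is purely an assumption. The level names are the three Pipeline durability settings as I understand them, and the helper names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# ASSUMED pairing, not from the prototype: least durable to most durable.
SYNCHRONOUS_FOR_DURABILITY = {
    "PERFORMANCE_OPTIMIZED": "OFF",
    "SURVIVABLE_NONATOMIC": "NORMAL",
    "MAX_SURVIVABILITY": "FULL",
}

# Integer codes that "PRAGMA synchronous" reports when queried.
SYNCHRONOUS_CODES = {"OFF": 0, "NORMAL": 1, "FULL": 2, "EXTRA": 3}


def connect_with_durability(path, durability):
    """Open a database and apply the synchronous mode for a durability level."""
    mode = SYNCHRONOUS_FOR_DURABILITY[durability]  # KeyError on unknown level
    conn = sqlite3.connect(str(path))
    # The mode comes from the fixed table above, never from user input.
    conn.execute("PRAGMA synchronous = " + mode)
    return conn


def current_synchronous(conn):
    """Return the integer code of the connection's synchronous setting."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]


def demo():
    # Apply each level to a fresh file-backed database and read it back.
    observed = {}
    with tempfile.TemporaryDirectory() as home:
        for level in SYNCHRONOUS_FOR_DURABILITY:
            conn = connect_with_durability(Path(home) / (level + ".db"), level)
            try:
                observed[level] = current_synchronous(conn)
            finally:
                conn.close()
    return observed


if __name__ == "__main__":
    print(demo())
```

Note that the setting applies per connection, so each concurrency domain could in principle run at a different durability level.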
