On Tue, 6 Jan 2026, Matthew Pierce wrote:
> I asked Grok to self-design a prompt for accurately converting sheet music
> images into Lilypond code. Early results are promising.
>
Matthew, thank you for starting this discussion. It got me curious about what AI can do with music transcription in LilyPond, so I did my own experiment.

Conclusion: Grok failed, and ChatGPT probably wouldn’t/won’t do much better. Below my signature is some detail about what I did, step by step.

If anyone works on software/AI/OMR that might actually *work* someday for transcribing music PDFs into LilyPond code, I am happy to share the files that I used in my experiment. I would gladly pay a monthly/annual subscription for software/AI/OMR *if* it could complete a task like this with ~90%+ accuracy.

For now, it’s back to the good old-fashioned mechanical pencil* to mark up my hard-copy score with the annotations that I would have added in LilyPond if AI had worked!

Cheers,
Gabriel

*By the way, I’d welcome recommendations of the best mechanical pencils for annotating music (off-list, please, since this is not about LilyPond). I was excited about my first Rotring 600, and there is a lot that I like about it, but it also often frustrates me, and I’d like to try another model.

My experiment:

- I gave Grok the prompt in Matthew’s message of 6 January <https://lists.gnu.org/archive/html/lilypond-user/2026-01/msg00043.html>, with a few more instructions and a bit more context.
- I gave it a good scan of a movement of Bach’s St John Passion (Bärenreiter edition) to “read.”
- Grok’s first “draft” was nonsense, full of hallucinations. The code would not compile.
- Then I manually transcribed the beginning of the movement myself (the first 8 or so measures in each instrument/staff). I gave Grok my .ly file and the PDF output and asked it to engrave the rest of the movement.
- The results were bad again.
  For example, in the first violin part, I had carefully and exactly transcribed every detail of the first eight measures of music, and then written …

      % Grok, please continue from here onward to the end of the piece (measure 91)

  … and Grok picked up there with:

      % continue with the running pattern, transposing as needed to match harmony
      \repeat unfold 79 { b8. ais16 ais8. cis16 cis d cis b | }

  … which is very obviously not Bach’s music.
- I wrote back, saying, in essence, “You did not complete the assignment.” I pushed Grok to actually transcribe every note in the movement.
- That did not go well, either. The code compiled this time, but it was full of errors. Grok even removed a number of *correct* things from my human/manual transcription of the first eight measures!
- I went to ChatGPT and explained that Grok had failed. ChatGPT sent a “lovely” response proposing that we work “together” in small chunks. The outline/headers within ChatGPT’s response were:

    1. What you’ve actually asked for (and why Grok failed)
    2. How I will proceed (correctly)
    3. Concrete continuation: measures 24–27 (instrumental line)
    4. What I will not do (unless you ask)
    5. Proposed next steps (editorially sane)

  The response also said:

    The only workflow that stays accurate is:
    - I continue *3–5 measures at a time, per voice*
    - You compile and visually confirm
    - We move on to:
      - remaining instrumental lines
      - then vocal line + lyrics alignment
      - finally layout refinements (system breaks, spacing, slur shaping)

- It sounds nice in theory, but I am not going to bother with this method, because ChatGPT’s draft music for measures 25–27 is obviously wrong.
