On later testing, it failed badly on hard stuff, such as Mahler orchestral parts.
Upon pressing, it blamed unclear parts (which was fair).

I have yet to try a photo of a clean, simple, professionally produced
part. That will be the next test.

-------- Original message --------
From: Richard Shann <[email protected]>
Date: 1/7/26 6:19 AM (GMT-05:00)
To: Matthew Pierce <[email protected]>, lilypond-user mailinglist <[email protected]>
Subject: Re: LLM prompt: turn sheet music into code

On Tue, 2026-01-06 at 17:52 +0000, Matthew Pierce wrote:
> I asked Grok to self-design a prompt for accurately converting sheet
> music images into Lilypond code. Early results are promising.
>
> My test: I showed Grok a screenshotted page from a cello arrangement
> I recently constructed in Lilypond, gave it the prompt below,
> compiled the code it suggested, and compared the results
>
> Results: The visual match was astonishingly good, even though Grok's
> generated code is different from mine in various ways.
>
> The only major difference was the final system being kicked to the
> next page by Grok's code. I suspect this may have been caused by a
> slight cropping at the bottom of the page in my screenshotted image.
>
> This prompt might be a great shortcut for importing existing sheet
> music into Lilypond code.
>
> Try it on the LLM of your choice. Happy testing!
>
> Prompt follows:
>
> "You are an expert in LilyPond notation and optical music
> recognition. Given the attached sheet music image, meticulously
> reverse-engineer the LilyPond code that would reproduce it with the
> highest possible visual fidelity. Prioritize absolute precision and
> accuracy in every aspect of the engraving (including exact note
> placements, stem directions, beam groupings, slur shapes and
> positions, dynamic markings, articulations, text annotations, staff
> layout, spacing, and any special elements like scordatura diagrams or
> irregular meters) over any considerations of speed or efficiency.
> Proceed slowly and methodically: first, perform an exhaustive
> layer-by-layer visual analysis of the image, documenting every
> observable detail (e.g., clef type, key signature sharps/flats, time
> signature symbol, pitch positions relative to the staff, durations,
> ties, hairpins, bowings, and markup coordinates). Cross-reference with
> music theory and LilyPond syntax to resolve ambiguities. Only after
> this thorough dissection should you generate the complete LilyPond
> code, including header, global settings, voices, and layout overrides
> as needed to match the image pixel-for-pixel where possible. If
> uncertainties arise, note them and propose the most accurate
> interpretation based on standard engraving practices."

I gave Grok a screenshot of a single movement (a single page) of a
sonata of my own composition. The written response sounded extremely
convincing (it had detected that the piece was written in Baroque
style, for a start), but in detail it was wildly out. It failed to
detect the time signature, declared that the piece was in a different
key from the correct one, and failed to detect any system beyond the
first. The LilyPond syntax it generated doesn't compile and, while it
showed insight into quite esoteric aspects of LilyPond, had no obvious
relation to the notes in the piece of music it was attempting to
interpret.

That it sounded extremely convincing could be the worst aspect of AI:
"sounding convincing" is quite different from "being true". Yet it
would be easy for a person to innocently re-post the results of such
an enquiry, which the AI learning algorithms would then find on the
internet and incorporate into future responses.

Richard Shann
