Hi all,

Thanks for this work.
Best regards,
Régis Perdreau

On Mon, 16 June 2025 at 13:21, Devansh Varshney <varshney.devansh...@gmail.com> wrote:

> Hi everyone,
>
> This week has been one of the best learning experiences for me,
> especially digging into the "behind-the-scenes" of LibreOffice's UNO APIs.
>
> My initial work (Gerrit 185362
> <https://gerrit.libreoffice.org/c/core/+/185362>) was a first step, but
> feedback from my mentors in our meetings provided a crucial directive:
> first, figure out how to get the data. Before we can build a great
> auto-completion system, we need a deep, proven understanding of where all
> the information (for BASIC, UNO, ScriptForge, etc.) lives and how to
> access it programmatically.
>
> This led to a fascinating dive into the UNO data pipeline.
>
> *Understanding the UNO Data Pipeline: From IDL to Runtime*
> For anyone curious about how UNO works under the hood, here's a breakdown
> of what I've learned. It's a pipeline that turns human-readable API
> definitions into an efficient system the application uses at runtime.
>
> *IDL (Interface Definition Language):* This is the source of truth for
> all UNO APIs. These .idl text files define every service, interface,
> method, property, struct, and enum.
> *Locations: udkapi/* (core types) & *offapi/* (office-specific types).
>
> *idlc & regmerge:* During the build, idlc (the IDL Compiler) compiles
> .idl files into intermediate binary .urd files. Then, regmerge combines
> these into .rdb (Registry Database) files.
>
> *.rdb Files:* These are the optimized binary databases that LibreOffice
> loads at startup. Key files include types.rdb (from udkapi.rdb etc.),
> services.rdb, and offapi.rdb. These are installation artifacts, not
> source files, which clarified my initial search!
>
> *theCoreReflection:* At runtime, this powerful UNO service provides
> live, programmatic access to all the type information that was loaded
> from the .rdb files.
>
> *regview Tool:* A command-line tool (registry/tools/regview.cxx)
> designed to dump the contents of an .rdb file. My initial attempts to
> use it were unsuccessful, which, along with mentor guidance, led us to
> pivot our strategy.
>
> *SbUnoObject & XIntrospectionAccess:* The bridge in BASIC for
> interacting with live UNO objects, using dynamic introspection to
> discover their capabilities.
>
> *A simplified flow of this pipeline looks like this:*
>
> *.idl Files* --(idlc)--> *.urd Files* --(regmerge)--> *.rdb Files*
> (Source of Truth)      (Binary intermediate)     (Loaded by LO Runtime)
>                                                           |
>                                                           v
>                                               <LO Runtime Type System>
>                                          (Accessible via theCoreReflection)
>                                                           ^
>                                                           | (Reads .rdb)
>                                                    *regview Tool*
>                                                           |
>                                                           v
>                                                    <Textual Dump>
>
> *Understanding ScriptForge (wizards/source/scriptforge/)*
>
> I also looked into ScriptForge, which is crucial for modern BASIC
> scripting.
> https://gerrit.libreoffice.org/c/core/+/164867
> - *.xlb files* are XML manifests listing the libraries.
> - *.xba files* are ZIP-like packages containing the actual .bas source
>   modules.
> - *.pyi file* is a Python stub that provides type hints to Python IDEs
>   for auto-completion. As Rafael Lima mentioned, this might be manually
>   created, making it a great model for the kind of rich API definition
>   we want to achieve for BASIC.
>
> *How its information becomes available:*
>
> *.bas files (inside .xba packages listed in .xlb)*
>     |
>     v (Loaded by BasicManager/StarBASIC)
> *<SbModule objects with source code>*
>     |
>     v (Compiled by SbiParser)
> *<SbMethod, SbxVariable symbols within the SbModule>*
>
> *--- Parallel path for Python tooling ---*
> *.pyi file (wizards/source/scriptforge/python/scriptforge.pyi)*
>     |
>     v (Read by Python IDEs)
> *<Type hints for Python auto-completion>*
>
> *From Static File Parsing to C++ PoCs*
>
> Given the complexities of parsing static RDB/IDL files directly, and the
> clear guidance from Meeting 3, our immediate focus has shifted.
> The new priority is to write C++ Proof-of-Concept (PoC) code to
> programmatically gather data and get this code onto Gerrit for review.
>
> I'm very excited to share that the first two PoCs are complete.
> Gerrit Patch: https://gerrit.libreoffice.org/c/core/+/186475
> This patch contains the CppUnit tests for these experiments.
>
> *UNO Services and Memes - Why Context Comes First*
> So, for example, I’ve seen this happen a lot on social media. There’s a
> meme going around, people are laughing, sharing it, reacting to it… and
> then there’s always someone in the comments asking:
> "What’s the context behind this?"
>
> I mean, I’ve done it too. Sometimes you just miss the reference, maybe
> it’s from a movie, or some political moment, or even a viral soundbite.
> Without the context, it’s just a picture or a clip. You don’t get why
> it’s funny, why it hits.
>
> *And then someone replies and goes:*
> "Oh, this is from Interstellar, that scene where Cooper watches years of
> messages after time dilation."
>
> Now it starts to click. *That context sets the stage*.
>
> *Then maybe another reply adds:*
> "Yeah, and the reason it’s funny here is because someone compared it to
> missing one lecture and coming back to find the whole syllabus changed."
>
> So first you got the context, then someone gave the reference point, say,
> the movie, and then you dove into the details: the exact scene, the
> emotion, the punchline. That’s what makes it all land.
>
> And honestly, that’s how I see working with UNO services too.
>
> In our PoC, we had to first get the component context, otherwise we’re
> just floating, not grounded in the current state of the app. Once we had
> that, we could ask for something like
> com.sun.star.reflection.CoreReflection, and only then could we start
> introspecting the real details: interfaces, methods, enums, all the
> building blocks.
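
The order described above (context first, then a service, then introspection) can be sketched with a small self-contained C++ model. This is a toy illustration, not the UNO API and not the code in the Gerrit patch: Context, Reflection, TypeInfo and listMethods are hypothetical stand-ins, the single registered type is made-up sample data, and only the name forName echoes a call mentioned in the post.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the description of one type.
// (Assumption for illustration only; this is not the real XIdlClass.)
struct TypeInfo {
    std::vector<std::string> methods;
};

// Hypothetical stand-in for a reflection service: resolves type names.
class Reflection {
public:
    explicit Reflection(std::map<std::string, TypeInfo> types)
        : types_(std::move(types)) {}

    // Returns the type description, or nullptr if the name is unknown.
    const TypeInfo* forName(const std::string& name) const {
        auto it = types_.find(name);
        return it == types_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, TypeInfo> types_;
};

// Hypothetical stand-in for a component context: the entry point that
// hands out services. The registered type below is illustrative data.
class Context {
public:
    Context()
        : reflection_(std::map<std::string, TypeInfo>{
              {"XModel",
               TypeInfo{std::vector<std::string>{"getURL", "getArgs"}}}}) {}

    const Reflection& reflection() const { return reflection_; }

private:
    Reflection reflection_;
};

// Context -> service -> introspection, in that order.
std::vector<std::string> listMethods(const Context& ctx,
                                     const std::string& typeName) {
    const Reflection& reflection = ctx.reflection();      // the service
    const TypeInfo* type = reflection.forName(typeName);  // the lookup
    if (type == nullptr) {
        return {};  // unknown type: nothing to introspect
    }
    return type->methods;
}
```

Calling listMethods with a default-constructed Context and "XModel" returns the two sample method names, while an unregistered name returns an empty list. Nothing can be looked up without going through the context first, which is the point of the analogy.
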
>
> *It’s kind of beautiful how that maps:*
> *Context* → *“Where am I?”*
> *Service* → *“What am I working with?”*
> *Introspection* → *“What can this thing do?”*
>
> And just like in memes, without context, the rest doesn’t mean much.
> Funny enough, this whole idea of “context” is even a thing in frameworks
> like React or Java. So maybe context is more universal than we think.
>
> *Summary of C++ Proof-of-Concepts (PoCs)*
> Here's a breakdown of the PoCs I've implemented in the Gerrit patch:
>
> *PoC 1: Listing All Available UNO Service Names*
> *Concept:* Queries the *XMultiComponentFactory* (Service Manager) to get
> all creatable UNO service names.
> *Source:* comphelper/processfactory.hxx (getProcessServiceManager()).
> *Task:*
> - Get XComponentContext.
> - Get XMultiComponentFactory.
> - Call getAvailableServiceNames().
> - Log each service name.
> *Result:* Successfully dumped service names.
>
> *PoC 2: Introspecting Specific UNO Definitions via theCoreReflection*
> *Concept:* *theCoreReflection* provides access to the complete in-memory
> type information that LibreOffice loaded from its RDBs.
> *Source:* com.sun.star.reflection.theCoreReflection, XIdlClass, etc.
> (implementation in stoc/source/
> <https://git.libreoffice.org/core/+/refs/heads/master/stoc>).
> *Task:*
> - Get theCoreReflection instance.
> - For a list of key type names (XModel, XSpreadsheet, PropertyValue,
>   etc.):
>   - Call forName(sTypeName) to get its XIdlClass blueprint.
>   - Dump all details: superclasses, methods (with full parameter info),
>     properties, struct fields, and enum members.
> *Result:* Extracted rich, detailed API definitions. This proves we can
> get the data needed for Parameter Info and accurate dot-completion.
>
> https://gerrit.libreoffice.org/c/core/+/186475/4/basic/uno_available_services_cpp_dump.txt
>
> *Next Steps: Diving into BASIC Internals*
>
> With the UNO data access path validated, the next focus is on BASIC itself.
>
> *PoC 3 (In Progress): The MsgBox Deep Dive*
> My current task is to trace *MsgBox* from its user-facing documentation
> (both LO and MSO) down to its C++ implementation
> (*SbRtl_MsgBox in basic/source/runtime/methods.cxx*). This will help us
> understand how to handle built-in functions and their often-implicit
> parameter signatures.
>
> *Future PoC: Parser Symbol Extraction*
> After MsgBox, the plan is to write a C++ PoC that interacts with the
> SbiParser to extract its internal symbol tables (SbiSymPool) for
> user-defined code.
>
> A mentor's comment, *"We have a cppumaker, etc., and why not a
> basicmaker?"*, really resonated with me. It highlights that our ultimate
> goal is to create a powerful "analyzer" for BASIC that provides the same
> level of rich, structured information for our IDE tools as other "makers"
> do for their respective languages. And yes, I have to speed things up.
>
> Thanks for following this.
>
> --
> *Regards,*
> *Devansh*