Hi everyone,

This week has been one of the best learning experiences for me, especially digging into the "behind-the-scenes" of LibreOffice's UNO APIs.
My initial work (Gerrit 185362 <https://gerrit.libreoffice.org/c/core/+/185362>) was a first step, but feedback from my mentors in our meetings provided a crucial directive: first, figure out how to get the data. Before we can build a great auto-completion system, we need a deep, proven understanding of where all the information (for BASIC, UNO, ScriptForge, etc.) lives and how to access it programmatically. This led to a fascinating dive into the UNO data pipeline.

*Understanding the UNO Data Pipeline: From IDL to Runtime*

For anyone curious about how UNO works under the hood, here's a breakdown of what I've learned. It's a pipeline that turns human-readable API definitions into an efficient system the application uses at runtime.

- *IDL (Interface Definition Language):* This is the source of truth for all UNO APIs. These .idl text files define every service, interface, method, property, struct, and enum. Locations: udkapi/ (core types) and offapi/ (office-specific types).
- *idlc & regmerge:* During the build, idlc (the IDL Compiler) compiles .idl files into intermediate binary .urd files. Then, regmerge combines these into .rdb (Registry Database) files.
- *.rdb Files:* These are the optimized binary databases that LibreOffice loads at startup. Key files include types.rdb (from udkapi.rdb etc.), services.rdb, and offapi.rdb. They are installation artifacts, not source files, which explains why my initial search of the source tree did not find them!
- *theCoreReflection:* At runtime, this powerful UNO service provides live, programmatic access to all the type information that was loaded from the .rdb files.
- *regview Tool:* A command-line tool (registry/tools/regview.cxx) designed to dump the contents of an .rdb file. My initial attempts to use it were unsuccessful, which, along with mentor guidance, led us to pivot our strategy.
- *SbUnoObject & XIntrospectionAccess:* The bridge in BASIC for interacting with live UNO objects, using dynamic introspection to discover their capabilities.

*A simplified flow of this pipeline looks like this:*

*.idl Files* --(idlc)--> *.urd Files* --(regmerge)--> *.rdb Files*
(Source of Truth)    (Binary intermediate)       (Loaded by LO Runtime)
                                                            |
                                                            v
                                                <LO Runtime Type System>
                                           (Accessible via theCoreReflection)
                                                            ^
                                                            | (Reads .rdb)
                                                     *regview Tool*
                                                            |
                                                            v
                                                     <Textual Dump>

*Understanding ScriptForge (wizards/source/scriptforge/)*

I also looked into ScriptForge, which is crucial for modern BASIC scripting.
https://gerrit.libreoffice.org/c/core/+/164867

- *.xlb files* are XML manifests listing the modules that make up a library.
- *.xba files* are XML files, each holding the BASIC source code of one module.
- *.pyi file* is a Python stub that provides type hints to Python IDEs for auto-completion. As Rafael Lima mentioned, this might be manually created, making it a great model for the kind of rich API definition we want to achieve for BASIC.

*How its information becomes available:*

*BASIC modules (.xba files listed in .xlb)*
        |
        v (Loaded by BasicManager/StarBASIC)
*<SbModule objects with source code>*
        |
        v (Compiled by SbiParser)
*<SbMethod, SbxVariable symbols within the SbModule>*

*--- Parallel path for Python tooling ---*

*.pyi file (wizards/source/scriptforge/python/scriptforge.pyi)*
        |
        v (Read by Python IDEs)
*<Type hints for Python auto-completion>*

*From Static File Parsing to C++ PoCs*

Given the complexities of parsing static RDB/IDL files directly, and the clear guidance from Meeting 3, our immediate focus has shifted. The new priority is to write C++ Proof-of-Concept (PoC) code to programmatically gather data and get this code onto Gerrit for review.

I'm very excited to share that the first two PoCs are complete.

Gerrit Patch: https://gerrit.libreoffice.org/c/core/+/186475
This patch contains the CppUnit tests for these experiments.

*UNO Services and Memes - Why Context Comes First*

Here's something I’ve seen happen a lot on social media.
There’s a meme going around, people are laughing, sharing it, reacting to it… and then there’s always someone in the comments asking: "What’s the context behind this?"

I mean, I’ve done it too. Sometimes you just miss the reference: maybe it’s from a movie, or some political moment, or even a viral soundbite. Without the context, it’s just a picture or a clip. You don’t get why it’s funny, why it hits.

*And then someone replies and goes:* "Oh, this is from Interstellar, that scene where Cooper watches years of messages after time dilation."

Now it starts to click. *That context sets the stage*.

*Then maybe another reply adds:* "Yeah, and the reason it’s funny here is because someone compared it to missing one lecture and coming back to find the whole syllabus changed."

So first you got the context, then someone gave the reference point (say, the movie), and then you dove into the details: the exact scene, the emotion, the punchline. That’s what makes it all land.

And honestly, that’s how I see working with UNO services too. In our PoC, we first had to get the component context; otherwise we’re just floating, not grounded in the current state of the app. Once we had that, we could ask for something like com.sun.star.reflection.CoreReflection, and only then could we start introspecting the real details: interfaces, methods, enums, all the building blocks.

*It’s kind of beautiful how that maps:*

- *Context* → *“Where am I?”*
- *Service* → *“What am I working with?”*
- *Introspection* → *“What can this thing do?”*

And just like in memes, without context, the rest doesn’t mean much. Funny enough, this whole idea of “context” also shows up elsewhere, for example in React and in Java frameworks. So maybe context is more universal than we think.
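
The context, service, introspection order described above can be pictured with a small toy model. This is only a sketch: every class and function name here (ComponentContext, ServiceManager, CoreReflection, bootstrap, describe) is a hypothetical stand-in written for illustration, not the UNO API, and the type data is hard-coded rather than loaded from .rdb files.

```python
"""Toy model of the lookup order: context -> service -> introspection.

All classes are hypothetical stand-ins, not the real UNO interfaces.
"""


class CoreReflection:
    """Stand-in for a reflection service: maps type names to member names."""

    def __init__(self, types):
        self._types = types

    def for_name(self, type_name):
        # Return the recorded member names, or None for an unknown type.
        return self._types.get(type_name)


class ServiceManager:
    """Stand-in for a service manager: creates services by name."""

    def __init__(self, factories):
        self._factories = factories

    def available_service_names(self):
        return sorted(self._factories)

    def create_instance(self, name):
        if name not in self._factories:
            raise KeyError(f"no such service: {name}")
        return self._factories[name]()


class ComponentContext:
    """Stand-in for the component context: the entry point to everything."""

    def __init__(self, service_manager):
        self.service_manager = service_manager


def bootstrap():
    # Illustrative data only; the real runtime loads its types from .rdb files.
    types = {"com.sun.star.frame.XModel": ["getURL", "getArgs"]}
    factories = {
        "com.sun.star.reflection.CoreReflection": lambda: CoreReflection(types),
    }
    return ComponentContext(ServiceManager(factories))


def describe(context, type_name):
    # 1. Context: "Where am I?"
    manager = context.service_manager
    # 2. Service: "What am I working with?"
    reflection = manager.create_instance("com.sun.star.reflection.CoreReflection")
    # 3. Introspection: "What can this thing do?"
    return reflection.for_name(type_name)


if __name__ == "__main__":
    print(describe(bootstrap(), "com.sun.star.frame.XModel"))
```

The point of the model is the dependency order: describe() cannot reach the reflection data without first going through the context and its service manager, which mirrors the order the PoC had to follow.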

*Summary of C++ Proof-of-Concepts (PoCs)*

Here's a breakdown of the PoCs I've implemented in the Gerrit patch:

*PoC 1: Listing All Available UNO Service Names*

*Concept:* Queries the *XMultiComponentFactory* (Service Manager) to get all creatable UNO service names.
*Source:* comphelper/processfactory.hxx (getProcessServiceManager()).
*Task:*
- Get XComponentContext.
- Get XMultiComponentFactory.
- Call getAvailableServiceNames().
- Log each service name.
*Result:* Successfully dumped service names.

*PoC 2: Introspecting Specific UNO Definitions via theCoreReflection*

*Concept:* *theCoreReflection* provides access to the complete in-memory type information that LibreOffice loaded from its RDBs.
*Source:* com.sun.star.reflection.theCoreReflection, XIdlClass, etc. (implementation in stoc/source/ <https://git.libreoffice.org/core/+/refs/heads/master/stoc>).
*Task:*
- Get theCoreReflection instance.
- For a list of key type names (XModel, XSpreadsheet, PropertyValue, etc.):
  - Call forName(sTypeName) to get its XIdlClass blueprint.
  - Dump all details: superclasses, methods (with full parameter info), properties, struct fields, and enum members.
*Result:* Extracted rich, detailed API definitions. This proves we can get the data needed for Parameter Info and accurate dot-completion.

https://gerrit.libreoffice.org/c/core/+/186475/4/basic/uno_available_services_cpp_dump.txt

*Next Steps: Diving into BASIC Internals*

With the UNO data access path validated, the next focus is on BASIC itself.

*PoC 3 (In Progress): The MsgBox Deep Dive*

My current task is to trace *MsgBox* from its user-facing documentation (both LO and MSO) down to its C++ implementation (*SbRtl_MsgBox in basic/source/runtime/methods.cxx*). This will help us understand how to handle built-in functions and their often-implicit parameter signatures.
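
To make "Parameter Info" and "implicit parameter signatures" more concrete, here is a minimal sketch of the kind of structured record an IDE tooltip could be rendered from. Everything in it is an assumption for illustration: the Param and Signature classes and render_parameter_info() are hypothetical names, and the MsgBox parameter names, types, and default are my reading of the help page, to be checked against the documentation and SbRtl_MsgBox rather than taken as authoritative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Param:
    """One parameter of a function signature (hypothetical record)."""
    name: str
    type_name: str
    optional: bool = False
    default: Optional[str] = None


@dataclass
class Signature:
    """A function name plus its ordered parameters and return type."""
    name: str
    params: List[Param] = field(default_factory=list)
    return_type: Optional[str] = None


def render_parameter_info(sig: Signature) -> str:
    """Format a signature as a one-line tooltip, bracketing optional parameters."""
    parts = []
    for p in sig.params:
        text = f"{p.name} As {p.type_name}"
        if p.default is not None:
            text += f" = {p.default}"
        if p.optional:
            text = f"[{text}]"
        parts.append(text)
    tail = f" As {sig.return_type}" if sig.return_type else ""
    return f"{sig.name}({', '.join(parts)}){tail}"


# Assumed signature for illustration; verify against the help page and the C++ code.
MSGBOX = Signature(
    name="MsgBox",
    params=[
        Param("Prompt", "String"),
        Param("Buttons", "Integer", optional=True, default="MB_OK"),
        Param("Title", "String", optional=True),
    ],
    return_type="Integer",
)

if __name__ == "__main__":
    print(render_parameter_info(MSGBOX))
```

The same record shape could in principle hold method data dumped from UNO reflection and signatures of built-in BASIC functions, which is why a single explicit structure is used instead of parsing help text at display time.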

*Future PoC: Parser Symbol Extraction*

After MsgBox, the plan is to write a C++ PoC that interacts with the SbiParser to extract its internal symbol tables (SbiSymPool) for user-defined code.

A mentor's comment, *"We have a cppumaker, etc., and why not a basicmaker?"*, really resonated with me. It highlights that our ultimate goal is to create a powerful "analyzer" for BASIC that provides the same level of rich, structured information for our IDE tools as other "makers" do for their respective languages.

And yes, I have to speed things up.

Thanks for following this.

--
*Regards,*
*Devansh*