Hi everyone,

Thank you, Stephan. After your mail, I looked into this more. Here's what I found:
https://cgit.freedesktop.org/libreoffice/core/log/unoidl?h=libreoffice-4.2.8.2&id=72b8e929af5bcfb7d17a74de636fb1ef5204297b&showmsg=1
(This is in reverse chronological order.)
commit - 320571bf701a092d0f2d15fd4589ae271802a03f

The cgit logs from 2013, primarily by Stephan Bergmann, document a massive refactoring of LibreOffice's core UNO infrastructure.

*The "Old World" (Legacy, pre-2013):*
LibreOffice inherited a system from OpenOffice.org that used a toolchain of idlc, regmerge, and regview.
- .idl files were compiled by idlc into binary .urd (UNO Reflection Data) files.
- regmerge then combined these .urd files into a large, complex, legacy-format binary registry file (.rdb).
- regview was the tool designed to read this old format.

*The "New World" (The 2013 unoidl Refactoring):*
- Goal: Replace the old, cumbersome system with something more modern, efficient, and easier to maintain.
- Solution: The unoidl module was created to be the central authority for handling UNO type information.

*The New Tools:*
- *unoidl-write:* Replaced idlc and regmerge. It compiles .idl files directly into the new, more efficient binary format.
- *unoidl-read:* Replaced regview. Its specific purpose is to read the new .rdb files and dump their contents in a human-readable, IDL-like format.
- *unoidl-check:* Replaced regcompare for API compatibility checks.

*What this history tells us:*
- *Why regview Failed:* My initial PoC attempts to use regview on a modern LibreOffice build failed because I was using a legacy tool on new files. We were trying to play a Blu-ray on a VHS player.
- *The Correct Tool:* This confirms that unoidl-read is the correct, modern tool for our goal of getting a static dump of the UNO API types.
- *RDBs are the Compiled Truth:* It shows that the .rdb files are the canonical, compiled source of truth for UNO types.

*Key Commits that Tell the Story:*
- "WIP: Experimental new binary type.rdb format" (Stephan Bergmann, Feb/Mar/Apr 2013): Documents the new RDB format and the creation of the unoidl module. Explicitly aimed to "ultimately remove modules store and registry."
- "New unoidl-read tool to translate registries into readable .idl files" (Stephan Bergmann, Sep 2013): Introduces our primary static dumping tool.
- "New unoidl-check tool to replace regcompare" (Stephan Bergmann, Sep 2013): This further shows that the entire legacy toolchain (regcompare, regmerge, idlc, regview) was being systematically replaced by a new, unified unoidl-* toolset.
- "Revert 'WIP: Experimental new binary type.rdb format'" (multiple times): The log shows this was a complex transition with reverts and re-applications. This is normal for a change of this magnitude and explains why the legacy code and the new code had to co-exist.

*Why XML for "Services" RDBs?*
It turns out services.rdb files use XML by design, not by accident. Here's why:
1. Legacy support & human readability: While .rdb files for types switched to the new binary format, service registries use XML, "for backwards compatibility" [1, 2, 3]. Human-readable XML eases maintenance, debugging, and scripting.
2. Consistent tooling across UNO bridges: Developers noted that `program/services` .rdb files are XML-based. The Python-UNO bridge, for instance, depends on those XML service definitions; a binary format would hinder Python tools [4, 5].
3. Consistency with ODF: LibreOffice's file formats (ODT, ODS) are XML-based (ODF). Keeping service registration in XML aligns with this broader architectural philosophy.

*TL;DR:*
*File Type*     *Format*   *Reason*
types.rdb       Binary     Efficient, compact, new unoidl-write toolchain
services.rdb    XML        Human-readable, backwards compatibility, supports scripting tools like Python-UNO
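To make the binary-vs-XML split concrete, here is a minimal sketch of how one could guess which kind of .rdb file is on disk before picking a tool. The function name `guess_rdb_kind` and the returned labels are hypothetical, and the magic bytes for the new binary format are an assumption based on my reading of unoidl/source/unoidlprovider.cxx, so please verify them against a real checkout. The legacy store-based format is not detected explicitly; it simply falls into the "unknown" bucket.

```python
# Assumption: the new binary "types" format starts with these 8 magic bytes
# (taken from my reading of unoidl/source/unoidlprovider.cxx; please verify).
UNOIDL_MAGIC = b"UNOIDL\xff\x00"
UTF8_BOM = b"\xef\xbb\xbf"


def guess_rdb_kind(path):
    """Return a rough, hypothetical label for an .rdb file from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(UNOIDL_MAGIC):
        return "types (new binary)"
    # XML "services" files begin with an XML declaration or a root element,
    # possibly preceded by a UTF-8 byte order mark and/or whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "services (XML)"
    # Anything else is not identified here; it may be the legacy
    # store-based binary format or not an rdb file at all.
    return "unknown (possibly legacy store-based binary)"
```

With a check like this, a script could call unoidl-read only on files reported as binary "types" files and open the XML ones in a normal XML parser instead.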
So, the "mixed-format" nature of the registry is a deliberate and pragmatic design choice, balancing performance (for binary types) with interoperability and maintainability (for XML services).

*Flowchart: Evolution of UNO RDBs*

*Old World (Pre-2013) UNO Type Processing:*

+----------+     +--------+     +----------+
| .idl     | --> | idlc   | --> | .urd     |
| (Source) |     |        |     | (Binary) |
+----------+     +--------+     +----------+
                                     |
                                     v
                                +----------+     +-----------+
                                | regmerge | --> | Legacy    |
                                |          |     | .rdb      |
                                |          |     | (Binary)  |
                                +----------+     +-----------+
                                                       |
                                                       v
                                                  +---------+
                                                  | regview |
                                                  | (Dump)  |
                                                  +---------+

*New World (2013 Refactoring) UNO Type Processing:*

+----------+     +--------------+     +----------+
| .idl     | --> | unoidl-write | --> | New      |
| (Source) |     |              |     | .rdb     |
+----------+     +--------------+     | (Binary) |
                                      +----------+
                                           |
                                           v
                                    +-------------+
                                    | unoidl-read |
                                    | (Dump)      |
                                    +-------------+

*Special Case: UNO Service Processing (Remains XML):*

+----------+     +-------------------+     +--------------+
| Services | --> | XML .rdb files    | --> | Text Editor  |
| (Config) |     | (e.g., pyuno.rdb) |     | (Human Read) |
+----------+     +-------------------+     +--------------+
                           |
                           v
            +----------------------------+
            | Runtime Service Manager    |
            | (Loads for Component Info) |
            +----------------------------+

*Our Project's Possible Cache Philosophy (Hybrid Approach):*

+----------------------+     +---------------------------+
| Static Data Sources  | --> | Offline Tool              |
| (UNO APIs, Std Libs) |     | (like unoidl-write + PoC) |
+----------------------+     +---------------------------+
                                           |
                                           v
                                +--------------------+
                                | Binary Cache File  |
                                | (e.g., SQLite .db) |
                                +--------------------+
                                           |
                                           v
                         +----------------------------------+
                         | IDE Startup: Load Binary Cache   |
                         | into Master Analyzer (In-Memory) |
                         +----------------------------------+
                                           |
                                           v
                         +----------------------------------+
                         | Dynamic Data (User Code, Vars)   |
                         | (Analyzed In-Memory by MA)       |
                         +----------------------------------+
                                           |
                                           v
                      +----------------------------------------+
                      | Live IDE Cache (In-Memory Hybrid)      |
                      | (Can be saved to XML/JSON for session) |
                      +----------------------------------------+

*Our Project's Cache Philosophy - Considerations for a Hybrid Approach:*

Drawing lessons from LibreOffice's RDB evolution, we can consider a hybrid cache design for our project. This would address performance needs while maintaining flexibility.

*Potential Approaches:*
- *The Core Static Cache (UNO APIs, Standard Libraries):* For this large amount of relatively stable data, we can consider storing it in a fast, compact, binary format. This could potentially use something like SQLite for efficient querying and retrieval. This is analogous to the new binary types.rdb format, aiming for quick IDE startup.
- *The Dynamic Cache & User-Specific Data:* Information about the user's currently open modules, local variables, and editor state is highly dynamic. For debugging or saving the IDE's session state, a more readable format like JSON or XML could be beneficial. This is analogous to the XML services.rdb files.
- *The Hybrid System Concept:* Our Master Analyzer would produce IdeSymbolInfo objects in memory. For persistence, we can consider the following:
  - Build an offline tool (similar to unoidl-write) using our PoC logic (theCoreReflection, BASIC parser) to generate a comprehensive binary cache file of all shippable UNO and Standard/ScriptForge library info, possibly using SQLite. This file would ideally ship with LibreOffice.
  - At runtime, the IDE would load this binary cache into memory. The Master Analyzer would then add to or overlay this cache with information from the user's open documents and unsaved changes. This live part might not need to be saved to disk, or could be saved as XML/JSON for session state.

*This proposed approach aims for:*
- Optimized Startup Performance: From loading a pre-compiled binary cache (e.g., SQLite).
- Flexibility & Dynamism: From in-memory analysis of live code.
- Improved Debuggability: From clear static/dynamic separation.
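To illustrate the hybrid idea, here is a rough Python sketch (the real implementation would live in the IDE's C++ code). Only the name IdeSymbolInfo comes from our design; its fields, the `symbols` table schema, and the `build_static_cache` / `HybridCache` names are hypothetical placeholders, so treat this as a sketch under those assumptions rather than a proposal for the actual layout.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class IdeSymbolInfo:
    # Hypothetical fields; the real IdeSymbolInfo will carry more detail.
    name: str    # fully qualified name
    kind: str    # e.g. "interface", "service", "variable"
    origin: str  # "static" (shipped cache) or "dynamic" (live analysis)


def build_static_cache(conn, symbols):
    """Offline step: write the static symbols into the SQLite cache file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS symbols (name TEXT PRIMARY KEY, kind TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO symbols (name, kind) VALUES (?, ?)",
        [(s.name, s.kind) for s in symbols],
    )
    conn.commit()


class HybridCache:
    """Static data loaded once from SQLite, overlaid with live in-memory data."""

    def __init__(self, conn):
        # IDE startup: load the whole pre-built cache into memory.
        self._static = {
            name: IdeSymbolInfo(name, kind, "static")
            for name, kind in conn.execute("SELECT name, kind FROM symbols")
        }
        self._dynamic = {}

    def overlay(self, symbol):
        # Live analysis results shadow static entries with the same name.
        self._dynamic[symbol.name] = symbol

    def lookup(self, name):
        if name in self._dynamic:
            return self._dynamic[name]
        return self._static.get(name)
```

The offline tool would call `build_static_cache` on a connection to the shipped .db file, while the IDE would construct `HybridCache` at startup and call `overlay` as the Master Analyzer processes open documents. Keeping the two dictionaries separate is what gives the clear static/dynamic split mentioned above.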
[1] https://listarchives.libreoffice.org/global/dev/2013/msg14613.html
[2] https://wiki.documentfoundation.org/Documentation/DevGuide/Extensions
[3] https://docs.libreoffice.org/store.html
[4] https://www.openoffice.org/udk/python/python-bridge.html
[5] https://ask.libreoffice.org/t/no-helloworldpython-nor-any-other-python-script-using-appimage/107376/21

Week 4 mail chain - https://lists.freedesktop.org/archives/libreoffice/2025-June/093392.html

I look forward to discussing these considerations and potential strategies with the community, and especially with the mentors.

On Tue, 17 Jun 2025 at 01:32, Stephan Bergmann <stephan.bergm...@allotropia.de> wrote:

> On 6/16/25 18:37, Devansh Varshney wrote:
> > *2. Legacy RDBs*: Interestingly, when I tried to run unoidl-read on some
> > other RDBs from workdir/Rdb/ (like pyuno.rdb), I got a different error:
> >
> > |$ unoidl-read $PWD/workdir/Rdb/pyuno.rdb Bad input <...>: cannot open
> > legacy file: 6|
> >
> > This confirms the unoidl/README.md note that unoidl::Manager can
> > detect the old legacy format but may not be able to read all of them with
> > this specific tool. It's a great insight into the mixed-format nature of the
> > registry system.
>
> Traditionally, the original store-based binary rdb format was used for
> both "types" files (storing information about UNOIDL entities) and
> "services" files (storing information about UNO components). Both those
> kinds of rdb files have since been changed, using a different binary
> format for the "types" files and an XML format for the "services" files.
> Somewhat confusingly, all those kinds of files still use the ".rdb"
> extension.
>
> unoidl-read can read "types" files (both the old and new binary
> formats), but not "services" files (the XML format)---and
> workdir/Rdb/pyuno.rdb is such a "services" file.
>

--
*Regards,*
*Devansh*