Re: GSoC 25: BASIC IDE - Insights from Data Discovery & C++ PoCs [WEEK 4]

Devansh Varshney Tue, 17 Jun 2025 06:20:38 -0700

Hi Everyone,

Thank you, Stephan. After your mail, I looked into this further.
Here is what I found:
https://cgit.freedesktop.org/libreoffice/core/log/unoidl?h=libreoffice-4.2.8.2&id=72b8e929af5bcfb7d17a74de636fb1ef5204297b&showmsg=1
(The log is in reverse chronological order.)

commit - 320571bf701a092d0f2d15fd4589ae271802a03f

The cgit logs from 2013, primarily by Stephan Bergmann, document a massive
refactoring of LibreOffice's core UNO infrastructure.

*The "Old World" (Legacy, pre-2013):*
LibreOffice inherited a system from OpenOffice.org that used a toolchain of
idlc, regmerge, and regview.
- .idl files were compiled by idlc into binary .urd (UNO Reflection Data)
  files.
- regmerge then combined these .urd files into a large, complex,
  legacy-format binary registry file (.rdb).
- regview was the tool designed to read this old format.

*The "New World" (The 2013 unoidl Refactoring):*
- Goal: Replace the old, cumbersome system with something more modern,
  efficient, and easier to maintain.
- Solution: The unoidl module was created to be the central authority for
  handling UNO type information.

*The New Tools:*
- *unoidl-write:* Replaced idlc and regmerge. It compiles .idl files
  directly into the new, more efficient binary format.
- *unoidl-read:* Replaced regview. Its specific purpose is to read the new
  .rdb files and dump their contents in a human-readable, IDL-like format.
- *unoidl-check:* Replaced regcompare for API compatibility checks.

*What this history tells us:*
- *Why regview Failed:* My initial PoC attempts to use regview on a modern
  LibreOffice build failed because I was using a legacy tool on new files.
  We were trying to read a Blu-ray with a VHS player.
- *The Correct Tool:* This confirms that unoidl-read is the correct, modern
  tool for our goal of getting a static dump of the UNO API types.
- *RDBs are the Compiled Truth:* It shows that the .rdb files are the
  canonical, compiled source of truth for UNO types.

*Key Commits that Tell the Story:*
- "WIP: Experimental new binary type.rdb format" (Stephan Bergmann,
  Feb/Mar/Apr 2013): Documents the new RDB format and the creation of the
  unoidl module. Explicitly aimed to "ultimately remove modules store and
  registry."
- "New unoidl-read tool to translate registries into readable .idl files"
  (Stephan Bergmann, Sep 2013): Introduces our primary static dumping tool.
- "New unoidl-check tool to replace regcompare" (Stephan Bergmann, Sep
  2013): This further shows that the entire legacy toolchain (regcompare,
  regmerge, idlc, regview) was being systematically replaced by a new,
  unified unoidl-* toolset.
- "Revert 'WIP: Experimental new binary type.rdb format'" (multiple times):
  The log shows this was a complex transition with reverts and
  re-applications. This is normal for a change of this magnitude and
  explains why the legacy code and new code had to co-exist.

*Why XML for "Services" RDBs?*
It turns out that services .rdb files use XML intentionally, not by
accident. Here's why:

1. Legacy support & human readability: While .rdb files for types
   switched to the new binary format, service registries use XML
   "for backwards compatibility" [1, 2, 3]. Human-readable XML eases
   maintenance, debugging, and scripting.

2. Consistent tooling across UNO bridges: Developers noted that
   `program/services` .rdb files are XML-based. The Python-UNO bridge,
   for instance, depends on those XML service definitions; a binary
   format would hinder Python tools [4, 5].


3. Consistency with ODF: LibreOffice's file formats (ODT, ODS)
   are XML-based (ODF). Keeping service registration in XML
   aligns with this broader architectural philosophy.


*TL;DR:*

File Type    | Format | Reason
-------------+--------+-----------------------------------------------
types.rdb    | Binary | Efficient, compact, new unoidl-write toolchain
services.rdb | XML    | Human-readable, backwards compatibility,
             |        | supports scripting tools like Python-UNO

So, the "mixed-format" nature of the registry is a deliberate and
pragmatic design choice, balancing performance (for binary types) with
interoperability and maintainability (for XML services).
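
Since all of these files share the ".rdb" extension, a quick way to tell
them apart is to look at the first few bytes. Below is a minimal Python
sketch of that idea. The 8-byte magic value is my assumption about the new
binary format's header and should be verified against the unoidl sources;
the function names are hypothetical and not part of any LibreOffice tool.

```python
# Assumption: the new binary "types" format starts with this 8-byte header.
# Verify against the unoidl module before relying on it.
UNOIDL_MAGIC = b"UNOIDL\xff\x00"


def classify_rdb_bytes(head: bytes) -> str:
    """Guess the kind of .rdb file from its leading bytes."""
    if head.startswith(UNOIDL_MAGIC):
        return "types (new binary)"
    # Skip an optional UTF-8 BOM and leading whitespace before checking for XML.
    stripped = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if stripped.startswith(b"<"):
        return "services (XML)"
    # Anything else could be the old store-based format or not an rdb at all.
    return "unknown (possibly legacy store-based binary)"


def classify_rdb(path: str) -> str:
    """Read the first bytes of a file and classify it."""
    with open(path, "rb") as f:
        return classify_rdb_bytes(f.read(64))
```

Running classify_rdb() over workdir/Rdb/ before calling unoidl-read would
let a PoC skip the XML "services" files instead of hitting an error on them.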

*Flowchart: Evolution of UNO RDBs*

*Old World (Pre-2013) UNO Type Processing:*

+----------+     +--------+     +----------+
|   .idl   | --> |  idlc  | --> |   .urd   |
| (Source) |     |        |     | (Binary) |
+----------+     +--------+     +----------+
                                     |
                                     v
                                +----------+     +----------+
                                | regmerge | --> |  Legacy  |
                                |          |     |   .rdb   |
                                |          |     | (Binary) |
                                +----------+     +----------+
                                                      |
                                                      v
                                                 +----------+
                                                 | regview  |
                                                 |  (Dump)  |
                                                 +----------+

*New World (2013 Refactoring) UNO Type Processing:*

+----------+     +--------------+     +----------+
|   .idl   | --> | unoidl-write | --> |   New    |
| (Source) |     |              |     |   .rdb   |
+----------+     +--------------+     | (Binary) |
                                      +----------+
                                           |
                                           v
                                    +-------------+
                                    | unoidl-read |
                                    |   (Dump)    |
                                    +-------------+

*Special Case: UNO Service Processing (XML):*

+----------+     +-------------------+     +--------------+
| Services | --> | XML .rdb files    | --> | Text Editor  |
| (Config) |     | (e.g., pyuno.rdb) |     | (Human Read) |
+----------+     +-------------------+     +--------------+
                          |
                          v
            +----------------------------+
            |  Runtime Service Manager   |
            | (Loads for Component Info) |
            +----------------------------+
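
Because the "services" files are plain XML, they can be inspected with any
XML parser rather than with unoidl-read. Here is a small Python sketch that
lists implementations and the services they support. The namespace URI and
the components/component/implementation/service layout are my assumptions
about the file format, and the names in SAMPLE are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace used by XML "services" .rdb files.
NS = {"c": "http://openoffice.org/2010/uno-components"}

# Hypothetical sample; real files list many components.
SAMPLE = """<?xml version="1.0"?>
<components xmlns="http://openoffice.org/2010/uno-components">
  <component loader="com.sun.star.loader.Python" uri="example_loader.py">
    <implementation name="org.example.HypotheticalImpl">
      <service name="org.example.HypotheticalService"/>
    </implementation>
  </component>
</components>
"""


def list_implementations(xml_text: str) -> dict:
    """Map each implementation name to the list of services it declares."""
    root = ET.fromstring(xml_text)
    result = {}
    for impl in root.iterfind(".//c:implementation", NS):
        services = [s.get("name") for s in impl.iterfind("c:service", NS)]
        result[impl.get("name")] = services
    return result


if __name__ == "__main__":
    for impl_name, services in list_implementations(SAMPLE).items():
        print(impl_name, "->", ", ".join(services))
```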

*Our Project's Possible Cache Philosophy (Hybrid Approach):*

+-----------------------+     +---------------------------+
| Static Data Sources   | --> |  Offline Tool             |
| (UNO APIs, Std Libs)  |     | (like unoidl-write + PoC) |
+-----------------------+     +---------------------------+
                                        |
                                        v
                               +-----------------------+
                               | Binary Cache File     |
                               | (e.g., SQLite .db)    |
                               +-----------------------+
                                        |
                                        v
                       +------------------------------------+
                       | IDE Startup: Load Binary Cache     |
                       | into Master Analyzer (In-Memory)   |
                       +------------------------------------+
                                        |
                                        v
                       +------------------------------------+
                       | Dynamic Data (User Code, Vars)     |
                       | (Analyzed In-Memory by MA)         |
                       +------------------------------------+
                                        |
                                        v
                       +----------------------------------------+
                       |    Live IDE Cache (In-Memory Hybrid)   |
                       | (Can be saved to XML/JSON for session) |
                       +----------------------------------------+
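
To make the flow in the diagram above more concrete, here is a rough Python
sketch (the real work would be in C++ inside the IDE). It is only an
illustration under my own assumptions: the table layout, the fields of
IdeSymbolInfo, and the LiveCache class are hypothetical, and SQLite is just
one candidate for the binary cache.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IdeSymbolInfo:
    # Hypothetical fields; the real structure is still to be designed.
    name: str
    kind: str
    origin: str  # "static" (shipped cache) or "dynamic" (user code)


def build_static_cache(conn: sqlite3.Connection, symbols) -> None:
    """Offline step: write (name, kind) pairs into the binary cache."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS symbols "
        "(name TEXT PRIMARY KEY, kind TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO symbols VALUES (?, ?)", symbols)
    conn.commit()


class LiveCache:
    """In-memory hybrid: static cache loaded once, dynamic data overlaid."""

    def __init__(self, conn: sqlite3.Connection):
        rows = conn.execute("SELECT name, kind FROM symbols")
        self._symbols = {n: IdeSymbolInfo(n, k, "static") for n, k in rows}

    def overlay(self, name: str, kind: str) -> None:
        # Dynamic entries shadow static ones with the same name.
        self._symbols[name] = IdeSymbolInfo(name, kind, "dynamic")

    def lookup(self, name: str):
        return self._symbols.get(name)

    def session_json(self) -> str:
        # Only the dynamic part is worth saving as readable session state.
        dynamic = [asdict(s) for s in self._symbols.values()
                   if s.origin == "dynamic"]
        return json.dumps(dynamic, indent=2)
```

At IDE startup one would open the shipped cache file with
sqlite3.connect(), construct a LiveCache from it, and call overlay() as the
analyzer processes the user's open modules.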



*Our Project's Cache Philosophy - Considerations for a Hybrid Approach:*
Drawing lessons from LibreOffice's RDB evolution, we can consider a hybrid
cache design for our project. This would address performance needs while
maintaining flexibility.

*Potential Approaches:*
- *The Core Static Cache (UNO APIs, Standard Libraries):* For this large
  amount of relatively stable data, we can consider storing it in a fast,
  compact, binary format. This could potentially use something like SQLite
  for efficient querying and retrieval. This is analogous to the new binary
  types.rdb format, aiming for quick IDE startup.
- *The Dynamic Cache & User-Specific Data:* Information about the user's
  currently open modules, local variables, and editor state is highly
  dynamic. For debugging or saving the IDE's session state, a more readable
  format like JSON or XML could be beneficial. This is analogous to the XML
  services.rdb files.
- *The Hybrid System Concept:* Our Master Analyzer would produce
  IdeSymbolInfo objects in memory. For persistence, we can consider
  options to:
  - Build an offline tool (similar to unoidl-write) using our PoC logic
    (theCoreReflection, BASIC parser) to generate a comprehensive binary
    cache file of all shippable UNO and Standard/ScriptForge library info,
    possibly using SQLite. This file would ideally ship with LibreOffice.
  - At runtime, the IDE would load this binary cache into memory. The
    Master Analyzer would then add to or overlay this cache with
    information from the user's open documents and unsaved changes. This
    live part might not need disk saving, or could be saved as XML/JSON for
    session state.

*This ideated approach aims for:*
- Optimized Startup Performance: From loading a pre-compiled binary cache
  (e.g., SQLite).
- Flexibility & Dynamism: From in-memory analysis of live code.
- Improved Debuggability: From clear static/dynamic separation.

[1] https://listarchives.libreoffice.org/global/dev/2013/msg14613.html
[2] https://wiki.documentfoundation.org/Documentation/DevGuide/Extensions
[3] https://docs.libreoffice.org/store.html
[4] https://www.openoffice.org/udk/python/python-bridge.html
[5] https://ask.libreoffice.org/t/no-helloworldpython-nor-any-other-python-script-using-appimage/107376/21

Week 4 mail chain:
https://lists.freedesktop.org/archives/libreoffice/2025-June/093392.html


I look forward to discussing these considerations and potential
strategies with the community, and especially with the mentors.



On Tue, 17 Jun 2025 at 01:32, Stephan Bergmann <
stephan.bergm...@allotropia.de> wrote:

> On 6/16/25 18:37, Devansh Varshney wrote:
> > *2. Legacy RDBs*: Interestingly, when I tried to run unoidl-read on some
> >      other RDBs from workdir/Rdb/ (like pyuno.rdb), I got a different
> >      error:
> >
> > |$ unoidl-read $PWD/workdir/Rdb/pyuno.rdb Bad input <...>: cannot open
> > legacy file: 6|
> >
> > This confirms the unoidl/README.md note that unoidl::Manager can
> > detect the old legacy format but may not be able to read all of them with
> > this specific tool. It's a great insight into the mixed-format nature of
> > the registry system.
> Traditionally, the original store-based binary rdb format was used for
> both "types" files (storing information about UNOIDL entities) and
> "services" files (storing information about UNO components).  Both those
> kinds of rdb files have since been changed, using a different binary
> format for the "types" files and an XML format for the "services" files.
>   Somewhat confusingly, all those kinds of files still use the ".rdb"
> extension.
>
> unoidl-read can read "types" files (both the old and new binary
> formats), but not "services" files (the XML format)---and
> workdir/Rdb/pyuno.rdb is such a "services" file.
>


-- 
*Regards,*
*Devansh*
