GitHub user arne-bdt edited a comment on the discussion: CIM XML (IEC 
61970-552) as Lang based on current RDF/XML parser (read-only at first)

The IEC CIM "profiles" are regular RDF Schema using regular `RDF/XML`.
They describe classes, inheritance, properties, datatypes, relations etc.
The datatypes are mostly special datatypes like "ActivePower" which typically 
can be mapped to a `XSDDatatype` like `XSDfloat`.
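A minimal sketch of such a mapping, using only the Java standard library. The 
class `CimDatatypes` and its method are hypothetical names for illustration, 
not part of Jena or the cimxml parser. Only the `ActivePower` to float pairing 
is taken from the text above; a real table would be derived from the profile.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not an API of Jena or the cimxml parser.
final class CimDatatypes {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Only ActivePower -> xsd:float is taken from the discussion.
    // A real table would be built from the RDFS profile definitions.
    private static final Map<String, String> CIM_TO_XSD =
            Map.of("ActivePower", XSD + "float");

    private CimDatatypes() {}

    /** Returns the XSD datatype URI for a CIM datatype name, if one is known. */
    static Optional<String> xsdFor(String cimDatatypeName) {
        return Optional.ofNullable(CIM_TO_XSD.get(cimDatatypeName));
    }
}
```

Returning an `Optional` keeps unknown datatypes visible to the caller instead 
of silently falling back to a plain string.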

`CIMXML` was planned as a subset of `RDF/XML`. The idea was that it should 
also be readable as regular `XML`. For older CGMES releases, XSD schema 
definitions were provided, e.g. for building parsers.
`CIMXML` requires the syntax to be flat. Only resource + property is allowed, 
no nested resources like in `RDF/XML`.
--> so _there is legal RDF/XML which is not legal CIMXML_.

There is more:
- `CIMXML` has a special syntax for URN forms where the prefix "urn:uuid:" is 
replaced by "_".
- For `DifferenceModels` (basically the additions and deletions of `Delta` in 
Jena) they introduced `rdf:parseType="Statements"`, which should create a sub 
graph with the contained statements.
--> so at least these `DifferenceModels` cannot be expressed as one graph but 
as multiple graphs.
- CIMXML does not contain any `rdf:datatype`. The parser is supposed to enhance 
the data using the datatypes from the matching profile definitions. This 
requires looking into the "md:Model" header data and finding the referenced 
"md:Model.profile"s.
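The "_" short form from the first bullet can be sketched as a pair of string 
conversions. This is an illustration only: `CimIds` is a hypothetical helper, 
not the cimxml parser's code, and stripping a leading "#" from references is 
an assumption that the discussion does not spell out.

```java
// Hypothetical helper for illustration; not part of Jena or the cimxml parser.
final class CimIds {
    private static final String URN_UUID = "urn:uuid:";

    private CimIds() {}

    /** Expands the CIMXML short form "_<uuid>" to "urn:uuid:<uuid>". */
    static String expand(String id) {
        // Assumption: references may carry a leading '#', e.g. "#_<uuid>".
        String local = id.startsWith("#") ? id.substring(1) : id;
        if (local.startsWith("_")) {
            return URN_UUID + local.substring(1);
        }
        return id; // not in the short form: leave untouched
    }

    /** Inverse direction: "urn:uuid:<uuid>" back to "_<uuid>". */
    static String shorten(String uri) {
        return uri.startsWith(URN_UUID)
                ? "_" + uri.substring(URN_UUID.length())
                : uri;
    }
}
```

Identifiers that do not start with "_" are returned unchanged, so ordinary 
IRIs pass through both methods untouched.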

The old UCTE format, which is currently used to exchange Transmission System 
Operator (TSO) grid data, is basically an ASCII format for punch cards. One TSO 
grid today is 600-900 kB of ASCII. Roughly the same data in CIM/CGMES is 
40-60 MB of CIMXML.
CIM/CGMES is able to describe the grid in much more detail, which is needed for 
e.g. redispatch and topology optimization. At this detail level, I expect the 
data for one grid to grow up to 140-250 MB of CIMXML.
There are 35+ interconnected TSOs in Europe. For many processes like the 
coordinated security analysis all these grids are needed at once and not only 
for one timestamp but typically for 24h. 
So we have roughly 35 TSOs * 160 MB * 24h of `CIMXML` data that we need to 
parse and then work with. These coordination processes typically have limited 
time windows of 15-30 minutes, covering everything from reading the data 
through calculation to grid operator decisions.
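As a rough worked version of that estimate: 35 * 160 MB * 24 = 134,400 MB, 
i.e. about 134 GB per run. The sketch below assumes one snapshot per hour for 
the 24h and, unrealistically, that the whole window is available for reading, 
so the resulting rates are lower bounds. `CimVolume` is a hypothetical name.

```java
// Back-of-the-envelope estimate; all names here are hypothetical.
final class CimVolume {
    private CimVolume() {}

    /** Total CIMXML volume in MB for all grids and all timestamps. */
    static long totalMegabytes(int tsos, int mbPerGrid, int timestamps) {
        return (long) tsos * mbPerGrid * timestamps;
    }

    /** Sustained read rate in MB/s needed to get through totalMb in the window. */
    static double requiredMbPerSecond(long totalMb, int windowMinutes) {
        return totalMb / (windowMinutes * 60.0);
    }
}
```

With these inputs, a 30 minute window needs about 75 MB/s of sustained parsing 
and a 15 minute window about 149 MB/s, before any calculation has started.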

In the past we used `StreamRDF` to fix each resource node and each typed 
literal node, but this is slow and creates many new objects. The cimxml 
parser fixes this and is optimized for performance.
For DifferenceModels the problem was that the nested graphs were first parsed 
as XML literals and then needed to be parsed again as RDF/XML.
The `cimxml` parser uses `aalto-xml` instead of `xerces`, which proved to be a 
bit faster for most of the relevant larger graphs.
On my machine, the Jena `RDFParser` takes about 40s to read `bsbm-25m.xml` into 
`GraphMem2Roaring`, whereas the new parser takes only 30s.
The average speed on my machine (13th Gen Intel(R) Core(TM) i9-13950HX, 2200 
MHz, 24 Cores) is now about 0.7-0.83 million triples per second for parsing 
large RDF/XML or CIMXML files into `GraphMem2Roaring`. 

Many software projects around CIM/CGMES had bad experiences when they tried to 
work with `CIMXML` as RDF. I hope the cimxml parser might ease the pain for 
future projects in that area.
This week I spoke about CGMES and CIMXML at the [LF Energy Summit 
2025](https://events.linuxfoundation.org/lfenergysummit-europe/) in Aachen: 
[Breaking Down CGMES 
Barriers](https://lfenergysummiteu2025.sched.com/event/26J6i/sponsored-session-breaking-down-cgmes-barriers-how-opencgmes-solves-real-world-grid-data-exchange-challenges-arne-bernhardt-soptim-ag?linkback=grid&iframe=no)



GitHub link: 
https://github.com/apache/jena/discussions/2529#discussioncomment-14391592
