GitHub user arne-bdt added a comment to the discussion: CIM XML (IEC 61970-552) as Lang based on current RDF/XML parser (read-only at first)
The IEC CIM "profiles" are regular RDF Schema expressed in regular `RDF/XML`. They describe classes, inheritance, properties, datatypes, relations, etc. The datatypes are mostly domain-specific ones such as "ActivePower", which can typically be mapped to an `XSDDatatype` like `XSDfloat`.

`CIMXML` was planned as a subset of `RDF/XML`. The idea was that it should also be readable as regular `XML`. For older CGMES releases, XSD schema definitions were provided, e.g. for building parsers. `CIMXML` requires the syntax to be flat: only resource + property is allowed, with no nested resources as in `RDF/XML`. So _there is legal RDF/XML that is not legal CIMXML_.

There is more:

- `CIMXML` has a special syntax for URN forms in which the prefix "urn:uuid:" is replaced by "_".
- For `DifferenceModels` (basically the additions and deletions of `Delta` in Jena), `rdf:parseType="Statements"` was introduced, which should create a subgraph from the contained statements. So at least these `DifferenceModels` cannot be expressed as one graph, only as multiple graphs.
- `CIMXML` does not contain any `rdf:datatype`. The parser is supposed to enrich the data using the datatypes from the matching profile definitions. This requires looking into the "md:Model" header data to find the referenced "md:Model.profile"s.

The old UCTE format, which is currently used to exchange Transmission System Operator (TSO) grid data, is basically an ASCII format for punch cards. One TSO grid today takes 600-900 kB of ASCII. Roughly the same data as CIM/CGMES takes 40-60 MB in CIMXML. CIM/CGMES can describe the grid in much more detail, which is needed for e.g. redispatch and topology optimization. At this level of detail, I expect the data for one grid to grow to 140-250 MB of CIMXML. There are 35+ interconnected TSOs in Europe. Many processes, such as the coordinated security analysis, need all of these grids at once, and not just for one timestamp but typically for 24h.
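For a rough sense of scale, the per-grid sizes above can be multiplied out. The following is only a back-of-the-envelope sketch, not a measurement: it assumes exactly 35 TSOs, one grid snapshot per TSO per hour over 24 hours, and decimal units (1 GB = 1000 MB). The function name `daily_volume_gb` is made up for this illustration.

```python
# Back-of-the-envelope estimate of the CIMXML volume for one coordination run.
# Assumptions (illustrative, not from any specification): exactly 35 TSOs,
# one grid snapshot per TSO per hour over 24 hours, 1 GB = 1000 MB.

TSO_COUNT = 35
SNAPSHOTS_PER_DAY = 24


def daily_volume_gb(mb_per_grid: float) -> float:
    """Total CIMXML volume in GB for all TSOs over one day of hourly snapshots."""
    return TSO_COUNT * SNAPSHOTS_PER_DAY * mb_per_grid / 1000


# Per-grid sizes quoted above: 40-60 MB today, 140-250 MB expected.
current_low, current_high = daily_volume_gb(40), daily_volume_gb(60)
expected_low, expected_high = daily_volume_gb(140), daily_volume_gb(250)

print(f"today:    {current_low:.1f}-{current_high:.1f} GB")
print(f"expected: {expected_low:.1f}-{expected_high:.1f} GB")
```

Under these assumptions, the estimate comes to roughly 118-210 GB per run at the expected level of detail, compared with roughly 34-50 GB at today's CIMXML sizes.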
So we have roughly 35 TSOs * 160 MB * 24h of `CIMXML` data that we need to parse and then work with. These coordination processes typically have limited time windows of 15-30 minutes for everything from reading all the data, through calculation, to grid operator decisions.

In the past we used `StreamRDF` to fix each resource node and each typed literal node, but this is slow and requires many new object creations. The cimxml parser fixes this and is optimized for performance. For DifferenceModels, the problem was that the nested graphs were first parsed as XML literals and then had to be parsed again as RDF/XML.

The `cimxml` parser uses `aalto-xml` instead of `xerces`, which proved to be a bit faster for most of the relevant larger graphs. On my machine, the Jena `RDFParser` takes about 40s to read `bsbm-25m.xml` into `GraphMem2Roaring`, whereas the new parser takes only 30s. The average speed on my machine (13th Gen Intel(R) Core(TM) i9-13950HX, 2200 MHz, 24 Cores) is now about 0.7-0.83 million triples per second for parsing large RDF/XML or CIMXML files into `GraphMem2Roaring`.

Many software projects around CIM/CGMES had bad experiences when they tried to work with `CIMXML` as RDF. I hope the cimxml parser will ease the pain for future projects in that area.

This week I spoke about CGMES and CIMXML at the [LF Energy Summit 2025](https://events.linuxfoundation.org/lfenergysummit-europe/) in Aachen: [Breaking Down CGMES Barriers](https://lfenergysummiteu2025.sched.com/event/26J6i/sponsored-session-breaking-down-cgmes-barriers-how-opencgmes-solves-real-world-grid-data-exchange-challenges-arne-bernhardt-soptim-ag?linkback=grid&iframe=no)

GitHub link: https://github.com/apache/jena/discussions/2529#discussioncomment-14391592
