Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-15 Thread David Cook via Koha-devel
I’m just finishing up the next version of the plugin, which finds problem bib 
records and problem item records, and lets you fix 1 bib record and its X problem 
item records at a time.

I’ll be putting it up online sometime in the next couple of weeks. (I’m away a lot 
over these couple of weeks, so it’s tough to give a firm time.)

But I’m about to use it to find problem item records in one of my systems…
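
As a rough illustration of the workflow described above (one bib record handled together with its problem item records), here is a minimal Python sketch. It is not the plugin's code, which is not shown in this thread; the function name and the (biblionumber, itemnumber) input shape are assumptions made purely for the example.

```python
from collections import defaultdict


def group_problem_items_by_bib(problem_items):
    """Group (biblionumber, itemnumber) pairs so each bib can be fixed with its items.

    The input shape is hypothetical; it stands in for whatever a scan of the
    items table reports.
    """
    grouped = defaultdict(list)
    for biblionumber, itemnumber in problem_items:
        grouped[biblionumber].append(itemnumber)
    # Sort bibs and items so the resulting work queue is stable and predictable.
    return {bib: sorted(items) for bib, items in sorted(grouped.items())}


# Made-up identifiers, purely for illustration.
flagged = [(42, 1003), (7, 88), (42, 1001)]
for bib, items in group_problem_items_by_bib(flagged).items():
    print(f"bib {bib}: {len(items)} problem item(s) -> {items}")
```

Grouping first means a reviewer can open one bib record and see every affected item in a single pass.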

 

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595

From: Tomas Cohen Arazi
Sent: Sunday, 14 April 2024 2:49 AM
To: Magnus Enger
Cc: David Cook; koha-devel@lists.koha-community.org
Subject: Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

Yeah, dump it somewhere and we'll send pull requests to add the missing bits.

On Fri, 12 Apr 2024 at 9:50 a.m., Magnus Enger via Koha-devel 
<koha-devel@lists.koha-community.org> wrote:

On 12.04.2024 08:24, David Cook wrote:
> Not yet. At some point I need to get better at sharing Koha plugins more 
> widely.

Just dump it to GitHub or similar, and set the version to 0.0.1. ;-) 
Release early, release often.

Best regards,
Magnus
___
Koha-devel mailing list
Koha-devel@lists.koha-community.org 
<mailto:Koha-devel@lists.koha-community.org> 
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/



Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread Philippe Blouin via Koha-devel

Something else to add to search_for_data_inconsistencies.pl?

I like Perl-based solutions, and I appreciate centralized ones, even 
though I suppose what you're testing for is not an "inconsistency".


Philippe Blouin
Director of Technology

T 833-INLIBRO (465-4276), ext. 230
C philippe.blo...@inlibro.com

www.inLibro.com

On 2024-04-11 21:36, David Cook via Koha-devel wrote:


Hi all,

I just wanted to share a (MariaDB) SQL report that I wrote for finding 
bib records with invalid XML characters:


select biblionumber from biblio_metadata where metadata REGEXP 
'[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
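
For anyone who wants to sanity-check the character class outside the database, here is a minimal Python sketch of the same test. It assumes the class is meant to mirror the XML 1.0 "Char" production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF); the function name and the sample string are made up for illustration and are not part of Koha.

```python
import re

# Anything outside the XML 1.0 "Char" production is treated as invalid.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters not allowed in XML 1.0."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in INVALID_XML_CHARS.finditer(text)
    ]


# Hypothetical MARCXML fragment containing a stray form feed (U+000C).
sample = '<subfield code="a">Title\x0c with a form feed</subfield>'
print(find_invalid_xml_chars(sample))
```

Reporting the offset and code point, rather than a simple yes/no, makes it easier to see which control character slipped into a record.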


Newer versions of Koha strip invalid characters from the XML so that 
you can fix your records. I figure this report is very valuable when 
coupled with that functionality. In fact, I just advised a library 
today to use them together to fix up some bad data in their catalogue.
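
To make the "strip, then fix" idea concrete, here is a minimal Python sketch of removing such characters from a string. This is not Koha's implementation (Koha is written in Perl and its code is not shown in this thread); the function name is hypothetical and the character class assumes the XML 1.0 "Char" ranges.

```python
import re

# Characters outside the XML 1.0 "Char" production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Return the cleaned text and the number of characters that were removed."""
    cleaned, removed = INVALID_XML_CHARS.subn("", text)
    return cleaned, removed


# Hypothetical field value with a NUL and a unit separator embedded in it.
cleaned, removed = strip_invalid_xml_chars("Some\x00 tit\x1fle")
print(repr(cleaned), removed)
```

Returning the count alongside the cleaned text lets a caller tell whether a record actually needed repair.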


--

On a related note, I’ve noticed that you can have a record with good 
bib XML but invalid item XML, and you won’t notice until your record 
fails to be indexed. So I’m planning on writing a report for that too.


I’m thinking it might be good to add these reports to core Koha, so 
that people can find and fix their own metadata problems. What do 
people think?




Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread Magnus Enger via Koha-devel

On 12.04.2024 08:24, David Cook wrote:
> Not yet. At some point I need to get better at sharing Koha plugins more widely.

Just dump it to GitHub or similar, and set the version to 0.0.1. ;-) 
Release early, release often.


Best regards,
Magnus


Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread David Cook via Koha-devel
Not yet. At some point I need to get better at sharing Koha plugins more 
widely. 


-Original Message-
From: Koha-devel  On Behalf Of 
Magnus Enger via Koha-devel
Sent: Friday, 12 April 2024 4:09 PM
To: koha-devel@lists.koha-community.org
Subject: Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL



On 12.04.2024 04:13, David Cook via Koha-devel wrote:
> I have a RepairRecord plugin, so I might do a version in that first, 
> and if that goes well I could look at upstreaming a patch…

Intriguing! :-) Is it available somewhere?

Best regards,
Magnus


Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread David Cook via Koha-devel
Yeah, I was thinking it would be good to add it to the SQL Reports Library. I've 
just been flat out today... (but I wanted to make sure I shared it with you folks 
at least)


-Original Message-
From: Koha-devel  On Behalf Of 
Magnus Enger via Koha-devel
Sent: Friday, 12 April 2024 4:06 PM
To: koha-devel@lists.koha-community.org
Subject: Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

Hi!

On 12.04.2024 03:36, David Cook via Koha-devel wrote:
> I’m thinking it might be good to add these reports to core Koha, so 
> that people can find and fix their own metadata problems. What do people 
> think?

Sounds like an excellent idea! Sounds kind of similar to "MARC bibliographic 
framework test" at /cgi-bin/koha/admin/checkmarc.pl

The report could also be added to
https://wiki.koha-community.org/wiki/SQL_Reports_Library for older Kohas and to 
be immediately useful.

Best regards,
Magnus


Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread Magnus Enger via Koha-devel



On 12.04.2024 04:13, David Cook via Koha-devel wrote:
> I have a RepairRecord plugin, so I might do a version in that first, and 
> if that goes well I could look at upstreaming a patch…


Intriguing! :-) Is it available somewhere?

Best regards,
Magnus


Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread Magnus Enger via Koha-devel

Hi!

On 12.04.2024 03:36, David Cook via Koha-devel wrote:

> I’m thinking it might be good to add these reports to core Koha, so that 
> people can find and fix their own metadata problems. What do people think?


Sounds like an excellent idea! Sounds kind of similar to "MARC 
bibliographic framework test" at /cgi-bin/koha/admin/checkmarc.pl


The report could also be added to 
https://wiki.koha-community.org/wiki/SQL_Reports_Library for older Kohas 
and to be immediately useful.


Best regards,
Magnus


Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-12 Thread Marcel de Rooy via Koha-devel
+1

From: Koha-devel on behalf of David Cook 
via Koha-devel 
Sent: Friday, 12 April 2024 03:36
To: 'Koha-devel' 
Subject: [Koha-devel] Finding invalid XML characters in Koha data via SQL


I’m thinking it might be good to add these reports to core Koha, so that people 
can find and fix their own metadata problems. What do people think?





Re: [Koha-devel] Finding invalid XML characters in Koha data via SQL

2024-04-11 Thread David Cook via Koha-devel
Alas, I couldn't think of a really clever way of doing the items table, so I
think it'll need a Perl-based solution.

I have a RepairRecord plugin, so I might do a version in that first, and if
that goes well I could look at upstreaming a patch.
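
To illustrate why a scripted approach is a natural fit for the items table, here is a minimal Python sketch that checks every text column of each item row for characters outside the XML 1.0 "Char" ranges. Python is used only for illustration; the function name, the row-as-dict shape, and the sample values are assumptions, not code from the RepairRecord plugin or from Koha.

```python
import re

# Characters outside the XML 1.0 "Char" production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def problem_item_columns(item_rows):
    """Map itemnumber -> sorted list of columns holding invalid XML characters.

    item_rows is assumed to be an iterable of dicts (column name -> value),
    for example rows fetched through a database cursor.
    """
    problems = {}
    for row in item_rows:
        bad_columns = sorted(
            column
            for column, value in row.items()
            if isinstance(value, str) and INVALID_XML_CHARS.search(value)
        )
        if bad_columns:
            problems[row["itemnumber"]] = bad_columns
    return problems


# Made-up rows; the second one has a vertical tab in its notes column.
rows = [
    {"itemnumber": 1, "biblionumber": 10, "barcode": "39999000001", "itemnotes": "clean note"},
    {"itemnumber": 2, "biblionumber": 10, "barcode": "39999000002", "itemnotes": "bad\x0bnote"},
]
print(problem_item_columns(rows))
```

Looping over columns in code avoids having to name every text column of the items table in a single SQL statement.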

 

 

From: Koha-devel  On Behalf Of
David Cook via Koha-devel
Sent: Friday, 12 April 2024 11:36 AM
To: 'Koha-devel' 
Subject: [Koha-devel] Finding invalid XML characters in Koha data via SQL

 


 

On a related note, I've noticed that you can have a record with good bib XML
but invalid item XML, and you won't notice until your record fails to be
indexed. So I'm planning on writing a report for that too. 

 

I'm thinking it might be good to add these reports to core Koha, so that
people can find and fix their own metadata problems. What do people think?

 
