GitHub user afs added a comment to the discussion: How to load big dataset to 
new database

I'm having problems with PubChem. After downloading all the files (by FTP, as 
described on the website), about 5% of them turned out to be corrupt gz files. 
Retrying usually gets a valid file, but at least one needed 3 attempts.
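
One way to spot the corrupt downloads before loading is to decompress every 
file to the end, so the gzip trailer (CRC and length) gets checked. This is 
only a minimal sketch in Python using the standard library; the function 
names and the directory layout are assumptions for illustration, not part of 
Jena or of the PubChem download instructions.

```python
import gzip
import zlib
from pathlib import Path


def is_valid_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end so the trailing CRC and length are verified.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # BadGzipFile is an OSError; truncated files raise EOFError.
        return False


def find_corrupt(directory):
    """List the .gz files under `directory` (hypothetical path) that fail."""
    return sorted(p for p in Path(directory).rglob("*.gz") if not is_valid_gzip(p))
```

Calling `find_corrupt()` on the download directory gives the list of files to 
fetch again; rerunning it after each retry shows whether any are still bad.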

I also found some syntax errors, but given the gz problems it isn't yet clear 
whether the two are related or whether the files really are illegal Turtle.

Syntax errors are a nuisance when bulk loading. It's hard to know what is 
actually in the database, and harder to find and fix the problem.
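
Checking the files before loading avoids ending up with a partly loaded 
database. Jena's `riot` command has a `--validate` option that parses a file 
without producing output. The sketch below runs it over a list of files from 
Python. It assumes `riot` is on the PATH and that it exits with a non-zero 
status when it hits a syntax error; the function name is made up for this 
example.

```python
import subprocess


def find_syntax_errors(paths, command=("riot", "--validate")):
    """Run the parser over each file; return (path, messages) for failures."""
    failures = []
    for path in paths:
        result = subprocess.run(
            [*command, str(path)], capture_output=True, text=True
        )
        # Assumption: the parser exits non-zero when parsing fails.
        if result.returncode != 0:
            failures.append((str(path), result.stderr.strip()))
    return failures
```

Any file that shows up in the result can be set aside and inspected, and the 
rest loaded in bulk.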

Are you trying to load all 2065 files?
I did see some with very long literals (the text of patent abstracts, IIRC).


GitHub link: 
https://github.com/apache/jena/discussions/3701#discussioncomment-15525196
