GitHub user afs added a comment to the discussion: How to load big dataset to new database
I'm having problems with PubChem. After downloading all the files (FTP, as described on the website), some of the files are corrupt gz files, about 5% of the files. Retrying usually gets a valid file, but at least one needed 3 attempts.

I also found some syntax errors but, with the gz problems, it isn't yet clear whether they are related or whether the files really are illegal Turtle.

Syntax errors are a nuisance when bulk loading. It's hard to know what is actually in the database and harder to find and fix it.

Are you trying to load all 2065 files? I did see some with very long literals (the text of patent abstracts, IIRC).

GitHub link: https://github.com/apache/jena/discussions/3701#discussioncomment-15525196
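For the corrupt gz problem, it may help to check every download before attempting a load, so that bad files can be fetched again first. The following is only a minimal sketch using the Python standard library, not something from the PubChem or Jena tooling. The function names `is_valid_gzip` and `find_corrupt_gz` are made up for illustration, and it assumes the downloaded files all sit in one directory with a `.gz` suffix.

```python
import gzip
import zlib
from pathlib import Path


def is_valid_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end so the trailing CRC and length are checked too.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC mismatch),
        # EOFError a truncated stream, zlib.error damaged compressed data.
        return False


def find_corrupt_gz(directory):
    """Return the sorted names of *.gz files in `directory` that fail the check."""
    # Hypothetical layout: all downloads in a single flat directory.
    return sorted(
        p.name for p in Path(directory).glob("*.gz") if not is_valid_gzip(p)
    )
```

Calling `find_corrupt_gz` on the download directory gives the list of files to retry. This only says whether a file decompresses; it does not tell you whether the contents are legal Turtle, so syntax errors in a file that passes this check would still need a separate parse to find.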
