Salvete!
I declare this scriptable and doable, just not by me, since I can't
programme me way out of a wet paper bag. (Well, I prolly can at gunpoint, but
yeah, that's what it would take.)
> So I have this idea I'd like to do for a hobby project, but it requires
> finding a table that lists a classic novel,
First I'm afraid you'll have to define how you're choosing to categorise
Classics. As someone charged with that task, it sucks and it's not as
straightforward as one might think. I'd encourage you to either lewt and
pillage someone else's preextant classification or pick summat easier for a
computer, like publication date or inclusion on one of those stuffy arse
bibliographies of the 100 Greatest Books. (Please do mull over how white
bespoke lists tend to be.)
> a Gutenberg.org link to an instance of that work (first listed, one with
> most downloads, whichever),
> the lead female character, and the lead male character (can be null). E.g.
> Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth
> Bennet, Mr. Darcy. Even leaving the Gutenberg part for another day, this
> has been really difficult to find.
>
Might I suggest having your scraper haphazardly search through the MARC 650 $a
fields (the topical subject headings) for the phrase "Fictitious character"?
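
Since I can't script it meself, here is roughly what I have in mind, as a rough sketch only. It assumes your records arrive as MARCXML (an assumption; yours may be binary MARC or summat else) and it uses nowt beyond the Python standard library. The sample record is hand-made for illustration and the function name is made up.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-made sample record, for illustration only.
SAMPLE = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Bennet, Elizabeth (Fictitious character)</subfield>
    <subfield code="v">Fiction.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Young women</subfield>
  </datafield>
  <datafield tag="651" ind1=" " ind2="0">
    <subfield code="a">England</subfield>
  </datafield>
</record>
"""


def fictitious_characters(marcxml: str) -> List[str]:
    """Return every 650 $a value that mentions 'fictitious character'."""
    root = ET.fromstring(marcxml)
    hits = []
    # Look only at 650 fields, and within them only at subfield a.
    for field in root.iterfind(".//marc:datafield[@tag='650']", MARC_NS):
        for sub in field.iterfind("marc:subfield[@code='a']", MARC_NS):
            text = (sub.text or "").strip()
            # Case-insensitive, since capitalisation varies between records.
            if "fictitious character" in text.lower():
                hits.append(text)
    return hits


if __name__ == "__main__":
    print(fictitious_characters(SAMPLE))
```

The match is deliberately sloppy (a substring test), which suits the haphazard approach; tighten it once you've seen what your real records look like.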
> I've had no success with Dbpedia/Wikidata since there's no real
> standardized format for novels, characters often are associated more
> strongly with films or video games than original works (Cheshire Cat), and
> when characters are listed they are neither prioritized nor link to a
> record that clearly states gender.
Thanks to the antiquated subject stuff that happens at dear olde LOC, also
picking through the data for "Women" should get you some gender data.
You might scoff, but I do like the lists of lists at Wikipedia in terms of
this hypothetical. For instance:
https://en.wikipedia.org/wiki/List_of_LGBT_characters_in_modern_written_fiction
could be quite helpful. One could have one's bot check on that page for
edits. Surely this is easier to sort than reinventing the wheel and being one
person against a sea of publishers.
https://en.wikipedia.org/wiki/Category:Lists_of_literary_characters
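
For the bot that checks a list page for edits, a minimal sketch, assuming the public MediaWiki API on en.wikipedia.org and assuming you stash the last revision id you saw somewhere yourself. The function names and the User-Agent string are made up for illustration; put your own contact details in the latter.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def latest_revision_url(title: str) -> str:
    """Build a query asking for the newest revision id of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "rvlimit": 1,
        "format": "json",
        "formatversion": 2,
    }
    return API + "?" + urlencode(params)


def parse_latest_revision(payload: dict) -> Optional[int]:
    """Pull the revision id out of the API response, or None if absent."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or not pages[0].get("revisions"):
        return None
    return pages[0]["revisions"][0]["revid"]


def has_changed(title: str, last_seen_revid: Optional[int]) -> bool:
    """Fetch the page's newest revision id and compare it to the stored one."""
    request = Request(
        latest_revision_url(title),
        headers={"User-Agent": "hobby-canon-bot/0.1 (hypothetical; add contact)"},
    )
    with urlopen(request) as response:
        payload = json.load(response)
    current = parse_latest_revision(payload)
    return current is not None and current != last_seen_revid


if __name__ == "__main__":
    page = "List_of_LGBT_characters_in_modern_written_fiction"
    print(has_changed(page, last_seen_revid=None))
```

Run it from cron once a day and only re-harvest the page when it reports a change.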
Were I you, I'd also be keen to hook in Open Library since closed
datakeepers have a nasty tendency of waking up and deciding to charge or lock
things away.
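
Hooking in Open Library could look summat like this. It is a sketch, assuming their public search endpoint at openlibrary.org/search.json and the `docs`, `key`, `title`, `author_name` and `first_publish_year` fields in its response; do check their current docs before leaning on it. The function names are made up.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH = "https://openlibrary.org/search.json"


def search_url(title: str, limit: int = 5) -> str:
    """Build a title search URL for Open Library."""
    return SEARCH + "?" + urlencode({"title": title, "limit": limit})


def summarise(payload: Dict) -> List[Dict]:
    """Keep only the handful of fields this project would care about."""
    rows = []
    for doc in payload.get("docs", []):
        rows.append(
            {
                "key": doc.get("key", ""),
                "title": doc.get("title", ""),
                "authors": doc.get("author_name", []),
                "first_published": doc.get("first_publish_year"),
            }
        )
    return rows


def lookup(title: str) -> List[Dict]:
    """Query Open Library over the network and summarise the results."""
    with urlopen(search_url(title)) as response:
        return summarise(json.load(response))


if __name__ == "__main__":
    for row in lookup("Pride and Prejudice"):
        print(row)
```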
> And then there's how to select some sort of "Western Canon" list. ISBNs are
> nowhere to be found, nor any other
> identifier that might help to corral a fair chunk of results.
>
> I looked at OCLC, but WorldCat Works is still an experiment and frankly
> looks like too much work to query for too little return even if it had good
> coverage. Amazon? Librarything? Goodreads? No luck yet.
>
If you must try the proprietary DB route, did you try NoveList? I really
think what you need to do is pick a good cataloguer's brain for a bit and come up
with a brute force script that will harvest stuff for you and autoupdate on RSS
or summat else since effort begins with eh. Your data set isn't infinite, it's
just not small. I wouldn't even properly call it large, given how unrich and
unproblematic textual library data is in comparison to, say, audio or video files.
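
The autoupdate-on-RSS bit might go like so. A sketch only, assuming a plain RSS 2.0 feed and assuming you persist the set of seen identifiers between runs (a text file would do). The sample feed and its example.org links are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# Invented sample feed, for illustration only.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Sample catalogue updates</title>
    <item>
      <title>Record one</title>
      <link>https://example.org/records/1</link>
      <guid>https://example.org/records/1</guid>
    </item>
    <item>
      <title>Record two</title>
      <link>https://example.org/records/2</link>
    </item>
  </channel>
</rss>
"""


def new_items(rss_xml: str, seen: Set[str]) -> List[Tuple[str, str]]:
    """Return (identifier, title) for items not seen before; update `seen`."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iterfind("./channel/item"):
        # Prefer the guid; fall back to the link when the feed omits it.
        ident = item.findtext("guid") or item.findtext("link")
        if ident and ident not in seen:
            seen.add(ident)
            fresh.append((ident, item.findtext("title", default="")))
    return fresh


if __name__ == "__main__":
    seen_so_far: Set[str] = set()
    print(new_items(SAMPLE_FEED, seen_so_far))  # both items on the first pass
    print(new_items(SAMPLE_FEED, seen_so_far))  # nothing new on the second
```

Only the fresh items then need feeding to the harvester, which keeps the brute force to a tolerable minimum.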
Cheers,
Brooke