The relevant function is getFirstSentences in ExtractFormatter.php:
http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FTextExtracts.git/master/ExtractFormatter.php
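
In case it helps while you look at the code, here is a rough sketch of how the workaround request quoted below could be built from a script. It only assembles the URL and does not send it. The helper name build_extract_url and its arguments are made up for illustration; the query parameters (exsentences, exchars, explaintext) are the TextExtracts ones already used in this thread.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs in this thread.
API_ENDPOINT = "https://et.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=None, chars=None, plaintext=True):
    """Build a TextExtracts query URL for one page title.

    Hypothetical helper: pass either `sentences` (exsentences) or
    `chars` (exchars), not both.
    """
    if (sentences is None) == (chars is None):
        raise ValueError("pass exactly one of sentences or chars")

    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "redirects": "",
        "exlimit": 1,
        "titles": title,
    }
    if sentences is not None:
        params["exsentences"] = sentences
    else:
        params["exchars"] = chars
    if plaintext:
        # Flag parameter: its presence requests plain text instead of HTML.
        params["explaintext"] = ""

    # urlencode percent-encodes non-ASCII titles as UTF-8.
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("järv", sentences=1))
    print(build_extract_url("järv", chars=200))
```

Requesting plain text sidesteps the HTML handling in the sentence splitter, and the chars variant avoids sentence splitting altogether.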

On Sun, Oct 5, 2014 at 11:16 AM, Kristian Kankainen <[email protected]> wrote:

> Thank you.
> I found the git for the code. Could you maybe tell me a bit more exactly
> where the sentence algorithm is (or is not but should be). I might take a
> look at it some day next week. The name of the relevant function and
> filename or something like that for pointers.
>
> Kristian
>
>
> 05.10.2014 19:31, Max Semenik wrote:
>
> Sentence handling algorithm appears to suck at HTML handling. I've filed a
> bug for it: https://bugzilla.wikimedia.org/show_bug.cgi?id=71671
> As a workaround, try plaintext extracts:
> https://et.wikipedia.org/w/api.php?action=query&prop=extracts%7Ccategories&exsentences=1&explaintext&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=j%C3%A4rv
> or switch from requesting a number of sentences to a number of characters.
>
> On Sun, Oct 5, 2014 at 8:34 AM, Kristian Kankainen <[email protected]>
> wrote:
>
>> Hi!
>>
>> When I query the Estonian Wikipedia's Web API for the article's first
>> sentence, I sometimes get an empty response. Actually it gives back a
>> horizontal rule and that's it.
>>
>> For example:
>>
>> https://et.wikipedia.org/w/api.php?action=query&prop=extracts|categories&exsentences=1&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=järv
>>
>> gives only a horizontal rule as the extract:
>> "extract": "<hr />",
>>
>> Can anyone say what is happening here? Is the article's source organized
>> in a wrong way or is it a problem on the API's sentence parser side?
>>
>> Best regards
>> Kristian Kankainen
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])

-- 
Best regards,
Max Semenik ([[User:MaxSem]])
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
