http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FTextExtracts.git/master/ExtractFormatter.php
function getFirstSentences.
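
For reference, the workaround quoted below (plaintext extracts, or a character limit instead of a sentence count) could be scripted roughly as follows. This is only a sketch: the helper name build_extract_query and its arguments are made up for illustration, and it only builds the request URL. The parameters exsentences, exchars and explaintext are the ones TextExtracts exposes through the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query in this thread.
API_URL = "https://et.wikipedia.org/w/api.php"


def build_extract_query(title, sentences=None, chars=None, plaintext=True):
    """Build a TextExtracts query URL (hypothetical helper, not part of MediaWiki).

    Exactly one of `sentences` (exsentences) or `chars` (exchars) must be given.
    """
    if (sentences is None) == (chars is None):
        raise ValueError("pass exactly one of sentences or chars")

    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "redirects": "",
        "exlimit": 1,
        "titles": title,
    }
    if sentences is not None:
        params["exsentences"] = sentences
    else:
        params["exchars"] = chars
    if plaintext:
        # Boolean API parameters count as set when present, even with an empty value.
        params["explaintext"] = ""

    # urlencode percent-encodes non-ASCII titles as UTF-8.
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_query("järv", sentences=1))
    print(build_extract_query("järv", chars=200))
```

The resulting URL can be fetched with any HTTP client; the extract is then returned as plain text rather than HTML.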

On Sun, Oct 5, 2014 at 11:16 AM, Kristian Kankainen <[email protected]> wrote:

>  Thank you.
> I found the git repository for the code. Could you tell me more precisely
> where the sentence algorithm is (or is not, but should be)? I might take a
> look at it some day next week. The name of the relevant function and
> file would be enough as pointers.
>
> Kristian
>
>
> On 05.10.2014 19:31, Max Semenik wrote:
>
> The sentence handling algorithm appears to suck at HTML handling. I've filed
> a bug for it: https://bugzilla.wikimedia.org/show_bug.cgi?id=71671
> As a workaround, try plaintext extracts:
> https://et.wikipedia.org/w/api.php?action=query&prop=extracts%7Ccategories&exsentences=1&explaintext&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=j%C3%A4rv
> or switch from requesting a number of sentences to a number of characters.
>
> On Sun, Oct 5, 2014 at 8:34 AM, Kristian Kankainen <[email protected]>
> wrote:
>
>> Hi!
>>
>> When I query the Estonian Wikipedia's Web API for an article's first
>> sentence, I sometimes get an empty response. More precisely, it returns a
>> horizontal rule and that's it.
>>
>> For example:
>>
>> https://et.wikipedia.org/w/api.php?action=query&prop=extracts|categories&exsentences=1&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=järv
>>
>> gives only a horizontal rule as the extract:
>> "extract": "<hr />",
>>
>> Can anyone say what is happening here? Is the article's source organized
>> in a wrong way, or is it a problem on the API's sentence parser side?
>>
>> Best regards
>> Kristian Kankainen
>>
>> _______________________________________________
>> Mediawiki-api mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>
>
>
>
>  --
> Best regards,
> Max Semenik ([[User:MaxSem]])
>
>


-- 
Best regards,
Max Semenik ([[User:MaxSem]])
