Thank you.
I found the Git repository for the code. Could you tell me more precisely where the sentence-splitting algorithm is (or where it is missing but should be)? The name of the relevant function and file, or something like that, would be enough as a pointer. I might take a look at it sometime next week.

Kristian


On 05.10.2014 19:31, Max Semenik wrote:
The sentence-handling algorithm appears to cope poorly with HTML. I've filed a bug for it (https://bugzilla.wikimedia.org/show_bug.cgi?id=71671). As a workaround, try plaintext extracts (https://et.wikipedia.org/w/api.php?action=query&prop=extracts%7Ccategories&exsentences=1&explaintext&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=j%C3%A4rv), or switch from requesting a number of sentences to a number of characters.
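Both workarounds can be scripted. The following is a minimal sketch under stated assumptions, not code from this thread or from the extension itself. It builds TextExtracts query URLs with Python's standard library, using `explaintext` for plaintext extracts and `exchars` (the TextExtracts character-limit parameter) in place of `exsentences`. It also adds a client-side check that flags an HTML extract with no visible text, such as the `<hr />` in the quoted report. The helper names `extract_url` and `extract_is_empty` are hypothetical. The query is trimmed to `prop=extracts`, and `format=json` replaces `format=jsonfm` (the pretty-printed browser view).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Endpoint taken from the thread. format=json (not jsonfm) is assumed
# because this sketch is meant for programs rather than browsers.
API = "https://et.wikipedia.org/w/api.php"


def extract_url(title, sentences=None, chars=None, plaintext=False):
    """Build a TextExtracts query URL (hypothetical helper).

    Pass exactly one of `sentences` (sent as exsentences) or
    `chars` (sent as exchars).
    """
    if (sentences is None) == (chars is None):
        raise ValueError("pass exactly one of sentences or chars")
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "redirects": "",
        "titles": title,
    }
    if sentences is not None:
        params["exsentences"] = sentences
    else:
        params["exchars"] = chars
    if plaintext:
        # MediaWiki treats a boolean flag as true whenever it is present.
        params["explaintext"] = ""
    # urlencode percent-encodes the title as UTF-8 (järv -> j%C3%A4rv).
    return API + "?" + urlencode(params)


class _TextCollector(HTMLParser):
    """Keeps only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_is_empty(extract_html):
    """True if an HTML extract has no visible text, e.g. just "<hr />"."""
    collector = _TextCollector()
    collector.feed(extract_html)
    collector.close()  # flush any buffered text
    return "".join(collector.parts).strip() == ""


if __name__ == "__main__":
    # The request from the original report (HTML extract, one sentence).
    print(extract_url("järv", sentences=1))
    # The two suggested workarounds: plaintext, or a character limit.
    print(extract_url("järv", sentences=1, plaintext=True))
    print(extract_url("järv", chars=200))
```

A caller would fetch the first URL, test the returned `extract` with `extract_is_empty`, and only then fall back to one of the other two requests. Fetching is left out so the sketch runs offline.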

On Sun, Oct 5, 2014 at 8:34 AM, Kristian Kankainen <[email protected]> wrote:

    Hi!

    When I query the Estonian Wikipedia's Web API for an article's
    first sentence, I sometimes get an empty response. Actually, it
    gives back a horizontal rule and that's it.

    For example:
    https://et.wikipedia.org/w/api.php?action=query&prop=extracts%7Ccategories&exsentences=1&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=j%C3%A4rv

    gives only a horizontal rule as the extract:
    "extract": "<hr />",

    Can anyone explain what is happening here? Is the article's source
    organized incorrectly, or is it a problem with the API's sentence
    parser?

    Best regards
    Kristian Kankainen

    _______________________________________________
    Mediawiki-api mailing list
    [email protected]
    <mailto:[email protected]>
    https://lists.wikimedia.org/mailman/listinfo/mediawiki-api




--
Best regards,
Max Semenik ([[User:MaxSem]])
