On 13.02.2013 19:12, Daniel Naber wrote:
> On 13.02.2013, 18:04:53 Stephan Hennig wrote:
> 
>> Is it safe to send text as plain UTF-8 or do non-ASCII characters in
>> text still need to be URL (percent) encoded for the transfer?
> 
> As you should use POST anyway (not GET), there should be no need to URL-
> encode the actual text.

Looks like the ampersand needs to be URL encoded, too.
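
To illustrate why: in a form-encoded POST body a raw "&" separates parameters, so everything after it no longer belongs to "text". Below is a minimal Python sketch using only the standard library. No request is sent, and it assumes the server parses the body with the usual form-encoding rules; the parameter names "language" and "text" are the ones from the curl calls in this mail.

```python
from urllib.parse import parse_qs, urlencode

# Raw body as handed to `curl --data`: the '&' of the entity splits the
# body, so a standard form parser sees text="a " and a stray "gt; wort".
raw_body = "language=en-GB&text=a &gt; wort"
raw_params = parse_qs(raw_body)

# Letting urlencode() build the body percent-encodes '&' and ';'.
safe_body = urlencode({"language": "en-GB", "text": "a &gt; wort"})
safe_params = parse_qs(safe_body)

print(raw_params["text"])   # ['a ']
print(safe_body)            # language=en-GB&text=a+%26gt%3B+wort
print(safe_params["text"])  # ['a &gt; wort']
```

With curl, the --data-urlencode option should give the same effect as urlencode() here, without encoding by hand.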

But HTML entities seem to be a harder problem.  Attached are the results
of five invocations of LanguageTool with the following texts:

  text                errors

  a > word            0
  a > wort            1
  a &gt; wort         0 !
  a & wort            0 !
  a %26gt; wort       2 !

What's the best way to handle text that contains HTML entities?
Should I unconditionally replace HTML entities with the corresponding
UTF-8 characters before feeding the text to LanguageTool, even if I
don't know whether they are meant literally?
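
If decoding before the check turns out to be acceptable, it could look like the following Python sketch, which uses html.unescape from the standard library. It does not settle whether entities that are meant literally should be decoded. It only shows that decoding is a single pass, so a doubly escaped "&amp;gt;" comes out as the literal text "&gt;".

```python
import html

# Sample inputs taken from the table above, plus a doubly escaped variant.
texts = ["a &gt; wort", "a &amp; wort", "a &amp;gt; wort"]

# html.unescape() replaces each character reference exactly once.
decoded = [html.unescape(t) for t in texts]

print(decoded)  # ['a > wort', 'a & wort', 'a &gt; wort']
```

Any offsets reported for the decoded text would then refer to the decoded string, not to the original one containing the entities.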

Best regards,
Stephan Hennig


> $ curl --data "language=en-GB&text=a > word" https://languagetool.org:8081
> <?xml version="1.0" encoding="UTF-8"?>
> <matches software="LanguageTool" version="2.1-SNAPSHOT" buildDate="2013-01-31 00:00">
> <language shortname="en-GB" name="English (GB)"/>
> </matches>
> 
> $ curl --data "language=en-GB&text=a > wort" https://languagetool.org:8081
> <?xml version="1.0" encoding="UTF-8"?>
> <matches software="LanguageTool" version="2.1-SNAPSHOT" buildDate="2013-01-31 00:00">
> <language shortname="en-GB" name="English (GB)"/>
> <error fromy="0" fromx="4" toy="0" tox="8" ruleId="MORFOLOGIK_RULE_EN_GB"
>   msg="Possible spelling mistake found"
>   replacements="Mort#fort#port#sort#tort#wart#wont#word#wore#work#worm#worn#worst#worth"
>   context="a &gt; wort" contextoffset="4" offset="4" errorlength="4"
>   category="Possible Typo" locqualityissuetype="misspelling"/>
> </matches>
> 
> $ curl --data "language=en-GB&text=a &gt; wort" https://languagetool.org:8081
> <?xml version="1.0" encoding="UTF-8"?>
> <matches software="LanguageTool" version="2.1-SNAPSHOT" buildDate="2013-01-31 00:00">
> <language shortname="en-GB" name="English (GB)"/>
> </matches>
> 
> $ curl --data "language=en-GB&text=a & wort" https://languagetool.org:8081
> <?xml version="1.0" encoding="UTF-8"?>
> <matches software="LanguageTool" version="2.1-SNAPSHOT" buildDate="2013-01-31 00:00">
> <language shortname="en-GB" name="English (GB)"/>
> </matches>
>
> $ curl --data "language=en-GB&text=a %26gt; wort" https://languagetool.org:8081
> <?xml version="1.0" encoding="UTF-8"?>
> <matches software="LanguageTool" version="2.1-SNAPSHOT" buildDate="2013-01-31 00:00">
> <language shortname="en-GB" name="English (GB)"/>
> <error fromy="0" fromx="2" toy="0" tox="5" ruleId="MORFOLOGIK_RULE_EN_GB"
>   msg="Possible spelling mistake found" replacements="Sgt#hgt"
>   context="a &amp;gt; wort" contextoffset="2" offset="2" errorlength="3"
>   category="Possible Typo" locqualityissuetype="misspelling"/>
> <error fromy="0" fromx="7" toy="0" tox="11" ruleId="MORFOLOGIK_RULE_EN_GB"
>   msg="Possible spelling mistake found"
>   replacements="Mort#fort#port#sort#tort#wart#wont#word#wore#work#worm#worn#worst#worth"
>   context="a &amp;gt; wort" contextoffset="7" offset="7" errorlength="4"
>   category="Possible Typo" locqualityissuetype="misspelling"/>
> </matches>

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel