Hey Alex,

They should be UTF-8. I get the same error as you when not all of my bash
locale variables are set. Run "locale" to check whether they are all set,
and if any are missing, export LC_ALL etc. with a UTF-8 value.
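
Roughly what I mean, as a minimal sketch. The locale name en_US.UTF-8 is
only an assumption about what is installed on your machine; any installed
UTF-8 locale should do the same job:

```shell
# Print the current locale settings; look for empty values or "C"/"POSIX".
# (Guarded so the snippet still runs on systems without the locale tool.)
if command -v locale >/dev/null 2>&1; then
    locale
fi

# Assumption: en_US.UTF-8 is installed here; `locale -a` lists what is available.
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

# LC_ALL takes precedence over LANG and the other LC_* variables,
# so both should now name a UTF-8 locale.
echo "LC_ALL=$LC_ALL LANG=$LANG"
```

Do this in the same shell session that starts the model creation, so that
the Java process inherits the variables.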

Best,
Jo


On Fri, May 16, 2014 at 12:55 PM, Alex Olieman <[email protected]> wrote:

> Hi,
>
> Lately I've been tinkering with the raw data to see if I can create a
> model with a filtered set of disambiguation candidates, and that spots
> more lowercase surface forms. When I train the model on the modified
> data, however, I'm faced with character encoding issues.
>
> For example:
> >  INFO 2014-05-16 03:16:29,310 main [WikipediaToDBpediaClosure] - Done.
> >  INFO 2014-05-16 03:16:29,388 main [SurfaceFormSource$] - Creating
> > SurfaceFormSource...
> >  INFO 2014-05-16 03:16:29,388 main [SurfaceFormSource$] - Reading
> > annotated and total counts...
> > Exception in thread "main"
> > java.nio.charset.UnmappableCharacterException: Input length = 1
> >         at java.nio.charset.CoderResult.throwException(Unknown Source)
> >         at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
>
> I had expected these files to be encoded in utf-8, but it looks like
> this isn't the case. The chardet library tells me it is ISO-8859-2
> a.k.a. Latin-2 instead. Can someone tell me in which character encoding
> the raw data (pig output) files should be for db.CreateSpotlightModel to
> read them correctly? If this really should be one of the "Western"
> character sets, I would expect it to be Latin-1 instead.
>
> Best,
> Alex
>
>
>
> ------------------------------------------------------------------------------
> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
> Instantly run your Selenium tests across 300+ browser/OS combos.
> Get unparalleled scalability from the best Selenium testing platform
> available
> Simple to use. Nothing to install. Get started now for free."
> http://p.sf.net/sfu/SauceLabs
> _______________________________________________
> Dbp-spotlight-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>
