On Wed, Jul 28, 2010 at 7:10 AM, Glenn Fowler <[email protected]> wrote:
>
> On Tue, 27 Jul 2010 22:58:16 -0400 Finnbarr Murphy wrote:
>> I notice two things about the DSS XML dss.tst
>
>>   -  The embedded XML is well-formed but not valid.  It does not have a
>>      root element.
>>   -  C is not a required encoding for an XML processor.  UTF-8 and UTF-16
>>      are.  In fact they are the only required encodings in the XML/XSL
>>      group of specifications.
>>      For this reason many people only use these encodings.
>
>> Can dss handle UTF-8 and UTF-16 encodings?
>
> the xml data in the data subdir was taken from public twitter feeds
> please point out the invalid parts
>
> the twitter data is tagged
>        <?xml version="1.0" encoding="UTF-8"?>
> so what do you mean by "C encoding"
>
> at this point we are not concerned with UTF-16 data
> are there many sources of UTF-16 data?

Many Chinese, Japanese and Korean web pages use either UTF-16 or, in the
PRC, GB18030.
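
For illustration only, here is a minimal sketch of how such input could be
normalized to UTF-8 before it reaches a tool that expects UTF-8. It is not
how dss works; the function names sniff_encoding and to_utf8 are
hypothetical, and the sketch assumes UTF-16 input starts with a byte order
mark. GB18030 input would need its encoding taken from the XML declaration
or an HTTP header instead, which is not shown here.

```python
import codecs

# Byte order marks recognized by this sketch. None of these is a prefix
# of another, so the order of the checks does not matter.
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),   # codec strips the UTF-8 BOM
    (codecs.BOM_UTF16_LE, "utf-16"),  # codec reads the BOM to pick endianness
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_encoding(data: bytes) -> str:
    """Return a Python codec name guessed from a leading BOM.

    Falls back to UTF-8 when no BOM is present (an assumption of this
    sketch, matching the XML default for documents without a BOM).
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "utf-8"


def to_utf8(data: bytes) -> bytes:
    """Decode using the sniffed encoding and re-encode as UTF-8 without a BOM.

    Note: the encoding="..." pseudo-attribute inside the XML declaration
    is left untouched, so it may still name the original encoding.
    """
    return data.decode(sniff_encoding(data)).encode("utf-8")
```

A caller would pass the raw bytes of a feed through to_utf8() first; the
rest of the pipeline then only ever sees UTF-8.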

Irek
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
