On Wed, Jul 28, 2010 at 7:10 AM, Glenn Fowler <[email protected]> wrote:
>
> On Tue, 27 Jul 2010 22:58:16 -0400 Finnbarr Murphy wrote:
>> I notice two things about the DSS XML dss.tst
>
>> - The embedded XML is well-formed but not valid. It does not have a root element.
>> - C is not a required encoding for an XML processor. UTF-8 and UTF-16 are. In
>>   fact they are the only required encodings in the XML/XSL group of specifications.
>>   For this reason many people only use these encodings.
>
>> Can dss handle UTF-8 and UTF-16 encodings?
>
> the xml data in the data subdir was taken from public twitter feeds
> please point out the invalid parts
>
> the twitter data is tagged
>        <?xml version="1.0" encoding="UTF-8"?>
> so what do you mean by "C encoding"
>
> at this point we are not concerned with UTF-16 data
> are there many sources of UTF-16 data?
Many Chinese, Japanese and Korean web pages either use UTF-16 or (for PRC) use GB18030.

Irek
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
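
As a rough illustration of the encoding question raised in this thread, the sketch below shows one way a reader could tell UTF-8, UTF-16 and a declared encoding such as GB18030 apart before parsing. It is not how dss works; the function name sniff_xml_encoding is hypothetical, and the check is deliberately simplified (it looks only at the byte order mark and the encoding pseudo-attribute, and does not handle UTF-16 without a BOM or UTF-32).

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of an ASCII-compatible byte stream.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of XML bytes from the BOM or the declaration.

    Hypothetical helper for illustration only.
    """
    # UTF-16 input normally starts with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # No BOM: fall back to the encoding named in the declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no declaration: XML defaults to UTF-8.
    return "UTF-8"
```

With this sketch, the Twitter feed header quoted above, `<?xml version="1.0" encoding="UTF-8"?>`, would be reported as UTF-8, while a page declaring `encoding="GB18030"` would be reported as GB18030 and would need a decoder for that encoding before any further processing.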
