RE: PaceSourceRecs

Bob Wyman Fri, 06 May 2005 14:03:25 -0700

Antone Roundy wrote:
> Start by calculating the language of the atom:feed and
> the atom:entry.  Second, if the language of atom:entry isn't
> the same as the aggregate feed, set it. Third, if the
> language of atom:feed isn't the same as the atom:entry,
> set it.
        I'm curious about this "calculation" word... The Pace says that
"Language values should be calculated, according to the rules of
[W3C.REC-xml-20040204], by processing the xml:lang values of the element in
question and its ancestors."
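
For reference, the "calculation" the Pace describes can be sketched roughly as below. This is only an illustration under stated assumptions, not anything the Pace itself specifies: the function name effective_lang, the sample feed, and the use of the http://www.w3.org/2005/Atom namespace are all hypothetical choices made for the example. It walks from the element up through its ancestors and returns the nearest xml:lang value.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumed namespace for the sample feed below (illustrative only).
ATOM = "{http://www.w3.org/2005/Atom}"


def effective_lang(root, target):
    """Return the xml:lang in effect for target, or None if none applies.

    Hypothetical helper: checks the element itself, then each ancestor,
    and stops at the first xml:lang attribute it finds.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            # An empty xml:lang="" is treated here as "no language declared".
            return node.attrib[XML_LANG] or None
        node = parents.get(node)
    return None


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-us">
  <entry><title>one</title></entry>
  <entry xml:lang="zh"><title>two</title></entry>
</feed>"""

feed = ET.fromstring(SAMPLE)
entries = feed.findall(ATOM + "entry")
langs = [effective_lang(feed, entry) for entry in entries]
title_lang = effective_lang(feed, entries[1].find(ATOM + "title"))
```

Here the first entry inherits "en-us" from the feed, while the second entry and its title resolve to "zh". Note that the result only reflects what the tags declare, which is exactly the limitation discussed next.
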
        This raises an interesting point... What should we do if we know
that the language that results from the calculation is not the actual
language of the entry? For instance, language recognition technology is
relatively well known and works with reasonable accuracy these days. The
fact is that the language fields in both RSS and existing Atom files are
wrong in a very large percentage of cases. (e.g., folk writing Chinese blogs
on US-based systems are getting feeds that claim "en-us" as the language and
often have no way to correct the language tag since US developers hardly
ever think about I18N issues...)
        We've had folk suggest to us that we should run Language Recognition
algorithms to calculate the actual language of entries and then "fix" their
tags in cases where the tags are missing or clearly wrong. I've resisted
such suggestions if only because I can't figure out where to write the
"correct" language tags. What would you suggest we do? Or, should we do
nothing? What would you suggest we do when trying to insert an entry into an
aggregate feed if the original entry had no language tag yet we are very
confident that our Language Recognition code has determined the actual
language of the entry and it is not the same as the language of the
enclosing feed?


        bob wyman
