Hello, I'm working on a Firefox extension and I'm not entirely sure
this is the best place to post, but:

I'd like to access text from the webpage as Unicode. However, when I
walk the DOM tree of the browser content, the text I get appears to be
in some other encoding (which one seems to depend on the Character
Encoding I chose in the browser, but it never looks like Unicode).

Now, there are some XPCOM charset conversion utilities that would
theoretically do the job, but those would require knowing the charset
used in the page (which can probably be retrieved, but it sounds like
a pain).
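
To make concrete what I mean by "requires knowing the charset", here
is a rough sketch. It uses the standard TextDecoder API as a stand-in,
not the XPCOM utilities themselves, and decodeBytes and the sample
bytes are made up for illustration:

```javascript
// Hypothetical helper: turn raw bytes into a JavaScript (Unicode) string.
// The charset label has to be supplied by the caller; nothing here detects it.
function decodeBytes(bytes, charset) {
  const decoder = new TextDecoder(charset);
  return decoder.decode(bytes);
}

// "é" is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9 in UTF-8.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);     // "café"
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]); // "café"

// With the right label, both byte sequences decode to the same string.
const fromLatin1 = decodeBytes(latin1Bytes, "iso-8859-1");
const fromUtf8 = decodeBytes(utf8Bytes, "utf-8");

// With the wrong label, the lone 0xE9 byte is not valid UTF-8 and is
// replaced by U+FFFD, so the original character is lost.
const wrongGuess = decodeBytes(latin1Bytes, "utf-8");
```

So the conversion itself is simple; the awkward part is finding out
which charset label to pass in.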

I read that Gecko's internal representation is Unicode, so accessing
that would be neat. Does Gecko then create the "content" DOM tree in a
more specific encoding for cleaner display? Would there be a way to
access the Unicode representation Gecko creates?

(Please forgive my approximate terminology, I'm still very much a
newbie to all this; feel free to correct me)

Thank you,

Emile

_______________________________________________
mozilla-layout mailing list
[email protected]
http://mail.mozilla.org/listinfo/mozilla-layout
