Re: Patch: Multi-encoding Text import/export

Vlad Harchev Sat, 19 May 2001 09:58:52 -0700
On Sat, 19 May 2001, Sam TH wrote:

 Hi, 

> On Sat, May 19, 2001 at 06:19:21PM +1000, Andrew Dunbar wrote:
> > I consider this a pretty important change.
> > 
> > It allows you to import a text file no matter if
> > it's an old 8-bit encoding, UTF-8, or UCS-2 as is
> > used in Windows and Mac OSX.
> > 
> > It also allows you to export to any of these text
> > formats - though changes are needed to the rest of
> > AbiWord to fully support this.
> > 
> > This also means we will no longer need separate
> > UTF-8 and UCS-2 importers and exporters and any
> > .txt file will "just work" - perfect for church
> > secretaries (:
> > 
> > Please somebody have a serious look at this!
> > Feedback much appreciated.
> 
> This looks really good.  A couple quick comments:
> 
> - _recognizeUCS/UTF8 should definitely be members of a class.
>   IE_Imp_Text_Sniffer is probably the best choice.
> 
> - All the new functions need doxygen comments.  
> 
> Those two you should fix before someone commits this.  They shouldn't
> be too hard.
> 
> The third thing is that UTF8 can be various-endian as well, so you
> probably want to detect that.  

 You are plainly wrong here. UTF-8 is a sequence of single bytes, not of
multi-byte units, so it has no endianness. Its key feature is that every byte
shows its own offset from the start of its multi-byte sequence, so UTF-8
can't be "endian".
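 To illustrate the point, here is a minimal C++ sketch. The helper names
isContinuationByte, characterStart and looksLikeUTF8 are hypothetical; this is
not code from the patch or from _recognizeUTF8. Because a continuation byte
always matches 10xxxxxx, you can find the start of a character from any byte
offset, and the check reads the input byte by byte with no notion of byte order.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not AbiWord code): UTF-8 continuation bytes
// always have the bit pattern 10xxxxxx.
bool isContinuationByte(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// From any byte offset, step back over continuation bytes to reach the
// lead byte of the character that contains that offset.
std::size_t characterStart(const std::string& s, std::size_t pos) {
    while (pos > 0 && isContinuationByte(static_cast<unsigned char>(s[pos])))
        --pos;
    return pos;
}

// Structural check only: each lead byte must be followed by exactly as many
// continuation bytes as its high bits announce. It does not reject overlong
// forms or surrogates; a real importer would be stricter.
bool looksLikeUTF8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80)                extra = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;  // stray continuation byte or invalid lead byte
        if (s.size() - i <= extra)
            return false;   // sequence truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k)
            if (!isContinuationByte(static_cast<unsigned char>(s[i + k])))
                return false;
        i += extra + 1;
    }
    return true;
}
```

 For example, "caf\xC3\xA9" (UTF-8 "café") passes the check and
characterStart(text, 4) returns 3, while the Latin-1 bytes "caf\xE9" fail it,
which is roughly how an importer can tell UTF-8 from an old 8-bit encoding.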
 
> Question: does our current UTF8 export use a byte-order mark?  If not,
> it probably should.  

 No, a byte order mark is meaningful only for UCS-2 and UCS-4.
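 A minimal C++ sketch of why the mark matters for UCS-2/UCS-4: U+FEFF at the
start of the file tells the reader how to assemble each 16- or 32-bit unit.
The enum and function names are hypothetical, not from the patch. UCS-4 is
checked first because the little-endian UCS-4 mark FF FE 00 00 begins with the
UCS-2 little-endian mark FF FE.

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
enum class ByteOrder { None, UCS2BigEndian, UCS2LittleEndian,
                       UCS4BigEndian, UCS4LittleEndian };

// Inspect the first bytes of a buffer for a serialized U+FEFF.
ByteOrder detectByteOrderMark(const unsigned char* buf, std::size_t len) {
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return ByteOrder::UCS4BigEndian;
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return ByteOrder::UCS4LittleEndian;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ByteOrder::UCS2BigEndian;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ByteOrder::UCS2LittleEndian;
    return ByteOrder::None;
}

// Assemble one 16-bit UCS-2 unit from two bytes in the detected order.
// UTF-8 needs no equivalent step, since it is read one byte at a time.
unsigned readUCS2Unit(const unsigned char* p, ByteOrder order) {
    if (order == ByteOrder::UCS2BigEndian)
        return (static_cast<unsigned>(p[0]) << 8) | p[1];
    return (static_cast<unsigned>(p[1]) << 8) | p[0];
}
```

 Note that a UCS-2 little-endian file whose first character is U+0000 would
be misread as UCS-4 by this simple order of checks; that case is rare in text
files.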

 Best regards,
  -Vlad