Re: [Harbour] BYTE -> UCHAR patch

Przemyslaw Czerpak Sun, 08 Feb 2009 07:44:19 -0800

On Sat, 07 Feb 2009, Szakáts Viktor wrote:

Hi Viktor,


> > I suggest changing all 'BYTE *' used as file names in the Harbour
> > FS API and similar functions to the 'char *' type so as not to replicate
> > the problems which come from Clipper, e.g. from CL53 header files:
> >   typedef unsigned char BYTE;
> >   typedef BYTE far * BYTEP;
> >   extern void _retc(BYTEP);
> >   extern void _retclen(char far *, unsigned int);
> > You may find similar problems also in CL52.
> > It forces explicit casting in C++ mode if char is signed. Such casting
> > has a bad side effect: it can hide typos or even errors that are normally
> > easy to catch at compile time, e.g. pFile wrongly used instead of szFile:
> >   PHB_ITEM pFile = hb_param( 1, HB_IT_STRING );
> >   char * szFile = hb_itemGetCPtr( pFile );
> >   HB_FHANDLE hFile = hb_fsOpen( ( BYTE * ) pFile, ... );
> Yes, I see your point.
> I'll try to fix that before committing anything, but first
> I have some questions.
> Am I right in assuming that we will definitely take the UTF-8
> route then (for filenames, too), and in that case 'char' may
> just hold UTF-8 strings in the future? (had we chosen UTF-16,
> we might need to centrally redefine 'char' to double bytes in
> the future, so such an abstract type might have its benefits.)

I do not think we can decide now. UTF-8 allows us to keep all
existing functions with 'char' as is, but in some cases U16 (it's
not exactly UTF-16) is better because it's a fixed-length format.
Anyhow, I do not think we can fully drop the pure char API. It can still
be useful for different libraries, so maybe we will have both.
I would like to return to this subject when we have the CDP API
and are working on Unicode support.

> > In your modifications you replaced BYTE * used as file names in the Harbour
> > FS API with UCHAR *. File names are text strings, and I want to use
> > simple char * for them; I want to change BYTE used as a synonym
> > for an 8-bit unsigned integer to UCHAR.
> Is this true for GT strings, too? (to convert them to char, not UCHAR)

It will depend on the function.
Here we will have interaction with Unicode, too.
Now characters and strings are passed as BYTEs.
Let's leave it for now. Here we should also make a few other modifications
which will break backward binary compatibility, so this change should be
grouped with them.

> Q1: Having Unicode support in mind, shouldn't we use some distinct
> marking for 'text' (char *) data?

Using 'char *' already does that.

> Q2: May I change UCHAR / SCHAR to HB_UCHAR / HB_SCHAR,
> as part of moving our special types to our own namespace?
> (UCHAR is a Microsoft type name, also)

Please not now. We have many different types, and we haven't decided
how the final version should look and what namespace we should
use. For now I suggest making only basic modifications which do not
break backward binary compatibility, and then freezing modifications to
release a new stable version with MT support. When this version
is ready, we can start the discussion on core API modifications
and begin to implement them. Such modifications may take a long time,
even more than a year, and people will need a stable version with MT,
so first we have to release it. Probably it should be version 1.2.
After releasing 1.2 we can start a new branch with 1.3 and begin to
introduce the agreed modifications. If backward source code and/or
binary compatibility is seriously broken by the new API (I expect it
in this case), then the next stable release should probably be numbered
2.0 to distinguish the branches and the 1.x and 2.x APIs.


> Q3: Can we define the rules for different string/char types, so that
> everyone speaks the same language here:
> - HB_UCHAR, HB_SCHAR: ... ?
> - char / HB_TEXT?: Harbour character/string (with future UTF-8 support) ?
> - BYTE: ... Harbour raw binary data or other simple numeric BYTE ?

The sign of BYTE is platform/C compiler dependent, so it should not be used
internally as a number holder, because programmers do not know whether it
will be mapped to signed or unsigned char. This type should rather be used
to mark data whose sign is undefined.
I do not know yet what namespace we will use and how we will represent
Unicode strings, so I do not think we should make any such modifications
now. Please remember that each time you change some definitions in
core code, you force code updates in 3rd-party projects. Making such
modifications, even for important reasons, and then reverting them or
changing them to something else is a serious problem for 3rd-party code
developers. We have to know exactly what our final goal is before we start
anything like that.

> Maybe that would help for everyone to see more clearly.

If we change it again in a few months, then certainly not.

best regards,
Przemek

ps. For new types I suggest using something like:

   ANSI C types:
      void,
      [ [un]signed ] char, [ [un]signed ] short, [ [un]signed ] int,
      [ [un]signed ] long, double,

      [ [un]signed ] long long is not supported by some platforms / C compilers
      so it should not be used

   harbour overloaded types:
      hbChar, hbSChar, hbUChar, hbShort, hbUShort, hbInt, hbUInt,
      hbLong, hbULong, hbLongLong, hbULongLong, hbDouble,
      hbMaxInt, hbMaxDouble,
      hbCounter, hbSize, hbPtrDiff, hbPtrVal,
      hbPointer,
      hbWChar // for future wide character representation

   harbour strict bit types:
      hbI8, hbU8, hbI16, hbU16, hbI32, hbU32, hbI64, hbU64

   Types which depend on internal HVM/compilation settings:
      hbMaxVMInt - maximal integer which can be stored in an HVM item
                   (HB_IT_LONG). It's the current HB_LONG; usually it will be
                   the same as hbMaxInt unless for some reason it has to be
                   reduced, e.g. the compiler may support 128-bit integers
                   as hbMaxInt but we may not use them for HB_IT_LONG due
                   to reduced performance.

It's not the full list, just an initial proposal. After 1.2 we can discuss
the full list of types and the new API. I will also want to update some
public structures (e.g. the RDD ones). All these modifications can make
Harbour unstable for some time, and when they are finally ready, people
using 3rd-party libraries will have to wait for those libraries to be
updated and re-released. Without 1.2 they will have a very serious problem.

best regards,
Przemek
_______________________________________________
Harbour mailing list
[email protected]
http://lists.harbour-project.org/mailman/listinfo/harbour
