Hi Even,

It seems to me that this duplicates RFC 50: OGR field subtypes.
For example, we could have the master field type DateTime with a subtype Year.
So the internal structure for date/time representation could be adapted to that technique.

Best regards,
    Dmitry

06.04.2015 15:02, Even Rouault wrote:
On Monday 06 April 2015 13:48:47, Even Rouault wrote:
On Monday 06 April 2015 11:32:33, Dmitriy Baryshnikov wrote:
The first solution looks reasonable. But the precision field lacks
values for the case where only the time is significant:

ODTP_HMSm
ODTP_HMS
ODTP_HM
ODTP_H
As I didn't want to multiply the values in the enumeration, my intent was
to reuse the ODTP_YMDxxxx values for OFTTime only.
I meant "for OFTTime too"

This is what I was trying to convey with the parenthetical in the
comment of ODTP_YMDH: "Year, month, day (if OFTDateTime) and hour"

Or perhaps, the enumeration should capture the most precise part of the
(date)time structure  ?
ODTP_Year
ODTP_Month
ODTP_Day
ODTP_Hour
ODTP_Minute
ODTP_Second
ODTP_Millisecond

etc.

Best regards,

      Dmitry

05.04.2015 22:25, Even Rouault wrote:
Hi,

In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680,
which is about the lack of precision of the current datetime structure,
I've come up with several ways to modify the OGRField structure, but
failed to pick one that would be the obvious solution, so opinions are
welcome.

The issue is how to add (at least) millisecond accuracy to the datetime
structure, as a few formats support it explicitly or implicitly:
MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite, PostgreSQL,
CSV, GeoJSON, ODS, XLSX, KML (potentially GML too)...

Below are a few potential solutions:

---------------------------------------
Solution 1) : Millisecond accuracy, second becomes a float

This is the solution I've prototyped.

typedef union {
[...]

    struct {
        GInt16  Year;
        GByte   Month;
        GByte   Day;
        GByte   Hour;
        GByte   Minute;
        GByte   TZFlag;
        GByte   Precision; /* value in OGRDateTimePrecision */
        float   Second;    /* from 00.000 to 60.999 (millisecond accuracy) */
    } Date;

} OGRField;

So sub-second precision is represented with a single-precision
floating point number that stores both the integral and decimal parts.
(We could theoretically have a hundredth of a millisecond accuracy,
10^-5 s, since 6099999 fits in the 23 bits of the mantissa.)
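
The mantissa argument can be checked empirically. The sketch below is not part of the prototype; the function name and parameters are made up for illustration. It stores every tick value in a float seconds value, reads it back, and counts the values that do not survive the round trip.

```c
#include <assert.h>

/* Count how many integer tick values in [0, max_ticks] fail to survive a
 * round trip through a single-precision seconds value.
 * ticks_per_second is 1000 for milliseconds, 1000000 for microseconds. */
static long count_roundtrip_failures(long max_ticks, double ticks_per_second)
{
    long failures = 0;
    long tick;

    assert(max_ticks >= 0 && ticks_per_second > 0.0);
    for (tick = 0; tick <= max_ticks; tick++) {
        /* Store the value the way a float Second member would hold it. */
        float second = (float)((double)tick / ticks_per_second);
        /* Read it back, rounding to the nearest tick. */
        long back = (long)((double)second * ticks_per_second + 0.5);
        if (back != tick)
            failures++;
    }
    return failures;
}
```

Calling it with (60999, 1000.0) or (6099999, 100000.0) should report no failures, while (60999999, 1000000.0) should report some, which is the limitation that motivates Solution 4.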

Another addition is the Precision field that indicates which parts of
the datetime structure are significant.

/** Enumeration that defines the precision of a DateTime.
 * @since GDAL 2.0
 */
typedef enum
{
    ODTP_Undefined, /**< Undefined */
    ODTP_Guess,     /**< Only valid when setting through
                         SetField(i,year,month...) where OGR will guess */
    ODTP_Y,         /**< Year is significant */
    ODTP_YM,        /**< Year and month are significant */
    ODTP_YMD,       /**< Year, month and day are significant */
    ODTP_YMDH,      /**< Year, month, day (if OFTDateTime) and hour are
                         significant */
    ODTP_YMDHM,     /**< Year, month, day (if OFTDateTime), hour and
                         minute are significant */
    ODTP_YMDHMS,    /**< Year, month, day (if OFTDateTime), hour, minute
                         and integral second are significant */
    ODTP_YMDHMSm,   /**< Year, month, day (if OFTDateTime), hour, minute
                         and second with milliseconds are significant */
} OGRDateTimePrecision;

I think this is important since "2015/04/05 17:12:34" and "2015/04/05
17:12:34.000" do not really mean the same thing and it might be good to
be able to preserve the original accuracy when converting between
formats.
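
To illustrate how the Precision value could drive output, here is a minimal sketch. SketchDate, sketch_format and sketch_same_text are hypothetical names, not GDAL API, and the "/" date separator simply follows the example strings above. Only the enumeration values come from the proposal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Precision values as listed in the proposal. */
typedef enum {
    ODTP_Undefined, ODTP_Guess, ODTP_Y, ODTP_YM, ODTP_YMD,
    ODTP_YMDH, ODTP_YMDHM, ODTP_YMDHMS, ODTP_YMDHMSm
} OGRDateTimePrecision;

/* Hypothetical stand-in for the Date member of OGRField (Solution 1). */
typedef struct {
    int   Year, Month, Day, Hour, Minute;
    float Second;
    OGRDateTimePrecision Precision;
} SketchDate;

/* Write only the significant parts of *d; returns snprintf's result. */
static int sketch_format(const SketchDate *d, char *buf, size_t size)
{
    assert(d != NULL && buf != NULL && size > 0);
    switch (d->Precision) {
    case ODTP_Y:
        return snprintf(buf, size, "%04d", d->Year);
    case ODTP_YM:
        return snprintf(buf, size, "%04d/%02d", d->Year, d->Month);
    case ODTP_YMD:
        return snprintf(buf, size, "%04d/%02d/%02d",
                        d->Year, d->Month, d->Day);
    case ODTP_YMDH:
        return snprintf(buf, size, "%04d/%02d/%02d %02d",
                        d->Year, d->Month, d->Day, d->Hour);
    case ODTP_YMDHM:
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d",
                        d->Year, d->Month, d->Day, d->Hour, d->Minute);
    case ODTP_YMDHMSm:
        /* Three decimals, zero-padded to "SS.sss". */
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%06.3f",
                        d->Year, d->Month, d->Day, d->Hour, d->Minute,
                        (double)d->Second);
    default:
        /* ODTP_YMDHMS; Undefined and Guess fall back to whole seconds. */
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%02d",
                        d->Year, d->Month, d->Day, d->Hour, d->Minute,
                        (int)d->Second);
    }
}

/* Two values count as the same text only if their formatted forms match. */
static int sketch_same_text(const SketchDate *a, const SketchDate *b)
{
    char ta[64], tb[64];
    sketch_format(a, ta, sizeof ta);
    sketch_format(b, tb, sizeof tb);
    return strcmp(ta, tb) == 0;
}
```

With the same field values, ODTP_YMDHMS gives "2015/04/05 17:12:34" and ODTP_YMDHMSm gives "2015/04/05 17:12:34.000", so the original accuracy survives the conversion.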

A drawback of this solution is that the size of the OGRField structure
increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
64-bit builds). This is probably not that important since in most cases
not that many OGRField structures are instantiated at one time
(typically, you iterate over features one at a time).
This could be more of a problem for use cases that involve the MEM
driver, as it keeps all features in memory.

Another drawback is that the change to the structure might not be
directly noticed by application developers: the Second field name is
preserved, but a new Precision field is added, so there's a risk that
Precision is left uninitialized if the field is set through
OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That could
lead to unexpected formatting (but hopefully not crashes, with defensive
programming). However, I think it is unlikely that many applications
manipulate OGRField directly instead of using the getters and
setters of OGRFeature.

---------------------------------------
Solution 2) : Millisecond accuracy, second and milliseconds in distinct
fields

typedef union {
[...]

    struct {
        GInt16  Year;
        GByte   Month;
        GByte   Day;
        GByte   Hour;
        GByte   Minute;
        GByte   TZFlag;
        GByte   Precision;   /* value in OGRDateTimePrecision */
        GByte   Second;      /* from 0 to 60 */
        GUInt16 Millisecond; /* from 0 to 999 */
    } Date;

} OGRField;

Same size of structure as in 1)

---------------------------------------
Solution 3) : Millisecond accuracy, pack all fields

Conceptually, this would use bit fields to avoid wasting unused bits :

typedef union {
[...]

    struct {
        GInt16       Year;
        GUIntBig     Month:4;
        GUIntBig     Day:5;
        GUIntBig     Hour:5;
        GUIntBig     Minute:6;
        GUIntBig     Second:6;
        GUIntBig     Millisecond:10; /* 0-999 */
        GUIntBig     TZFlag:8;
        GUIntBig     Precision:4;
    } Date;

} OGRField;

This was proposed in the above-mentioned ticket. And as there were
enough remaining bits, I've also added the Precision field (as in all
other solutions).

The advantage is that sizeof(mydate) remains 8 bytes on 32-bit builds.

But the C standard only defines bit-fields of type int/unsigned int, so
this is not portable. In addition, the way bits are packed is not
defined by the standard, so different compilers could come up with
different packing. A workaround is to do the bit manipulation through
macros:

typedef union {
[...]

    struct {
        GUIntBig        opaque;
    } Date;

} OGRField;

#define GET_BITS(x,y_bits,shift) \
    (int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1))

#define GET_YEAR(x)              (short)GET_BITS(x,16,64-16)
#define GET_MONTH(x)             GET_BITS(x,4,64-16-4)
#define GET_DAY(x)               GET_BITS(x,5,64-16-4-5)
#define GET_HOUR(x)              GET_BITS(x,5,64-16-4-5-5)
#define GET_MINUTE(x)            GET_BITS(x,6,64-16-4-5-5-6)
#define GET_SECOND(x)            GET_BITS(x,6,64-16-4-5-5-6-6)
#define GET_MILLISECOND(x)       GET_BITS(x,10,64-16-4-5-5-6-6-10)
#define GET_TZFLAG(x)            GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
#define GET_PRECISION(x)         GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)

#define SET_BITS(x,y,y_bits,shift) \
    (x).Date.opaque = ((x).Date.opaque \
        & (~( (GUIntBig)((1 << (y_bits))-1) << (shift) )) \
        | ((GUIntBig)(y) << (shift)))

#define SET_YEAR(x,val)            SET_BITS(x,val,16,64-16)
#define SET_MONTH(x,val)           SET_BITS(x,val,4,64-16-4)
#define SET_DAY(x,val)             SET_BITS(x,val,5,64-16-4-5)
#define SET_HOUR(x,val)            SET_BITS(x,val,5,64-16-4-5-5)
#define SET_MINUTE(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6)
#define SET_SECOND(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6-6)
#define SET_MILLISECOND(x,val)     SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
#define SET_TZFLAG(x,val)          SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
#define SET_PRECISION(x,val)       SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)

Main advantage: the size of OGRField remains unchanged (so 8 bytes on
32-bit builds).

Drawback: manipulation of datetime members is less natural, but there
are not that many places in the GDAL code base where the OGRField.Date
members are used, so that is not much of a problem.
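
For comparison, the same packing can be sketched with plain functions on a 64-bit integer instead of macros. This is only an illustration: the function and constant names are made up, and the shifts are the values the macro expressions above evaluate to (64-16 = 48, 64-16-4 = 44, and so on). Unlike SET_BITS, set_bits() also masks the incoming value.

```c
#include <assert.h>
#include <stdint.h>

/* Fields are laid out from the most significant bit down, in the order
 * Year, Month, Day, Hour, Minute, Second, Millisecond, TZFlag, Precision. */
enum {
    YEAR_SHIFT = 48, MONTH_SHIFT = 44, DAY_SHIFT = 39, HOUR_SHIFT = 34,
    MINUTE_SHIFT = 28, SECOND_SHIFT = 22, MILLISECOND_SHIFT = 12,
    TZFLAG_SHIFT = 4, PRECISION_SHIFT = 0
};

/* Extract a field of the given width located at the given shift. */
static uint64_t get_bits(uint64_t packed, unsigned width, unsigned shift)
{
    assert(width > 0 && width < 64 && shift + width <= 64);
    return (packed >> shift) & ((UINT64_C(1) << width) - 1);
}

/* Return packed with the field replaced by value. */
static uint64_t set_bits(uint64_t packed, uint64_t value,
                         unsigned width, unsigned shift)
{
    uint64_t mask;

    assert(width > 0 && width < 64 && shift + width <= 64);
    mask = ((UINT64_C(1) << width) - 1) << shift;
    /* Masking the value keeps an out-of-range input from spilling into
     * neighbouring fields. */
    return (packed & ~mask) | ((value << shift) & mask);
}

static uint64_t set_year(uint64_t packed, int16_t year)
{
    return set_bits(packed, (uint16_t)year, 16, YEAR_SHIFT);
}

static int16_t get_year(uint64_t packed)
{
    /* Converting back to a signed 16-bit value assumes two's complement. */
    return (int16_t)(uint16_t)get_bits(packed, 16, YEAR_SHIFT);
}
```

Setting one field leaves the others untouched, and a negative year reads back with its sign.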

---------------------------------------
Solution 4) : Microsecond accuracy with one field

Solution 1) used a float for second and sub-second, but a float has
only 23 bits of mantissa, which is enough to represent seconds with
millisecond accuracy, but not with microsecond accuracy (you need 26
bits for that). So use a 32-bit integer instead of a 32-bit floating
point number.

typedef union {
[...]

    struct {
        GInt16  Year;
        GByte   Month;
        GByte   Day;
        GByte   Hour;
        GByte   Minute;
        GByte   TZFlag;
        GByte   Precision;    /* value in OGRDateTimePrecision */
        GUInt32 Microseconds; /* 00000000 to 59999999 */
    } Date;

} OGRField;

Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
(and remains 16 bytes on 64-bit builds).

We would need to add an extra value in OGRDateTimePrecision to mean the
microsecond accuracy.

It is not really clear that we need microsecond accuracy... Most formats
that support sub-second accuracy use the ISO 8601 representation (e.g.
YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define the maximum number of
decimals beyond the second. According to
http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
PostgreSQL supports microsecond accuracy.

---------------------------------------
Solution 5) : Microsecond with 3 fields

A variant where we split second into 3 integer parts:

typedef union {
[...]

    struct {
        GInt16  Year;
        GByte   Month;
        GByte   Day;
        GByte   Hour;
        GByte   Minute;
        GByte   TZFlag;
        GByte   Precision;   /* value in OGRDateTimePrecision */
        GByte   Second;      /* 0 to 59 */
        GUInt16 Millisecond; /* 0 to 999 */
        GUInt16 Microsecond; /* 0 to 999 */
    } Date;

} OGRField;

Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on 32-bit
builds (and remains 16 bytes on 64-bit builds).
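
The padding can be made visible with a small stand-in. This is a sketch under assumptions: fixed-width types replace GInt16/GByte/GUInt16, and a union with a single int32_t member imitates the 4-byte alignment of OGRField on a 32-bit build. The real union has more members, and the numbers below are what common ABIs produce, not a guarantee of the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Date layout of Solution 2. */
struct DateSolution2 {
    int16_t  Year;
    uint8_t  Month, Day, Hour, Minute, TZFlag, Precision;
    uint8_t  Second;
    uint16_t Millisecond; /* 2-byte aligned: one padding byte before it */
};

/* Date layout of Solution 5. */
struct DateSolution5 {
    int16_t  Year;
    uint8_t  Month, Day, Hour, Minute, TZFlag, Precision;
    uint8_t  Second;
    uint16_t Millisecond;
    uint16_t Microsecond;
};

/* Stand-ins for OGRField with 4-byte alignment. */
union FieldSolution2 { int32_t Integer; struct DateSolution2 Date; };
union FieldSolution5 { int32_t Integer; struct DateSolution5 Date; };

/* Abort if the compiler lays the members out differently than assumed. */
static void check_layout(void)
{
    assert(offsetof(struct DateSolution2, Second) == 8);
    assert(offsetof(struct DateSolution2, Millisecond) == 10);
    assert(offsetof(struct DateSolution5, Microsecond) == 12);
}
```

On such a layout the Solution 2 struct takes 12 bytes and so does its union, while the Solution 5 struct takes 14 bytes and its union is rounded up to 16.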

---------------------------------------
Solution 6) : Nanosecond accuracy and beyond !

Now that we are using 16 bytes, why not have nanosecond accuracy?

typedef union {
[...]

    struct {
        GInt16  Year;
        GByte   Month;
        GByte   Day;
        GByte   Hour;
        GByte   Minute;
        GByte   TZFlag;
        GByte   Precision; /* value in OGRDateTimePrecision */
        double  Second;    /* 0.000000000 to 60.999999999 */
    } Date;

} OGRField;

Actually we even have picosecond accuracy! (since for picoseconds, we
need 46 bits and a double has 52 bits of mantissa). And if we use a
64-bit integer instead of a double, we can have femtosecond accuracy
;-)
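
The bit counts quoted in this mail (23, 26, 46) can be reproduced with a few lines. The helper names below are made up for illustration. The largest value to store is taken as a seconds value just under 61, expressed in ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent every integer in [0, max_value]. */
static unsigned bits_needed(uint64_t max_value)
{
    unsigned bits = 0;

    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Largest tick count for a seconds value just under 61, e.g. 60999 when
 * ticks_per_second is 1000 (milliseconds). */
static uint64_t max_ticks(uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0
           && ticks_per_second <= UINT64_C(1000000000000000));
    return 61 * ticks_per_second - 1;
}
```

bits_needed(max_ticks(n)) gives 16 for milliseconds, 23 for 10^-5 s, 26 for microseconds, 36 for nanoseconds, 46 for picoseconds and 56 for femtoseconds, the last of which still fits in a 64-bit integer.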

Any preference ?

Even
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
