Let me offer another humble suggestion, though one that should not be a
candidate for FB3: Ditch the concept of fixed length records,
completely and forever.
Let me sketch the scheme I used in Netfrastructure / Falcon, which
originated from a suggestion on this list. The key elements are:
1. A record encoding that uses a single byte to encode type, type and
value, type and length, or type and number of count bytes. If anyone
is interested, I could post the Falcon code for reference. There
would be separate byte codes for, say, the integers from -10 to 30
(arbitrary), and integers with one to eight value bytes. Strings,
in turn, might have codes for strings from zero to 40 bytes and
codes for strings with 1 to 4 count bytes. Blobs are represented
with blob id types.
2. The analog of the formats table represents only the mapping from
logical field id to physical field sequence for a given record
version number.
3. Parsing a record is cheap -- a single table, indexed by the type
byte, that gives data length or the number of count bytes (which need
to be decoded of course).
4. An active record in memory has a field offset record and a high water
mark to avoid re-parsing fields. The Netfrastructure implementation
made provision for 8 bit, 16 bit, and 32 bit offsets depending on
record length, so very long records could be accommodated.
5. Value decode is a simple switch statement on type.
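To make the five points above concrete, here is a minimal sketch in Python. It is not the Falcon code: the byte code assignments, the names (LENGTHS, encode_value, skip_field, decode_value, Record), and the choice of big-endian value bytes are all assumptions made for illustration. Only integers and strings are handled, and the format mapping from point 2 is left out.

```python
# Hypothetical code assignments; the real Falcon byte codes are not given here.
INT_MIN, INT_MAX = -10, 30   # small integers live in the type byte itself
INT_BASE = 0                 # codes 0..40  -> integer values -10..30
INT_LEN_BASE = 41            # codes 41..48 -> integer with 1..8 value bytes
STR_BASE = 49                # codes 49..89 -> string of 0..40 bytes
STR_CNT_BASE = 90            # codes 90..93 -> string with 1..4 count bytes

# Point 3: one table lookup per field gives
# (fixed data length, number of count bytes that follow the type byte).
LENGTHS = ([(0, 0)] * 41
           + [(n, 0) for n in range(1, 9)]
           + [(n, 0) for n in range(41)]
           + [(0, n) for n in range(1, 5)])


def encode_value(v):
    """Encode one int or str as a type byte plus optional count/value bytes."""
    if isinstance(v, int):
        if INT_MIN <= v <= INT_MAX:
            return bytes([INT_BASE + v - INT_MIN])
        n = 1
        while not (-(1 << (8 * n - 1)) <= v < (1 << (8 * n - 1))):
            n += 1
        if n > 8:
            raise ValueError("integer needs more than 8 value bytes")
        return bytes([INT_LEN_BASE + n - 1]) + v.to_bytes(n, "big", signed=True)
    data = v.encode("utf-8")
    if len(data) <= 40:
        return bytes([STR_BASE + len(data)]) + data
    n = (len(data).bit_length() + 7) // 8
    if n > 4:
        raise ValueError("string needs more than 4 count bytes")
    return bytes([STR_CNT_BASE + n - 1]) + len(data).to_bytes(n, "big") + data


def skip_field(buf, pos):
    """Return the offset of the field that follows the one starting at pos."""
    length, count_bytes = LENGTHS[buf[pos]]
    pos += 1
    if count_bytes:
        length = int.from_bytes(buf[pos:pos + count_bytes], "big")
        pos += count_bytes
    return pos + length


def decode_value(buf, pos):
    """Point 5: decode is a simple dispatch on the type byte."""
    code = buf[pos]
    end = skip_field(buf, pos)
    if code < INT_LEN_BASE:
        return code - INT_BASE + INT_MIN
    if code < STR_BASE:
        return int.from_bytes(buf[pos + 1:end], "big", signed=True)
    _, count_bytes = LENGTHS[code]
    return buf[pos + 1 + count_bytes:end].decode("utf-8")


class Record:
    """Point 4: remember field offsets so no field is parsed twice."""

    def __init__(self, buf):
        self.buf = buf
        self.offsets = [0]  # offsets[i] = start of field i; the length of
                            # this list acts as the high water mark

    def field(self, index):
        # Parse forward only past the high water mark.
        while len(self.offsets) <= index:
            self.offsets.append(skip_field(self.buf, self.offsets[-1]))
        return decode_value(self.buf, self.offsets[index])


values = [7, -10, 30, 1000, -70000, "", "abc", "x" * 300]
rec = Record(b"".join(encode_value(v) for v in values))
print(rec.field(6), rec.field(3), len(rec.buf))
```

In this sketch a small integer costs one byte and a short string costs its length plus one, which is the kind of saving that makes separate compression unnecessary.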
Using a large customer database, encoded records were on average 60%
of the size of Firebird's run length encoded records and about 30% of
the size of the Firebird record in memory.
In both Falcon and NuoDB (I stuck with the Falcon code) the code was
regularly profiled. Record encode and decode did show up, but in the
range of 2 or 3% of cycles, all inclusive. But since records were much
smaller and didn't require compression and decompression, the CPU cost
was probably a wash and maybe even a small win.
There are two other huge fringe benefits. One is that numbers are
numbers -- no max size need be given. The other is that there is no
declared max length of strings other than a 32 bit count (which could be
trivially extended). There are SQL standard, language, and API issues
and enforcement issues, of course. But these are easily worked
through. Netfrastructure handled all official SQL types plus "number"
(no magnitude) and "string" (no length). MySQL was so wedded to fixed
length records that they couldn't even bring themselves to think about
alternatives.
The Rdb/ELN / Galaxy / Interbase / Firebird family has always supported
variable length records in the ODS. Maybe it's almost time to take full
advantage of this.
On 8/31/2013 7:46 AM, Ann Harrison wrote:
On Aug 31, 2013, at 4:55 AM, Mark Rotteveel <m...@lawinegevaar.nl> wrote:
On 29-8-2013 17:41, Jim Starkey wrote:
Paradoxically, Japanese strings tend to be shorter in UTF-8 than 16 bit
Unicode. The reason is simple: enough characters -- punctuation,
control characters, and digits -- stay as single bytes,
double byte characters are a wash, and the single byte characters
generally balance the number of three byte characters.
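The arithmetic behind this can be checked with a few lines of Python. This is only an illustration: the helper name encoded_sizes and the sample strings are made up for the example, and UTF-16 is measured without a byte order mark.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count without BOM) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure kanji: three bytes each in UTF-8, two bytes each in UTF-16.
print(encoded_sizes("日本語"))        # (9, 6)

# A date: seven ASCII digits and three kanji.
print(encoded_sizes("2013年8月31日"))  # (16, 20)
```

With n single byte characters and m three byte characters, UTF-8 needs n + 3m bytes and UTF-16 needs 2n + 2m, so UTF-8 is shorter whenever n exceeds m.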
UTF-16 is a mess with nasty problems of endians, multi-word characters,
and illegal codepoints to worry about.
Unfortunately the implementation of UTF-8 in Firebird is annoying
because it reduces the maximum allowed number of characters to 1/4 of
that for single byte character sets, making it necessary to switch to
blobs sooner.
A better solution is to change the implementation of CHAR and VARCHAR to accept
longer strings.
Cheers,
Ann
------------------------------------------------------------------------------
Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
Discover the easy way to master current and previous Microsoft technologies
and advance your career. Get an incredible 1,500+ hours of step-by-step
tutorial videos with LearnDevNow. Subscribe today and save!
http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel