date:20191111

> On Nov 11, 2019, at 11:09 AM, Jose Isaias Cabrera  wrote:
> 
> Compared to me, you are a genius in everything, but you just lack a little 
> bit of understanding about other languages and their localization behavior.  
> As a Technical Project Manager for 12 years on my last job, all of these 
> statements that I am making, and yours above, were part of my daily routines. 

Well, I did spend years at Apple working on several GUI apps (iChat, Address 
Book, etc.) that were, of course fully localized. So yes, I understand what 
you're saying here. But doesn't it prove my point? You'd never just arbitrarily 
decide to display 20 characters in some view in a real app. Instead you'd check 
the bounds of the view at runtime and render text to fit in that. Or going the 
other way, you'd compute the width of the text in the specified font, then 
resize the view to fit it.

The details of the correspondences between bytes, code points, characters, and 
glyphs are really complex. Using UTF-32 only helps with the first of those 
mappings; you still have to pay attention to the rest. A 4x increase in space 
usage just isn't a good trade-off for such a limited benefit.
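
To put rough numbers on that trade-off, here is a minimal C sketch. The function names utf8_code_points and utf32_size are made up for illustration, and the code assumes well-formed, NUL-terminated UTF-8 input; it is not a validating decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a well-formed UTF-8 string by counting every
   byte that is not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}

/* Bytes needed to hold the same text as UTF-32 (no BOM, no terminator):
   every code point costs exactly four bytes. */
size_t utf32_size(const char *s)
{
    return 4 * utf8_code_points(s);
}
```

For pure ASCII text, strlen(s) is the UTF-8 size and utf32_size(s) is exactly four times that. The ratio shrinks for text with many multi-byte code points, but UTF-32 is never smaller than UTF-8.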

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Keith Medcalf

On Monday, 11 November, 2019 14:34, Richard Damon  
wrote:

>Unicode has decreed that the highest code-point that can be called a
>code-point is 0x10FFFF because to go higher breaks UTF-16, so there
>isn't as much room as you might think.

>This gives us 1,114,112 possible code points.

>There are currently 137,994 code points assigned to characters, 66
>assigned as non-characters, 2048 reserved for the surrogates, and a
>number reserved for private use, leaving 836,536 currently unassigned.
>This says we have some space to grow, but there are still a lot of
>archaic and unusual scripts that are being proposed or worked on.

Ah yes.  Like the "Poo Emoji Anti Discrimination Society".  They are complaining 
that the current poo emoji is racist, speciesist, and does not adequately reflect 
the actual distribution of colours of poo.  They want Unicode characters 
allocated for "runny poo", "poo pebbles", "liquid poo", and "poo-corns".  This 
is in addition to the current "pile-of-poo".  They also want all these variants 
to have non-discriminatory colouration:  "green poo", "brown poo", "grey poo", 
"black poo" and "white poo".  They also want all the composed variants such as 
"liquid white poo with black streaks", "green poo pebbles with red streaks".  
They also want to have "steaming", "smelly", and "perfume" modifiers as well as 
"wet", "dry", and "fossilized" versions.  This will take up about 600 Unicode 
code points.

So yes, the Unicode code point space is being rapidly used up.  Soon there will 
be a new standard called "wonkycode" which will use a 64-bit encoding.  It is 
anticipated that it will last about 2 years before the entire code space is 
used up, at which time "googlecode" will take over, which uses a 1024-bit 
encoding space.  Hopefully this will last until the turn of the next century, by 
which time quantum computing will be available and all possible code points can 
be encoded in one qubit which can have all values simultaneously.  Until you 
look, of course, at which point the status will become fixed.  This will be 
called Schrödinger's code.

:)

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume. 


Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 2:57 PM, Jose Isaias Cabrera wrote:
> Igor Tandetnik, on Monday, November 11, 2019 02:24 PM, wrote...
>> On 11/11/2019 12:50 PM, Richard Damon wrote:
>>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>>> screen.
>> Or more, depending on what you mean by  "glyph". See e.g. U+FDFB (ARABIC
>> LIGATURE JALLAJALALOUHOU,
>> https://www.fileformat.info/info/unicode/char/fdfb/index.htm ) or U+FB03
>> (LATIN SMALL LIGATURE FFI,
>> https://www.fileformat.info/info/unicode/char/fb03/index.htm)
> Thanks for this, Igor.  Again, UTF32 has lots of space, still.  If you look 
> at the representation of these two characters,
>
> ARABIC LIGATURE JALLAJALALOUHOU UTF-32 (hex) 0x0000FDFB (fdfb)
> LATIN SMALL LIGATURE FFI UTF-32 (hex) 0x0000FB03 (fb03)
>
> Look at their hex representations in UTF32:
> 1. 0x0000FDFB
> 2. 0x0000FB03
>
> The first 4 0's are still unused spaces.  Japanese, Chinese, etc., glyphs 
> have an unique UTF32 code, so, it will always work.
>
> josé

Unicode has decreed that the highest code-point that can be called a
code-point is 0x10FFFF because to go higher breaks UTF-16, so there
isn't as much room as you might think.

This gives us 1,114,112 possible code points.

There are currently 137,994 code points assigned to characters, 66
assigned as non-characters, 2048 reserved for the surrogates, and a
number reserved for private use, leaving 836,536 currently unassigned.
This says we have some space to grow, but there are still a lot of
archaic and unusual scripts that are being proposed or worked on.
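
The arithmetic behind those figures can be checked with a short C sketch. The function names are invented for illustration. The private-use total (6,400 in the BMP plus 65,534 in each of planes 15 and 16) is filled in from the Unicode code space layout rather than from the message above, which only says "a number".

```c
#include <assert.h>

/* The BMP holds 0x10000 code points; a UTF-16 surrogate pair (1024 high
   x 1024 low surrogates) can address 0x100000 more.  Together that is
   0x110000, which is why U+10FFFF is the highest code point. */
long code_space_size(void)
{
    return 0x10000L + 0x400L * 0x400L;
}

/* Private use: U+E000..U+F8FF plus planes 15 and 16, each of which
   ends with two noncharacters (hence 65,534 rather than 65,536). */
long private_use_size(void)
{
    return 6400L + 2L * 65534L;
}

/* Code points left over once assigned characters, the 66 noncharacters,
   the 2048 surrogates and the private-use areas are subtracted. */
long unassigned_code_points(long assigned_characters)
{
    assert(assigned_characters >= 0);
    return code_space_size() - assigned_characters - 66L - 2048L
           - private_use_size();
}
```

With the 137,994 assigned code points quoted above, unassigned_code_points(137994) comes out at 836,536, matching the figure in the message.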

-- 
Richard Damon


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Tom Browder

On Mon, Nov 11, 2019 at 15:28 Tom Browder  wrote:

> See the entry point for the language at .
>

Oh, and there are Debian packages available, too.

-Tom

Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Tom Browder

On Mon, Nov 11, 2019 at 15:18 Richard Damon 
wrote:

> On 11/11/19 3:49 PM, Jose Isaias Cabrera wrote:
> > Richard Damon, on Monday, November 11, 2019 02:37 PM, wrote...
> > Aaaah, my apologies.  We are talking about different things. You are
> talking about a combination of Unicodes vs. full, character. I take it
> back.

You folks may be interested in checking out the relatively new programming
language "Raku" (currently in the throes of renaming from "Perl 6").  Its
natural text handling is UTF-8 and there are modules for handling SQLite
and other RDBMS systems.

See the entry point for the language at .

Best regards,

-Tom

Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 3:49 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 02:37 PM, wrote...
>
>> No.
> Aaaah, my apologies.  We are talking about different things. You are talking 
> about a combination of Unicode code points vs. a full character. I take it back.  Yes, if 
> you are combining these, then, of course, you are going to have a 
> different word count because there are actually characters being involved.  
> We are talking pieces vs. full words.  If there is a combination, just like 
> the accented e, é, why not use the one character vs the combination?
>
> josé

Because not all accented characters have a single code-point. In my
example there is one, because Greek was worked on earlier. At some point in
the work on Unicode, they realized that there really were too many
combinations that happen in real life to try to assign code-points to
all of them. This also happened with the CJK characters: there are a very
large number of them, far more than they want to give code-points to, so
a large number of archaic forms, which are currently mostly only used in
names, are built with composing characters. (Back to problems with names).

The article at http://unicode.org/faq/char_combmark.html gives some
examples, one is:

The Devanagari syllable "ni" must be composed using a base character
"na" (न) followed by a combining vowel for the "i" sound ( ि), although
end users see and think of the combination of the two "नि" as a single
unit of text.

So the question comes, when do you REALLY need to know how many
code-points are in a string, or get a specific number of them? Having a
given number of code-units (or bytes) can be useful for building indexes
where a fixed size makes addressing easier for searching. Counting by
Glyphs is sometimes useful at the presentation layer (but needs to be
combined with character widths).

An Input Method would need to deal with the characters as code-points
(likely decomposed), but also probably needs to know about the Glyph to
show the cursor (unless that can be handled by the output method that it
uses).

-- 
Richard Damon


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Warren Young

On Nov 11, 2019, at 1:49 PM, Jose Isaias Cabrera  wrote:
> 
> If there is a combination, is just like the accented e, é, why not use the 
> one character vs the combination?

Big “if.”  There isn’t always a pre-composed character.

Typically, pre-composed characters exist in Unicode for compatibility with 
legacy encodings so that you can have lossless mappings from e.g. ISO 8859-1 to 
Unicode and back.  In an ideal world, Unicode would have no pre-composed 
characters, only base characters and accents.

That is, in fact, the way the macOS native file systems HFS+ and APFS handle 
Unicode in file names.  It’s called Normalization Form D: input is decomposed 
and stored that way, always.  It’s done to ensure that sorting happens 
predictably.

See:

https://www.unicode.org/standard/where/#Duplicates
https://www.unicode.org/reports/tr15/

Re: [sqlite] Things you shouldn't assume when you store names


Igor Tandetnik, on Monday, November 11, 2019 02:56 PM, wrote...
>
> On 11/11/2019 12:30 PM, Jose Isaias Cabrera wrote:
> >
> > Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...
> >>> Most people have to figure out what Unicode they are using, count the 
> >>> bytes, divide
> >>> by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
> >>> convert it to
> >>> UTF32, and do a count.
> >>
> >> And then what do you do with that count? What do you use it for?
> >
> > Say that I am writing a report and I only want to print the first 20 
> > characters of a string
> A sequence of Unicode codepoints U+006F U+0302 U+0301 should be rendered as a 
> single grapheme
> ( ố  ) - what a human would think of as a "character". This is an actual 
> character in
> Vietnamese. Now, if you have several such triplets in a row in your string, 
> and you chop it at
> 20 codepoints, you'll only print 7 graphemes / "characters". Moreover, you'll 
> end up dropping
> the last combining accent, producing a different grapheme (ô) and 
> potentially altering the
> meaning of the text. (Don't know how much of a danger this is in Vietnamese, 
> but I know that
> combining viramas https://www.compart.com/en/unicode/combining/9 are vital to 
> Indic languages,
> and dropping one will in fact often produce a valid but different word).

Yes, dropping pieces of words is a problem in any language.

Re: [sqlite] Things you shouldn't assume when you store names


Richard Damon, on Monday, November 11, 2019 02:37 PM, wrote...

>
> No.

Aaaah, my apologies.  We are talking about different things. You are talking 
about a combination of Unicode code points vs. a full character. I take it back.  Yes, if 
you are combining these, then, of course, you are going to have a different 
word count because there are actually characters being involved.  We are talking 
pieces vs. full words.  If there is a combination, just like the accented e, 
é, why not use the one character vs the combination?

josé

Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/2019 2:56 PM, Igor Tandetnik wrote:

On 11/11/2019 12:30 PM, Jose Isaias Cabrera wrote:

Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...

Most people have to figure out what Unicode they are using, count the bytes,
divide
by... and on, and on. Not me, I just take that UTF8, or UTF16 string, convert
it to
UTF32, and do a count.

And then what do you do with that count? What do you use it for?

Say that I am writing a report and I only want to print the first 20 characters
of a string

A sequence of Unicode codepoints U+006F U+0302 U+0301 should be rendered as a single grapheme ( ố
) - what a human would think of as a "character". This is an actual character in
Vietnamese. Now, if you have several such triplets in a row in your string, and you chop it at 20
codepoints, you'll only print 7 graphemes / "characters". Moreover, you'll end up
dropping the last combining accent, producing a different grapheme (ô) and potentially altering
the meaning of the text. (Don't know how much of a danger this is in Vietnamese, but I know that
combining viramas https://www.compart.com/en/unicode/combining/9 are vital to Indic languages, and
dropping one will in fact often produce a valid but different word).

A more colorful example: Emoji characters are composed of a long sequence of
Unicode codepoints: 👨‍👩‍👶‍👶 would be U+1F468 U+200D U+1F469 U+200D U+1F476
U+200D U+1F476 ( https://emojipedia.org/family-man-woman-baby-baby/ ) .
Truncating such a sequence at an arbitrary point is likely to produce a valid
emoji with a very different meaning.

Igor Tandetnik


Re: [sqlite] Things you shouldn't assume when you store names


Igor Tandetnik, on Monday, November 11, 2019 02:24 PM, wrote...
>
> On 11/11/2019 12:50 PM, Richard Damon wrote:
> > Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
> > screen.
>
> Or more, depending on what you mean by  "glyph". See e.g. U+FDFB (ARABIC
> LIGATURE JALLAJALALOUHOU,
> https://www.fileformat.info/info/unicode/char/fdfb/index.htm ) or U+FB03
> (LATIN SMALL LIGATURE FFI,
> https://www.fileformat.info/info/unicode/char/fb03/index.htm)

Thanks for this, Igor.  Again, UTF32 has lots of space, still.  If you look at 
the representation of these two characters,

ARABIC LIGATURE JALLAJALALOUHOU UTF-32 (hex) 0x0000FDFB (fdfb)
LATIN SMALL LIGATURE FFI UTF-32 (hex) 0x0000FB03 (fb03)

Look at their hex representations in UTF32:
1. 0x0000FDFB
2. 0x0000FB03

The first 4 0's are still unused spaces.  Japanese, Chinese, etc., glyphs have 
an unique UTF32 code, so, it will always work.

josé

Re: [sqlite] Things you shouldn't assume when you store names


On 11/11/2019 12:30 PM, Jose Isaias Cabrera wrote:


Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...

Most people have to figure out what Unicode they are using, count the bytes, 
divide
by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, convert 
it to
UTF32, and do a count.


And then what do you do with that count? What do you use it for?


Say that I am writing a report and I only want to print the first 20 characters 
of a string

A sequence of Unicode codepoints U+006F U+0302 U+0301 should be rendered as a single grapheme ( ố 
 ) - what a human would think of as a "character". This is an actual character in 
Vietnamese. Now, if you have several such triplets in a row in your string, and you chop it at 20 
codepoints, you'll only print 7 graphemes / "characters". Moreover, you'll end up 
dropping the last combining accent, producing a different grapheme (ô) and potentially altering 
the meaning of the text. (Don't know how much of a danger this is in Vietnamese, but I know that 
combining viramas https://www.compart.com/en/unicode/combining/9 are vital to Indic languages, and 
dropping one will in fact often produce a valid but different word).
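
The truncation problem described above can be reproduced with a small C sketch. It is a deliberate simplification: only U+0300..U+036F (Combining Diacritical Marks) is treated as combining, whereas real grapheme segmentation follows the much larger rule set of UAX #29. All function names are hypothetical, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from well-formed UTF-8; *len gets its byte length. */
static uint32_t utf8_decode(const char *s, size_t *len)
{
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] < 0x80) { *len = 1; return p[0]; }
    if ((p[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
    }
    if ((p[0] & 0xF0) == 0xE0) {
        *len = 3;
        return ((uint32_t)(p[0] & 0x0F) << 12) |
               ((uint32_t)(p[1] & 0x3F) << 6) | (uint32_t)(p[2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12) |
           ((uint32_t)(p[2] & 0x3F) << 6) | (uint32_t)(p[3] & 0x3F);
}

/* Assumption: only this one block counts as "combining". */
static int is_combining(uint32_t cp) { return cp >= 0x0300 && cp <= 0x036F; }

/* Naive cut: byte length of the first max_cp code points. */
size_t truncate_code_points(const char *s, size_t max_cp)
{
    assert(s != NULL);
    size_t i = 0, n = 0, len;
    while (s[i] && n < max_cp) { utf8_decode(s + i, &len); i += len; ++n; }
    return i;
}

/* Safer cut: same limit, but back up to the start of the last base
   character if the cut would separate it from a following mark. */
size_t truncate_whole_clusters(const char *s, size_t max_cp)
{
    assert(s != NULL);
    size_t i = 0, n = 0, cluster_start = 0, len;
    while (s[i] && n < max_cp) {
        if (!is_combining(utf8_decode(s + i, &len))) cluster_start = i;
        i += len; ++n;
    }
    if (s[i] && is_combining(utf8_decode(s + i, &len))) return cluster_start;
    return i;
}

/* Count base characters (approximate graphemes) in the first nbytes. */
size_t count_clusters(const char *s, size_t nbytes)
{
    assert(s != NULL);
    size_t i = 0, n = 0, len;
    while (i < nbytes) {
        if (!is_combining(utf8_decode(s + i, &len))) ++n;
        i += len;
    }
    return n;
}
```

For a string of seven U+006F U+0302 U+0301 triplets, truncate_code_points(s, 20) keeps seven base characters but ends in U+0302, so the last one has lost its acute accent. truncate_whole_clusters(s, 20) stops after six intact ones instead.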
--
Igor Tandetnik


Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 2:16 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 12:50 PM, wrote...
>
>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>> screen.
> This is not true, if the string has more or at least 20 UTF32 characters, and 
> you request 20 character while still talking UTF32, it will print 20.  Once 
> you move to UTF16 or UTF8, then, yes, you are correct.
You will get twenty code points but not twenty glyphs. UTF-32 has the
property that one code-unit is one code-point (which UTF-8 and UTF-16
don't have), but not one code-point = 1 glyph.
>> One quick way to see this is that there is a need for NFD and NFC
>> representations, because some characters can be decomposed from a
>> combined character into a base character + a combining character, so a
>> string in NFD form may naturally 'compress' itself when being printed.
> This is the reason why you want to use UTF32.  UTF8, and UTF16 has to use 
> combination of their character set to cover Eastern languages.  While all 
> languages fit perfectly in UTF32 and they all have their own unique home.
>
> josé

No.

A simple example: Ἀβιά vs Ἀβιά

Both are 4 glyphs or what we would call characters, the first is 6
code-points (U+391, U+313, U+3B2, U+3B9, U+3B1, U+301), the second is 4
code-points (U+1F08, U+3B2, U+3B9, U+3AC)

In this case the decomposed characters happen to match a composed
character, but that is not always true (some less common composed glyphs
do not have a unique single code point assigned to them).

This shows that 1 code point does not equal 1 character, for the usual
user definition of a character.
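
The same point can be sketched in C with the two spellings written out as UTF-32 code units. The array and function names are made up for illustration, and treating only U+0300..U+036F as combining marks is a simplification that happens to be enough for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two spellings from the example, one array element per code point. */
static const uint32_t ABIA_DECOMPOSED[] =
    { 0x0391, 0x0313, 0x03B2, 0x03B9, 0x03B1, 0x0301 };
static const uint32_t ABIA_COMPOSED[] =
    { 0x1F08, 0x03B2, 0x03B9, 0x03AC };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Assumption: only the Combining Diacritical Marks block is checked. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Count code points that start a new user-visible character, i.e.
   everything that is not a combining mark. */
size_t count_base_characters(const uint32_t *s, size_t n)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (!is_combining_mark(s[i]))
            ++count;
    return count;
}
```

Even in UTF-32 the two arrays have different lengths (6 and 4), while count_base_characters returns 4 for both.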

There are a NUMBER of points in Unicode where to express a single glyph
to the user, it takes multiple code-points to express it. Very shortly
after they realized they needed to extend Unicode beyond the initial 16
bit character set they first thought it could be, they also realized that
they could never reach the goal of assigning a unique code point to the
basic glyphs of every language, so settled on letting some (many) glyphs
be expressed as a combination of glyphs, with somewhat simple (but not
trivial) rules on how to do this.

-- 
Richard Damon


Re: [sqlite] Things you shouldn't assume when you store names


Jens Alfke, on Monday, November 11, 2019 01:00 PM, wrote...
>
> > On Nov 11, 2019, at 9:39 AM, Jose Isaias Cabrera, on
> >
> > However, space is cheap now
>
> It isn't. A sizable fraction of all software development is done for devices 
> with
> under a megabyte of RAM. (IoT and embedded are huge markets.)  And remember, 
> we're
> talking on the email forum for a library that's heavily used on that scale of
> hardware.
>
> And even on bigger systems, L1 and L2 caches are small: Intel's i7 Haswell 
> CPUs
> have 32KB and 512KB per core, respectively. Remember, "overflowing cache" is 
> the
> new "VM thrash" — RAM is absurdly slow compared to CPU speeds.

You're right.  I was thinking more of hard drive space.  I remember when I 
bought a 120 MB hard drive for $700 to put on my Amiga 1000 back in 1984.  How 
excited I was when I loaded AmigaOS on my 512 KB memory/120 MB hard drive 
and how fast it loaded vs the floppy disks.  Now I can get a 1 TB drive for $89 for a 
laptop.

josé


Re: [sqlite] Things you shouldn't assume when you store names


On 11/11/2019 12:50 PM, Richard Damon wrote:

Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
screen.


Or more, depending on what you mean by  "glyph". See e.g. U+FDFB (ARABIC 
LIGATURE JALLAJALALOUHOU, https://www.fileformat.info/info/unicode/char/fdfb/index.htm ) 
or U+FB03 (LATIN SMALL LIGATURE FFI, 
https://www.fileformat.info/info/unicode/char/fb03/index.htm)
--
Igor Tandetnik


Re: [sqlite] [draft patch] interface for retrieving values of bound parameters

2019-11-11 Thread tab

Just realized that the attachment was not actually included with the original 
message. Here's a gist link instead:
https://gist.github.com/0x09/445fca08ffb4811eae3ca61f965c7a22 


> On Nov 11, 2019, at 2:08 PM, tab  wrote:
> 
> re: expanded_sql this is a bit different -- while sqlite3_expanded_sql 
> provides the values interpolated into the statement as text, this patch adds 
> a function for retrieving them individually from the statement, similar to 
> the column access functions. e.g:
> 
> /* bind some temporary sqlite3_value* at index 1 */
> sqlite3_bind_value(stmt,1,someval);
> 
> /* retrieve it later */
> sqlite3_value *val = sqlite3_param_value(stmt,1);
> 
> Since the statement necessarily holds onto its bound params it'd be a nice 
> addition to be able to refer to it here if the application/module needs, vs 
> maintaining memory/lifetime for those independent of the statement. But it is 
> more or less just a convenience.
> 
>> On Nov 11, 2019, at 1:24 PM, test user  wrote:
>> 
>> Wouldn’t your program already know what the values are as it passed them
>> over the FFI initially? Why not hold onto that state?
>> 
>> On Mon, 11 Nov 2019 at 17:57, x  wrote:
>> 
>>> Is http://www.sqlite.org/c3ref/expanded_sql.html no use to you?
>>> 
>>> 
>>> 
>>> 
>>> From: sqlite-users  on
>>> behalf of tab 
>>> Sent: Monday, November 11, 2019 5:26:42 PM
>>> To: sqlite-users@mailinglists.sqlite.org <
>>> sqlite-users@mailinglists.sqlite.org>
>>> Subject: [sqlite] [draft patch] interface for retrieving values of bound
>>> parameters
>>> 
>>> Hi all,
>>> 
>>> It'd be handy to be able to retrieve params previously bound to a
>>> statement in the C API. Per the advice on the SQLite copyright info page,
>>> this is much more of a suggestion than a full patch, though it is
>>> functional for binding and retrieving an sqlite_value* (but, for example,
>>> there might be further implications not considered here in allowing the
>>> contents of aVar to be used directly.) There wouldn't be much value in
>>> maintaining a fork for something like this, so I wanted to put that out
>>> here on the mailing list to see if it's something that might be considered
>>> for mainline.
>>> 

Re: [sqlite] Things you shouldn't assume when you store names


Richard Damon, on Monday, November 11, 2019 12:50 PM, wrote...

> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
> screen.

This is not true. If the string has at least 20 UTF32 characters, and 
you request 20 characters while still talking UTF32, it will print 20.  Once you 
move to UTF16 or UTF8, then, yes, you are correct.

> One quick way to see this is that there is a need for NFD and NFC
> representations, because some characters can be decomposed from a
> combined character into a base character + a combining character, so a
> string in NFD form may naturally 'compress' itself when being printed.

This is the reason why you want to use UTF32.  UTF8 and UTF16 have to use 
combinations of their character set to cover Eastern languages, while all 
languages fit perfectly in UTF32 and they all have their own unique home.

josé

Re: [sqlite] Things you shouldn't assume when you store names


Jens Alfke, on Monday, November 11, 2019 12:47 PM, wrote...
>
> Hang on — why exactly 20 characters? Of text in an arbitrary language, which 
> is to be displayed in an arbitrary font, with an arbitrary line width?
>
> I don't know about you, but the only time I think about "exactly 20 
> characters" is when I'm writing to a terminal window. Most of the time that's 
> ASCII, and even if it isn't, Asian characters are going to be double-width, 
> and emojis might render an arbitrary number of code points to a single-width 
> graphic. So "20 characters" still doesn't map to a fixed width onscreen.
>
> If I'm writing a report, I more likely want to render only the first _line_ 
> of a string, i.e. enough text to fill some number of 
> points/millimeters/pixels of space, possibly minus an ellipsis. That is a job 
> for a text rendering library, not `writeflen`.

Compared to me, you are a genius in everything, but you just lack a little bit 
of understanding about other languages and their localization behavior.  As a 
Technical Project Manager for 12 years on my last job, all of these statements 
that I am making, and yours above, were part of my daily routines. Translating 
websites, reports, documentation, a printer's GUI, etc. from 2 to 23 
languages involves a lot of these little nuances you are bringing up above.  
Unless you are deep in these projects, people have no idea what it takes to 
translate a document, keep a translation memory, handle right-to-left behavior, 
expanding and contracting text, and a bunch of other nit-picky customer 
translation demands. If I had time, I would keep this going, :-).  Thanks.

josé

Re: [sqlite] [draft patch] interface for retrieving values of bound parameters

2019-11-11 Thread tab

re: expanded_sql this is a bit different -- while sqlite3_expanded_sql provides 
the values interpolated into the statement as text, this patch adds a function 
for retrieving them individually from the statement, similar to the column 
access functions. e.g:

/* bind some temporary sqlite3_value* at index 1 */
sqlite3_bind_value(stmt,1,someval);

/* retrieve it later */
sqlite3_value *val = sqlite3_param_value(stmt,1);

Since the statement necessarily holds onto its bound params it'd be a nice 
addition to be able to refer to it here if the application/module needs, vs 
maintaining memory/lifetime for those independent of the statement. But it is 
more or less just a convenience.

> On Nov 11, 2019, at 1:24 PM, test user  wrote:
> 
> Wouldn’t your program already know what the values are as it passed them
> over the FFI initially? Why not hold onto that state?
> 
> On Mon, 11 Nov 2019 at 17:57, x  wrote:
> 
>> Is http://www.sqlite.org/c3ref/expanded_sql.html no use to you?
>> 
>> 
>> 
>> 
>> From: sqlite-users  on
>> behalf of tab 
>> Sent: Monday, November 11, 2019 5:26:42 PM
>> To: sqlite-users@mailinglists.sqlite.org <
>> sqlite-users@mailinglists.sqlite.org>
>> Subject: [sqlite] [draft patch] interface for retrieving values of bound
>> parameters
>> 
>> Hi all,
>> 
>> It'd be handy to be able to retrieve params previously bound to a
>> statement in the C API. Per the advice on the SQLite copyright info page,
>> this is much more of a suggestion than a full patch, though it is
>> functional for binding and retrieving an sqlite_value* (but, for example,
>> there might be further implications not considered here in allowing the
>> contents of aVar to be used directly.) There wouldn't be much value in
>> maintaining a fork for something like this, so I wanted to put that out
>> here on the mailing list to see if it's something that might be considered
>> for mainline.
>> 

Re: [sqlite] [draft patch] interface for retrieving values of bound parameters

2019-11-11 Thread test user

Wouldn’t your program already know what the values are as it passed them
over the FFI initially? Why not hold onto that state?

On Mon, 11 Nov 2019 at 17:57, x  wrote:

> Is http://www.sqlite.org/c3ref/expanded_sql.html no use to you?
>
>
>
> 
> From: sqlite-users  on
> behalf of tab 
> Sent: Monday, November 11, 2019 5:26:42 PM
> To: sqlite-users@mailinglists.sqlite.org <
> sqlite-users@mailinglists.sqlite.org>
> Subject: [sqlite] [draft patch] interface for retrieving values of bound
> parameters
>
> Hi all,
>
> It'd be handy to be able to retrieve params previously bound to a
> statement in the C API. Per the advice on the SQLite copyright info page,
> this is much more of a suggestion than a full patch, though it is
> functional for binding and retrieving an sqlite_value* (but, for example,
> there might be further implications not considered here in allowing the
> contents of aVar to be used directly.) There wouldn't be much value in
> maintaining a fork for something like this, so I wanted to put that out
> here on the mailing list to see if it's something that might be considered
> for mainline.
>

Re: [sqlite] Things you shouldn't assume when you store names

> On Nov 11, 2019, at 9:39 AM, Jose Isaias Cabrera  wrote:
> 
> However, space is cheap now

It isn't. A sizable fraction of all software development is done for devices 
with under a megabyte of RAM. (IoT and embedded are huge markets.)  And 
remember, we're talking on the email forum for a library that's heavily used on 
that scale of hardware.

And even on bigger systems, L1 and L2 caches are small: Intel's i7 Haswell CPUs 
have 32KB and 512KB per core, respectively. Remember, "overflowing cache" is 
the new "VM thrash" — RAM is absurdly slow compared to CPU speeds.

—Jens

Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 12:39 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 11:19 AM, wrote...
>
>> UTF-32 is a reasonable internal operation format, if code-point
>> operations are important. It does not make a good transmission format,
> I agree.  That is why, I have not created any files for anything as UTF32 for 
> delivery or anything personal. ;-)  It's bulky.  However, space is cheap now, 
> and **I think** UTF32 will be a good uniform character set to use.  But, yes, 
> we are far away from that idea.
>
>> as it usually takes more media than UTF-8 or UTF-16, and for
>> transmission, the message size is important. The big issue is that
>> code-point counting is rarely what you want, you generally want Glyph
>> counting, which even UTF-32 doesn't provide.
> Yes, agreed.  But, I was just pointing out that a name could be displayed as 
> a symbol.
>
>
>> But this shows that 'Unicode' doesn't handle the name, as is, which was
>> the point of the rule, if you design your software just assuming that
>> Unicode can handle all names, you will very occasionally be wrong.
> I was talking about the artist previously known as Prince.  I was trying to 
> say that it would be feasible to insert that image/symbol as UTF32. It was 
> more of a joke than pretending to have found the answer. :-)
>
> josé
>
But you can't as that symbol doesn't have a Unicode Code point. To do
this you need to go BEYOND Unicode to define private use characters with
Glyphs.
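
To illustrate the private-use route: Unicode sets aside ranges such as
U+E000..U+F8FF whose code points carry no assigned meaning or glyph. A
minimal Python sketch follows; the code point U+E000 and the helper name
are arbitrary choices for illustration, not anything proposed in this
thread.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """Return True if ch is a private-use code point (general category 'Co')."""
    return unicodedata.category(ch) == "Co"

# Hypothetical stand-in for a custom name symbol; U+E000 is an arbitrary pick.
symbol = "\uE000"

print(is_private_use(symbol))
print(is_private_use("P"))

# The code point itself encodes fine in any UTF. What is missing is a glyph
# and a meaning that both sender and receiver agree on.
print(symbol.encode("utf-8"), symbol.encode("utf-32-be"))
```

The encoding step succeeds in UTF-8 and UTF-32 alike, which suggests the
hard part is the shared font and agreement, not the choice of UTF.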

-- 
Richard Damon


Re: [sqlite] [draft patch] interface for retrieving values of bound parameters

2019-11-11 Thread x

Is http://www.sqlite.org/c3ref/expanded_sql.html no use to you?

From: sqlite-users  on behalf of 
tab 
Sent: Monday, November 11, 2019 5:26:42 PM
To: sqlite-users@mailinglists.sqlite.org 
Subject: [sqlite] [draft patch] interface for retrieving values of bound 
parameters

Hi all,

It'd be handy to be able to retrieve params previously bound to a statement in 
the C API. Per the advice on the SQLite copyright info page, this is much more 
of a suggestion than a full patch, though it is functional for binding and 
retrieving an sqlite_value* (but, for example, there might be further 
implications not considered here in allowing the contents of aVar to be used 
directly.) There wouldn't be much value in maintaining a fork for something 
like this, so I wanted to put that out here on the mailing list to see if it's 
something that might be considered for mainline.


Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 12:30 PM, Jose Isaias Cabrera wrote:
> Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...
>> On 11/11/2019 10:49 AM, Jose Isaias Cabrera wrote:
>>> So, yes, it's bulky, but, if you want to count characters in languages such 
>>> as
>>> Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that 
>>> string
>>> to UTF32, and do a string count of that UTF32 variable.
>> Between ligatures and combining diacritics, the number of Unicode codepoints 
>> in a
>> string has little practical meaning. E.g. it is not necessarily correlated 
>> with the
>> width of the string as displayed on the screen or on paper; or with the 
>> number of
>> graphemes a human would say the string contains, if asked.
> That could be true, but have you tried to just display a specific number of 
> characters from a UTF8 string having Hebrew, Arabic, Chinese, Japanese (see 
> below)?
>
>>> Most people have to figure out what Unicode they are using, count the 
>>> bytes, divide
>>> by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
>>> convert it to
>>> UTF32, and do a count.
>> And then what do you do with that count? What do you use it for?
> Say that I am writing a report and I only want to print the first 20 
> characters of a string, that would be something like,
> if (var.length> 20)
> {
>   writefln(var[0 .. 20]);
> }
> else
> {
>   writefln(var ~ "   "[0 .. 20]);
> }
> if var is declared UTF8, and there is a Chinese string or some multi-byte 
> language in that string, this will never print 20 Chinese characters. It will 
> print less.  If, I convert that UTF8 string to UTF32, then each multi-byte 
> character fits in one UTF32 character.  So,
>
> dchar[] var32 = std.utf.toUTF32(var);
> if (var32.length> 20)
> {
>   writefln(var32[0 .. 20]);
> }
> else
> {
>   writefln(var32 ~ cast(dchar[])"   "[0 .. 20]);
> }
>
> This will always print 20 characters, whether these are ASCII or multi-byte 
> language characters.  Thanks.
>
> josé

Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
screen.

One quick way to see this is that there is a need for NFD and NFC
representations, because some characters can be decomposed from a
combined character into a base character + a combining character, so a
string in NFD form may naturally 'compress' itself when being printed.
Then you need to remember that one of the reasons for providing the
combining characters was the decision not to create code points for
every possible composed character; many can be expressed only in
decomposed form, as a base character + combining character(s). Thus
code-point count is not the same as output glyph count. In fact, Unicode
works hard at avoiding the term 'character' as it isn't well defined,
often being thought of as a glyph, but also sometimes as a code-point.
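
The NFC/NFD point can be checked with a short sketch. This one uses
Python's standard unicodedata module rather than D, and the sample word
and helper name are only illustrations.

```python
import unicodedata

def utf32_units(s: str) -> int:
    """Number of 32-bit code units the string occupies in UTF-32 (no BOM)."""
    return len(s.encode("utf-32-be")) // 4

word = "caf\u00e9"                         # 'café' with a precomposed e-acute
nfc = unicodedata.normalize("NFC", word)   # composed form
nfd = unicodedata.normalize("NFD", word)   # 'e' followed by U+0301 combining acute

# Both forms render as the same four glyphs, but the UTF-32 lengths differ.
print(utf32_units(nfc), utf32_units(nfd))
print(nfc == nfd)
```

Both strings should display identically, yet they compare unequal and
occupy a different number of UTF-32 units, so a UTF-32 length is a
code-point count and nothing more.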

-- 
Richard Damon


Re: [sqlite] Things you shouldn't assume when you store names

> On Nov 11, 2019, at 9:30 AM, Jose Isaias Cabrera  wrote:
> 
> Say that I am writing a report and I only want to print the first 20 
> characters of a string, that would be something like,

Hang on — why exactly 20 characters? Of text in an arbitrary language, which is 
to be displayed in an arbitrary font, with an arbitrary line width?

I don't know about you, but the only time I think about "exactly 20 characters" 
is when I'm writing to a terminal window. Most of the time that's ASCII, and 
even if it isn't, Asian characters are going to be double-width, and emojis 
might render an arbitrary number of code points to a single-width graphic. So 
"20 characters" still doesn't map to a fixed width onscreen.

If I'm writing a report, I more likely want to render only the first _line_ of 
a string, i.e. enough text to fill some number of points/millimeters/pixels of 
space, possibly minus an ellipsis. That is a job for a text rendering library, 
not `writefln`.
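
For the terminal case only, column width can be approximated from
Unicode properties. The following is a rough Python sketch, assuming a
terminal that renders East Asian wide characters in two columns and
combining marks in zero; the function name is made up, and emoji
sequences are deliberately not handled.

```python
import unicodedata

def terminal_columns(text: str) -> int:
    """Approximate number of terminal columns text occupies.

    Assumptions: wide/fullwidth characters take two columns, combining
    marks and format characters take none, everything else takes one.
    Multi-code-point emoji sequences are not handled correctly.
    """
    cols = 0
    for ch in text:
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero-width: attaches to the previous character
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

print(terminal_columns("hello"))
print(terminal_columns("漢字"))
print(terminal_columns("e\u0301"))
```

Even this approximation needs per-character property lookups, which is
part of why a fixed "20 characters" does not correspond to a fixed width.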

—Jens

Re: [sqlite] Things you shouldn't assume when you store names


Richard Damon, on Monday, November 11, 2019 11:19 AM, wrote...

> UTF-32 is a reasonable internal operation format, if code-point
> operations are important. It does not make a good transmission format,

I agree.  That is why, I have not created any files for anything as UTF32 for 
delivery or anything personal. ;-)  It's bulky.  However, space is cheap now, 
and **I think** UTF32 will be a good uniform character set to use.  But, yes, 
we are far away from that idea.

> as it usually takes more media than UTF-8 or UTF-16, and for
> transmission, the message size is important. The big issue is that
> code-point counting is rarely what you want, you generally want Glyph
> counting, which even UTF-32 doesn't provide.

Yes, agreed.  But, I was just pointing out that a name could be displayed as a 
symbol.


> But this shows that 'Unicode' doesn't handle the name, as is, which was
> the point of the rule, if you design your software just assuming that
> Unicode can handle all names, you will very occasionally be wrong.
I was talking about the artist previously known as Prince.  I was trying to say 
that it would be feasible to insert that image/symbol as UTF32. It was more of 
a joke than pretending to have found the answer. :-)

josé



Re: [sqlite] database disk image is malformed

On 11 Nov 2019, at 5:13pm, Jukka Marin  wrote:

> The main process first opens the databases and checks that their
> version matches that of the software and if not, the databases are
> closed and initialized by running a script.
> 
> After closing the databases, main process forks the children and
> all processes (including main process) open the databases and use
> their own connections.
> 
> What I was trying to ask was this:  If any of the children dies
> (a bug in the code), main process will restart the child.  At
> this point, the main process has the databases open, so the new
> child receives the connections as well.  What should I do now?

Okay, that gives us enough information to work with.

The conservative way to do it is to have the main process close the connection 
before forking and open it again.  Then, of course, the child processes make 
their own connections.

But I don't think that's necessary.  A child process can have access to the 
main process' database connection but ignore it.  So I think the main process 
can fork without closing its connection, as long as each child never uses that 
one and instead makes its own.

Of course, every one of these connections needs to set a timeout.  And every 
call to the SQLite3 library needs to check its result code and make sure it is 
getting SQLITE_OK (or, for queries, SQLITE_DONE etc.).
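
As a sketch of the timeout and result-checking advice, here is the idea
using Python's standard sqlite3 module instead of the C API. In that
module a non-OK result code surfaces as an exception, and the timeout
argument sets the busy timeout. The table name, the helper names and the
0.1 second timeout are illustrative assumptions.

```python
import os
import sqlite3
import tempfile

def open_db(path, timeout_s):
    # isolation_level=None puts the module in autocommit mode, so the
    # BEGIN/COMMIT statements below are issued explicitly.
    return sqlite3.connect(path, timeout=timeout_s, isolation_level=None)

def try_write(conn, value):
    """Attempt one insert; report failure instead of ignoring it."""
    try:
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("INSERT INTO log(v) VALUES (?)", (value,))
        conn.execute("COMMIT")
        return True
    except sqlite3.OperationalError as exc:   # e.g. "database is locked"
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        print("write failed:", exc)
        return False

def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    a = open_db(path, 0.1)
    b = open_db(path, 0.1)
    a.execute("CREATE TABLE log(v TEXT)")
    a.execute("BEGIN IMMEDIATE")          # connection a now holds the write lock
    blocked = try_write(b, "from b")      # b waits for the timeout, then fails
    a.execute("COMMIT")                   # a releases the lock
    ok = try_write(b, "from b")           # b can write now
    count = b.execute("SELECT count(*) FROM log").fetchone()[0]
    a.close()
    b.close()
    return blocked, ok, count

print(demo())
```

A real program would use a much longer timeout; the short one here only
keeps the demonstration quick.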

> Should the child close the databases before opening them again?
> Will this close the databases for the main process as well?

As you suspected, closing the connection releases both memory structures and 
file handles.  Anything that tries to use that connection will then fail 
because it has no idea what it's talking to.

What puzzles me is this: you're getting "database malformed" and nothing you've 
described justifies this.  Assuming that this isn't just one old database which 
is genuinely corrupt, but that you are using a fresh uncorrupt database each 
time, you seem to have a genuine bug in your code.

This happens mostly because something is stomping on the memory assigned to a 
connection.  In your case, this probably means something is stomping on the 
memory assigned to one of the child processes.

So, first write yourself a quick script to use the shell tool to check the 
database for corruption.  Then run that, even while your program is running, 
and see if you can figure out whether your database really is corrupt or 
whether your program is getting spurious error messages.
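
A check along those lines might look like the following. It is a sketch
in Python's standard sqlite3 module rather than the shell tool, and it
relies on `PRAGMA integrity_check`, which returns the single row `ok` for
a healthy database. The function names and the temporary database are
illustrative.

```python
import os
import sqlite3
import tempfile

def check_database(path):
    """Return the rows reported by PRAGMA integrity_check (['ok'] if healthy)."""
    # Open read-only through a URI so the check cannot create or modify
    # the file. Paths containing '?' or '#' would need escaping first.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True, timeout=5.0)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]

def demo():
    # Build a small throwaway database to run the check against.
    path = os.path.join(tempfile.mkdtemp(), "sample.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t(x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
    conn.commit()
    conn.close()
    return check_database(path)

print(demo())
```

Because it opens its own read-only connection, a script like this can be
run from a separate process while the main program is active.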

[sqlite] [draft patch] interface for retrieving values of bound parameters

2019-11-11 Thread tab

Hi all,

It'd be handy to be able to retrieve params previously bound to a statement in 
the C API. Per the advice on the SQLite copyright info page, this is much more 
of a suggestion than a full patch, though it is functional for binding and 
retrieving an sqlite_value* (but, for example, there might be further 
implications not considered here in allowing the contents of aVar to be used 
directly.) There wouldn't be much value in maintaining a fork for something 
like this, so I wanted to put that out here on the mailing list to see if it's 
something that might be considered for mainline.


Re: [sqlite] Things you shouldn't assume when you store names

> On Nov 11, 2019, at 7:49 AM, Jose Isaias Cabrera  wrote:
> 
> if you want to count characters in languages such as Arabic, Hebrew, Chinese, 
> Japanese, etc., the easiest way is to convert that string to UTF32, and do a 
> string count of that UTF32 variable.

No, the easiest way is to ask your string class/library what the character 
count is, and let _it_ deal with the fiddly details. 

Or to consider why you need the character count in the first place — it’s 
usually not something that’s useful to know. Usually what you’re really asking 
is “how many pixels wide will this render?” or “how many bytes will this 
occupy?” or even “let me iterate over each character”.

At a low level, UTF-8 makes a lot more sense. It’s very compact, which is 
important for cache efficiency as well as storage space. It’s upward compatible 
with ASCII, which is extremely convenient for text-based protocols / file 
formats / languages, and for working with legacy APIs (like !)

Modern libraries seem to be moving to UTF-8. For instance, Apple’s been 
migrating Swift’s string class from a legacy UTF-16 encoding to UTF-8, and 
playing up the consequent performance and space win. Go has been UTF-8 from the 
start. I don’t know of a single library that’s gone with UTF-32, except maybe 
as an option.

—Jens

Re: [sqlite] Things you shouldn't assume when you store names

Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...
>
> On 11/11/2019 10:49 AM, Jose Isaias Cabrera wrote:
> > So, yes, it's bulky, but, if you want to count characters in languages such 
> > as
> > Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that 
> > string
> > to UTF32, and do a string count of that UTF32 variable.
>
> Between ligatures and combining diacritics, the number of Unicode codepoints 
> in a
> string has little practical meaning. E.g. it is not necessarily correlated 
> with the
> width of the string as displayed on the screen or on paper; or with the 
> number of
> graphemes a human would say the string contains, if asked.

That could be true, but have you tried to just display a specific number of 
characters from a UTF8 string having Hebrew, Arabic, Chinese, Japanese (see 
below)?

> > Most people have to figure out what Unicode they are using, count the 
> > bytes, divide
> > by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
> > convert it to
> > UTF32, and do a count.
>
> And then what do you do with that count? What do you use it for?

Say that I am writing a report and I only want to print the first 20 characters 
of a string, that would be something like,
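import std.stdio;  // writefln
if (var.length > 20)
{
  writefln("%s", var[0 .. 20]);  // slices 20 UTF-8 code units, not characters
}
else
{
  // pad with 20 spaces first, then cut the padded string to 20 code units
  writefln("%s", (var ~ "                    ")[0 .. 20]);
}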
if var is declared UTF8, and there is a Chinese string or some multi-byte 
language in that string, this will never print 20 Chinese characters. It will 
print less.  If, I convert that UTF8 string to UTF32, then each multi-byte 
character fits in one UTF32 character.  So,

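import std.utf;  // toUTF32
dstring var32 = std.utf.toUTF32(var);  // one dchar per code point
if (var32.length > 20)
{
  writefln("%s", var32[0 .. 20]);
}
else
{
  // pad with 20 spaces first, then cut the padded string to 20 code points
  writefln("%s", (var32 ~ "                    "d)[0 .. 20]);
}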

This will always print 20 characters, whether these are ASCII or multi-byte 
language characters.  Thanks.
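
The difference between slicing UTF-8 code units and slicing code points
can be reproduced in a few lines. This is a sketch in Python rather than
D, with an arbitrary sample string; it counts code points, which is what
an index into a UTF-32 array counts.

```python
text = "漢字" * 15                 # 30 code points, 3 bytes each in UTF-8
utf8 = text.encode("utf-8")

# Cutting at 20 bytes: 6 whole characters fit, and the 2 leftover bytes of a
# split character are dropped by errors="ignore".
by_bytes = utf8[:20].decode("utf-8", errors="ignore")

# Cutting at 20 code points, as indexing a UTF-32 array would.
by_code_points = text[:20]

print(len(utf8), len(by_bytes), len(by_code_points))
```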

josé

Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 12:13 PM, Simon Slavin wrote:
> On 11 Nov 2019, at 4:02pm, Igor Tandetnik  wrote:
>
>> And then what do you do with that count? What do you use it for?
> This is a key point.  When I started programming I used to do LEFT(A$(I), 14) 
> frequently.  But almost all of them were because I wanted to print the string 
> and had allocated 14 characters of space to it.
>
> Then came variable-width fonts.  The practise should have died out.  But 
> people are still doing it.
>
> There are other reasons to get the beginning of a string.  The first 
> character alone, especially.  There may be other reasons to get its length.  
> But it was mostly done because the length of the string was its width on the 
> display.  And it isn't any more.

And you do understand that getting the first 'character' (if you mean
the first printed glyph) of a string in UTF-32 is not trivial, because
it could easily be more than one code-point due to combining characters.
For many purposes it might not even include the first code-point, as
that might be a formatting meta-point like the BOM or a text-direction
code which should be skipped.
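
A rough sketch of what "first printed glyph" involves, in Python. This
is only an approximation based on general categories (skip leading
format characters, then take one base character plus any following
marks); it is not the full grapheme-cluster algorithm from the Unicode
standard, and the function name is made up for illustration.

```python
import unicodedata

def first_cluster(text: str) -> str:
    """Approximate the first user-visible 'character' of text.

    Skips leading format characters (category Cf, e.g. a BOM or a
    direction mark), then returns one base code point together with any
    combining marks (categories Mn, Mc, Me) that follow it.
    """
    i, n = 0, len(text)
    while i < n and unicodedata.category(text[i]) == "Cf":
        i += 1
    if i == n:
        return ""
    j = i + 1
    while j < n and unicodedata.category(text[j]).startswith("M"):
        j += 1
    return text[i:j]

# BOM, right-to-left mark, then 'e' + combining acute, then more text.
sample = "\ufeff\u200fe\u0301tude"
print([hex(ord(c)) for c in first_cluster(sample)])
```

Here the first cluster is two code points long and starts at index 2, so
neither `sample[0]` nor a one-element UTF-32 slice would give it.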

-- 
Richard Damon


Re: [sqlite] database disk image is malformed

2019-11-11 Thread Jukka Marin

On Mon, Nov 11, 2019 at 05:03:25PM +, Simon Slavin wrote:
> On 11 Nov 2019, at 1:42pm, Jukka Marin  wrote:
> 
> > Or does the main process need to close all databases, then fork, then
> > reopen the databases?
> 
> Which processes access the databases ?  The main process ?  Its children ?  
> Are they all using the same connection ?
>  Are they all trying to use the same connection at the same time ?

All processes access the databases.  No, I changed the code so that
every process opens the databases separately, so they use their own
connections (at random times, so probably simultaneously).

The main process first opens the databases and checks that their
version matches that of the software and if not, the databases are
closed and initialized by running a script.

After closing the databases, main process forks the children and
all processes (including main process) open the databases and use
their own connections.

What I was trying to ask was this:  If any of the children dies
(a bug in the code), main process will restart the child.  At
this point, the main process has the databases open, so the new
child receives the connections as well.  What should I do now?
Should the child close the databases before opening them again?
Will this close the databases for the main process as well?

(One way is to stop using the databases in the main process, so
they are not passed to children, but this would be a major change
in the code.)

  -jm

Re: [sqlite] database disk image is malformed

2019-11-11 Thread Shawn Wagner

Doing the latter - closing everything, forking, re-opening - is always
going to be safe. Or if the parent isn't going to use the connection, just
don't open the database until you're in the child after forking.
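
The close, fork, re-open sequence can be automated with fork hooks. The
sketch below uses Python's `os.register_at_fork` as a stand-in for the
pthread_atfork() approach; the class name, table name and timeout are
illustrative assumptions, and it only runs on POSIX systems where
`os.fork` exists.

```python
import os
import sqlite3
import tempfile

class ForkAwareDB:
    """Hypothetical wrapper: closes before fork(), re-opens on both sides."""

    def __init__(self, path, timeout_s=5.0):
        self.path = path
        self.timeout_s = timeout_s
        self.conn = None
        self.open()
        # Fork handlers cannot be unregistered, so this pattern suits
        # objects that live for the whole process.
        os.register_at_fork(before=self.close,
                            after_in_parent=self.open,
                            after_in_child=self.open)

    def open(self):
        if self.conn is None:
            self.conn = sqlite3.connect(self.path, timeout=self.timeout_s)

    def close(self):
        if self.conn is not None:
            self.conn.close()   # any uncommitted work is rolled back
            self.conn = None

def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    db = ForkAwareDB(path)
    db.conn.execute("CREATE TABLE log(who TEXT)")
    db.conn.commit()
    pid = os.fork()             # hooks close the connection, then re-open it
    if pid == 0:
        status = 1
        try:
            db.conn.execute("INSERT INTO log VALUES ('child')")
            db.conn.commit()
            status = 0
        finally:
            os._exit(status)    # leave the child without running parent cleanup
    _, wait_status = os.waitpid(pid, 0)
    db.conn.execute("INSERT INTO log VALUES ('parent')")
    db.conn.commit()
    rows = sorted(r[0] for r in db.conn.execute("SELECT who FROM log"))
    return os.WEXITSTATUS(wait_status), rows

print(demo())
```

With this arrangement no connection object ever crosses the fork, so
parent and child each work through a connection they opened themselves.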

On Mon, Nov 11, 2019 at 8:08 AM Jukka Marin  wrote:

> On Fri, Nov 08, 2019 at 09:57:25AM +0200, Jukka Marin wrote:
> > On Thu, Nov 07, 2019 at 09:26:46AM -0800, Shawn Wagner wrote:
> > > This line stood out:
> > >
> > > > The main process opens the databases and then forks the other
> processes
> > > which can then perform database operations using the already opened
> > > databases.
> > >
> > > From
> > >
> https://sqlite.org/howtocorrupt.html#_carrying_an_open_database_connection_across_a_fork_
> > > :
> > >
> > > > Do not open an SQLite database connection, then fork(), then try to
> use
> > > that database connection in the child process. All kinds of locking
> > > problems will result and you can easily end up with a corrupt database.
> > > SQLite is not designed to support that kind of behavior. Any database
> > > connection that is used in a child process must be opened in the child
> > > process, not inherited from the parent.
> > >
> > > In this kind of situation, I usually use pthread_atfork() callbacks to
> > > automate closing databases and then re-opening them in the parent and
> child.
> >
> > Okay, thanks!  I suspected it could be something like this, but couldn't
> > find anything in the SQLite docs.
>
> In some situations, my main process will have the databases opened before
> it needs to fork a new child (this happens only if a child dies and
> has to be restarted).  If the child process immediately closes its copies
> of the databases and then reopens them, will it be safe?
>
> Or does the main process need to close all databases, then fork, then
> reopen the databases?
>
> Thanks again!
>
>   Jukka Marin

Re: [sqlite] Things you shouldn't assume when you store names

On 11 Nov 2019, at 4:02pm, Igor Tandetnik  wrote:

> And then what do you do with that count? What do you use it for?

This is a key point.  When I started programming I used to do LEFT(A$(I), 14) 
frequently.  But almost all of them were because I wanted to print the string 
and had allocated 14 characters of space to it.

Then came variable-width fonts.  The practise should have died out.  But people 
are still doing it.

There are other reasons to get the beginning of a string.  The first character 
alone, especially.  There may be other reasons to get its length.  But it was 
mostly done because the length of the string was its width on the display.  And 
it isn't any more.

Re: [sqlite] database disk image is malformed

On 11 Nov 2019, at 1:42pm, Jukka Marin  wrote:

> Or does the main process need to close all databases, then fork, then
> reopen the databases?

Which processes access the databases ?  The main process ?  Its children ?  Are 
they all using the same connection ?  Are they all trying to use the same 
connection at the same time ?

Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 10:49 AM, Jose Isaias Cabrera wrote:
>
> Richard Damon, on Monday, November 11, 2019 09:47 AM, wrote...
>> On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
>>> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>>>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>>>
>>>>> Not if the system uses UTF32. :-) You could put the pictograph in that 
>>>>> textbox, and it'll work.
>>>> Can you point to some description of this and how it works ?  I've never 
>>>> heard of it.
>>> My point was that one could define the UTF32 [1] code for that specific 
>>> pictograph or glyph, and it'll work.
>>>
>>> josé
>>>
>>> [1] https://en.wikipedia.org/wiki/UTF-32
>> UTF-32 gives no encoding advantage over other Unicode formats, as all
>> allow expressing all the Unicode code points.
> I disagree.  I believe that the future is UTF32.  I will give you that it's 
> bulky, for example, here is the letter a written to a file in Windows-1252, 
> UTF8 signed, UTF16be signed, and UTF32be signed:
>
> bytes filename
> 1 0_Windows-1252.txt
> 4 1_UTF8signed.txt
> 4 2_UTF16BEsigned.txt
> 8 3_UTF32signed.txt
>
> So, yes, it's bulky, but, if you want to count characters in languages such 
> as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert 
> that string to UTF32, and do a string count of that UTF32 variable.  Most 
> people have to figure out what Unicode they are using, count the bytes, 
> divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
> convert it to UTF32, and do a count.
UTF-32 is a reasonable internal operation format, if code-point
operations are important. It does not make a good transmission format,
as it usually takes more media than UTF-8 or UTF-16, and for
transmission, the message size is important. The big issue is that
code-point counting is rarely what you want, you generally want Glyph
counting, which even UTF-32 doesn't provide.
>
>> There is no code-point assigned to the Pictogram for his name (As far as
>> I know), so there is no value you can put in to represent it.
> You're right, but not that many people are changing their name to an image.  
> However, if two or three or more folks want to, there are enough empty UTF32 
> characters, that it can be accomplished.
But this shows that 'Unicode' doesn't handle the name, as is, which was
the point of the rule, if you design your software just assuming that
Unicode can handle all names, you will very occasionally be wrong.
There are actually many more cases of this, I imagine a lot of
aboriginal people who have their own writing systems that haven't been
adopted by Unicode, have names (as their preferred name) that can't be
expressed in official Unicode. They may have a Government assigned
'official' name (if they have had to interact with the Government) that
can be represented, but that really isn't their name (Prince just had
the resources and gall to do it 'officially').
>
>
>> It would be possible to include in the application some way to add user
>> defined glyphs to the system fonts for user defined code points, and
>> then reconcile these when transferring data from one system to another.
> We have done this for special customer requirements and have assigned our own 
> UTF32 characters a specific design with our software.  But, yes, it's only 
> our software, but what if... a reconciliation can happen?
>
> josé


-- 
Richard Damon


Re: [sqlite] database disk image is malformed

2019-11-11 Thread Jukka Marin

On Fri, Nov 08, 2019 at 09:57:25AM +0200, Jukka Marin wrote:
> On Thu, Nov 07, 2019 at 09:26:46AM -0800, Shawn Wagner wrote:
> > This line stood out:
> > 
> > > The main process opens the databases and then forks the other processes
> > which can then perform database operations using the already opened
> > databases.
> > 
> > From
> > https://sqlite.org/howtocorrupt.html#_carrying_an_open_database_connection_across_a_fork_
> > :
> > 
> > > Do not open an SQLite database connection, then fork(), then try to use
> > that database connection in the child process. All kinds of locking
> > problems will result and you can easily end up with a corrupt database.
> > SQLite is not designed to support that kind of behavior. Any database
> > connection that is used in a child process must be opened in the child
> > process, not inherited from the parent.
> > 
> > In this kind of situation, I usually use pthread_atfork() callbacks to
> > automate closing databases and then re-opening them in the parent and child.
> 
> Okay, thanks!  I suspected it could be something like this, but couldn't
> find anything in the SQLite docs.

In some situations, my main process will have the databases opened before
it needs to fork a new child (this happens only if a child dies and
has to be restarted).  If the child process immediately closes its copies
of the databases and then reopens them, will it be safe?

Or does the main process need to close all databases, then fork, then
reopen the databases?

Thanks again!

  Jukka Marin

Re: [sqlite] Things you shouldn't assume when you store names


On 11/11/2019 10:49 AM, Jose Isaias Cabrera wrote:

> So, yes, it's bulky, but, if you want to count characters in languages such as 
> Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that 
> string to UTF32, and do a string count of that UTF32 variable.

Between ligatures and combining diacritics, the number of Unicode codepoints in 
a string has little practical meaning. E.g. it is not necessarily correlated 
with the width of the string as displayed on the screen or on paper; or with 
the number of graphemes a human would say the string contains, if asked.

> Most people have to figure out what Unicode they are using, count the bytes, 
> divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
> convert it to UTF32, and do a count.


And then what do you do with that count? What do you use it for?
--
Igor Tandetnik


Re: [sqlite] Things you shouldn't assume when you store names

Richard Damon, on Monday, November 11, 2019 09:47 AM, wrote...
>
> On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
> > Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
> >> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
> >>
> >>> Not if the system uses UTF32. :-) You could put the pictograph in that 
> >>> that textbox, and it'll work.
> >> Can you point to some description of this and how it works ?  I've never 
> >> heard of it.
> > My point was that one could define the UTF32 [1] code for that specific 
> > pictograph or glyph, and it'll work.
> >
> > josé
> >
> > [1] https://en.wikipedia.org/wiki/UTF-32
>
> UTF-32 gives no encoding advantage over other Unicode formats, as all
> allow expressing all the Unicode code points.

I disagree.  I believe that the future is UTF32.  I will give you that it's 
bulky, for example, here is the letter a written to a file in Windows-1252, 
UTF8 signed, UTF16be signed, and UTF32be signed:

bytes filename
1 0_Windows-1252.txt
4 1_UTF8signed.txt
4 2_UTF16BEsigned.txt
8 3_UTF32signed.txt
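
These sizes can be reproduced with a short Python sketch, assuming that
"signed" means the file starts with a byte order mark (BOM). That
reading is an assumption, though it matches the byte counts listed; the
function name is illustrative.

```python
import codecs

def encoded_sizes(text="a"):
    """Byte counts for text in each encoding, with a leading BOM where noted."""
    return {
        "Windows-1252": len(text.encode("cp1252")),
        "UTF-8 + BOM": len(text.encode("utf-8-sig")),
        "UTF-16BE + BOM": len(codecs.BOM_UTF16_BE + text.encode("utf-16-be")),
        "UTF-32BE + BOM": len(codecs.BOM_UTF32_BE + text.encode("utf-32-be")),
    }

for name, size in encoded_sizes().items():
    print(size, name)
```

Most of each total is the BOM: without it the single letter takes 1, 1, 2
and 4 bytes respectively.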

So, yes, it's bulky, but, if you want to count characters in languages such as 
Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that 
string to UTF32, and do a string count of that UTF32 variable.  Most people 
have to figure out what Unicode they are using, count the bytes, divide by... 
and on, and on.  Not me, I just take that UTF8, or UTF16 string, convert it to 
UTF32, and do a count.

> There is no code-point assigned to the Pictogram for his name (As far as
> I know), so there is no value you can put in to represent it.

You're right, but not that many people are changing their name to an image.  
However, if two or three or more folks want to, there are enough empty UTF32 
characters, that it can be accomplished.

> It would be possible to include in the application some way to add user
> defined glyphs to the system fonts for user defined code points, and
> then reconcile these when transferring data from one system to another.

We have done this for special customer requirements and have assigned our own 
UTF32 characters a specific design with our software.  But, yes, it's only our 
software, but what if... a reconciliation can happen?

josé

Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-11 Thread Dan Kennedy



On 8/11/62 00:15, Dennis Clarke wrote:

On 2019-11-07 11:44, Shawn Wagner wrote:
... Just don't use strict c99 mode when compiling with gcc? Drop the -std
argument from your CFLAGS to use the default (gnu11 since gcc 5) or
explicitly use gnu99, which gives you that version of the C standard + gcc
extensions.

(Not that they have anything to do with the problem, but compiling with -O0
and -fno-builtin are strange unless you're planning on spending some
quality time in a debugger stepping through code, and -malign-double is
already the default on x86-64 so kind of pointless)



Debugger .. yes. That will happen and I build on a multitude of
platforms.

Okay, so the code fails on Solaris sparc with c99, whereas in the recent
past it all built fine:

libtool: compile:  /opt/developerstudio12.6/bin/c99 
-I/usr/local/include -D_TS_ERRNO -D_POSIX_PTHREAD_SEMANTICS 
-D_LARGEFILE64_SOURCE -Xc -m64 -xarch=sparc -g -errfmt=error 
-errshort=full -xstrconst -xildoff -xmemalign=8s -xnolibmil 
-xcode=pic32 -xregs=no%appl -xlibmieee -mc -ftrap=%none 
-xbuiltin=%none -xunroll=1 -xs -xdebugformat=dwarf -errtags=yes 
-errwarn=%none -erroff=%none -DSQLITE_OS_UNIX=1 -I. 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/rtree 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/icu 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/fts3 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/async 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/session 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/userauth 
-D_HAVE_SQLITE_CONFIG_H -DBUILD_sqlite -DNDEBUG -I/usr/local/include 
-DSQLITE_THREADSAFE=1 -DSQLITE_HAVE_ZLIB=1 -DUSE_TCL_STUBS=1 -c 
/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c 
 -KPIC -DPIC -o .libs/tclsqlite.o
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2346: error: undefined symbol: SQLITE_DBCONFIG_ENABLE_VIEW
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2346: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2351: error: undefined symbol: SQLITE_DBCONFIG_TRIGGER_EQP
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2351: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2352: error: undefined symbol: SQLITE_DBCONFIG_RESET_DATABASE
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2352: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2353: error: undefined symbol: SQLITE_DBCONFIG_DEFENSIVE
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2353: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2354: error: undefined symbol: SQLITE_DBCONFIG_WRITABLE_SCHEMA
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2354: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2355: error: undefined symbol: SQLITE_DBCONFIG_LEGACY_ALTER_TABLE
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2355: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2356: error: undefined symbol: SQLITE_DBCONFIG_DQS_DML
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2356: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2357: error: undefined symbol: SQLITE_DBCONFIG_DQS_DDL
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2357: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2855: error: undefined symbol: SQLITE_DIRECTONLY
c99: acomp failed for 
/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c

gmake: *** [Makefile:1029: tclsqlite.lo] Error 1



On Red Hat Enterprise Linux 7.4 the code actually does compile and then
core dumps with a segfault from within that same source file:

Time: walshared.test 24 ms
# WARNING: This next test takes around 12 seconds
gmake: *** [Makefile:1256: tcltest] Segmentation fault (core dumped)



This is almost certainly an issue with the test scripts, not the 
library. Can you post the last 100 lines or so of the file 
"test-out.txt" that was created in the cwd by the [make quicktest] or 
whatever you ran to get this?


Thanks,

Dan.




Re: [sqlite] Things you shouldn't assume when you store names

On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>
>>> Not if the system uses UTF32. :-) You could put the pictograph in that 
>>> textbox, and it'll work.
>> Can you point to some description of this and how it works ?  I've never 
>> heard of it.
> My point was that one could define the UTF32 [1] code for that specific 
> pictograph or glyph, and it'll work.
>
> josé
>
> [1] https://en.wikipedia.org/wiki/UTF-32

UTF-32 gives no encoding advantage over other Unicode formats, as all
allow expressing all the Unicode code points.
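
A quick way to check this is to round-trip a few code points through each encoding form. The following Python sketch uses three sample code points chosen for illustration (one in the Basic Multilingual Plane, one outside it, and one private-use code point); only the byte counts differ between encodings.

```python
# U+00E9 (BMP), U+1F600 (outside the BMP), U+10FFFD (private use, plane 16).
samples = ["\u00e9", "\U0001F600", "\U0010FFFD"]

for ch in samples:
    for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
        encoded = ch.encode(encoding)
        # Every encoding form round-trips every sample losslessly.
        assert encoded.decode(encoding) == ch
        print(f"U+{ord(ch):04X} {encoding:9} {len(encoded)} bytes")
```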

There is no code-point assigned to the Pictogram for his name (As far as
I know), so there is no value you can put in to represent it.

There are a number of code points reserved for user definition, but many
of those have been informally reserved for characters not yet put into
Unicode.
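
For reference, the code points reserved for user definition are those with Unicode general category Co (private use). A small Python sketch, using only the standard unicodedata module, lists the ranges; the helper name is_private_use is made up for this example.

```python
import unicodedata

def is_private_use(code_point: int) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"

# Collect contiguous private-use ranges across the whole code space.
ranges = []
start = None
for cp in range(0x110000):
    if is_private_use(cp):
        if start is None:
            start = cp
    elif start is not None:
        ranges.append((start, cp - 1))
        start = None
if start is not None:
    ranges.append((start, 0x10FFFF))

for first, last in ranges:
    print(f"U+{first:04X}..U+{last:04X}: {last - first + 1} code points")
```

Any such code point can be stored in UTF-8, UTF-16, or UTF-32 alike; what it looks like is decided entirely by the font and software on each system.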

It would be possible to include in the application some way to add user
defined glyphs to the system fonts for user defined code points, and
then reconcile these when transferring data from one system to another.

Another option would be to define some user defined code point pair as a
graphics escape, and put within it an encoding of a graphics file
containing the glyph, but at that point you are really outside of being
'Unicode'.
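
To make the graphics-escape idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the choice of U+E000 and U+E001 as delimiters, the base64 payload format, and the function names are assumptions for illustration, and only software that knows this private convention could render the result.

```python
import base64

# Hypothetical choice of two private-use code points as escape delimiters.
GLYPH_START = "\uE000"
GLYPH_END = "\uE001"

def embed_glyph(image_bytes: bytes) -> str:
    """Wrap image data, base64-encoded, between the two escape code points."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return GLYPH_START + payload + GLYPH_END

def extract_glyphs(text: str) -> list:
    """Return the image payload of every escaped glyph found in the text."""
    glyphs = []
    pos = 0
    while True:
        start = text.find(GLYPH_START, pos)
        if start == -1:
            break
        end = text.find(GLYPH_END, start + 1)
        if end == -1:
            break  # unterminated escape: ignore the rest
        glyphs.append(base64.b64decode(text[start + 1:end]))
        pos = end + 1
    return glyphs

# Placeholder bytes standing in for a real image file.
name = "Name: " + embed_glyph(b"\x89PNG placeholder bytes")
print(extract_glyphs(name))
```

The string still encodes and decodes as valid Unicode, but its meaning lives in the convention, not in the standard.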

-- 
Richard Damon


Re: [sqlite] Things you shouldn't assume when you store names


Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>
> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>
> > Not if the system uses UTF32. :-) You could put the pictograph in that 
> > textbox, and it'll work.
>
> Can you point to some description of this and how it works ?  I've never 
> heard of it.

My point was that one could define the UTF32 [1] code for that specific 
pictograph or glyph, and it'll work.

josé

[1] https://en.wikipedia.org/wiki/UTF-32

Re: [sqlite] Things you shouldn't assume when you store names

On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera  wrote:

> Not if the system uses UTF32. :-) You could put the pictograph in that 
> textbox, and it'll work.

Can you point to some description of this and how it works ?  I've never heard 
of it.

Re: [sqlite] Things you shouldn't assume when you store names