[NTG-context] Re: Deutschland-Stack

2026-03-23 Thread Bruce Horrocks


> On 23 Mar 2026, at 09:19, Hans Hagen via ntg-context  
> wrote:
> 
> On 3/23/2026 1:05 AM, Bruce Horrocks wrote:
>>> On 22 Mar 2026, at 18:11, luigi scarso  wrote:
>>> 
>>> 
>>> While other "acronyms" seem reasonable, in
>>> JSON, XML and CSV as data formats,
>>> CSV should be deprecated. I understand that many use CSV and that, if 
>>> implemented correctly, it doesn't cause problems,
>>> but it's precisely the "correct" implementation that has become difficult 
>>> to find.
>> CSV is a standard - RFC4180 - and therefore should not be deprecated.
>>> I would propose the ucsv format, Unicode Controls Separated Values,
>>> where fields are separated by 001F (instead of ',') and records by 001E 
>>> (instead of newline)
>>> or alternatively 241F and 241E. These characters should not appear in the 
>>> text, to make the parser very simple.
>> The CSV standard is a truly awful one in that it encodes common and 
>> historical practice rather than starting with a clean set of requirements, 
>> so I'm definitely in favour of *adding* ASCII/Unicode 28-31 (FS, GS, RS & 
>> US) as a separately supported type.
> 
> and
> 
> 0x2 : START OF TEXT
> 0x3 : END OF TEXT

Start of text (STX) is/was used for Telex type transmissions to teleprinters. 
SOH indicates the start of header which has routing information, then STX 
indicates the start of the actual message with ETX (End of Text) marking the 
end of the message.

For quotes, newlines etc. you only need to worry about US (Unit Separator) [1]: 
quotes, newlines and anything at or above ASCII/Unicode 32 are just part of the 
unit's (i.e. field's) value, and US is not a valid character inside a field, so 
there is nothing to escape.

[1] Note that FS, GS & RS can also terminate a unit, so you do have to worry 
about those a bit!
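
To show how little a parser needs under that rule, here is a minimal Python sketch. It is not taken from any existing tool: the names dumps and loads are made up for illustration, and the format itself is only the proposal from this thread. It separates fields with US, ends each record with RS, and simply refuses fields that contain FS, GS, RS or US.

```python
US = "\x1f"  # Unit Separator: sits between fields
RS = "\x1e"  # Record Separator: ends a record
RESERVED = {"\x1c", "\x1d", RS, US}  # FS, GS, RS, US may not appear in a field


def dumps(rows):
    """Serialize rows (lists of strings): US between fields, RS after each record."""
    out = []
    for row in rows:
        for field in row:
            if RESERVED.intersection(field):
                raise ValueError(f"field contains a reserved separator: {field!r}")
        out.append(US.join(row) + RS)
    return "".join(out)


def loads(text):
    """Parse text produced by dumps(); no quoting or escaping is involved."""
    records = text.split(RS)
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after the final RS
    return [record.split(US) for record in records]


# Quotes, commas and newlines travel through untouched.
rows = [["id", "note"], ["1", 'says "hi", then\nleaves'], ["2", ""]]
assert loads(dumps(rows)) == rows
```

The whole reader is two split calls, which is the point of the proposal. The price is that the writer has to reject (or strip) the four separator characters.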

> 
> instead of quotes (which we happen to support in luametatex's mplib variant 
> so that embedded quotes and newlines work fine)
> 
>>   $ sqlite3 -header -csv my_db.db "select * from my_table;" > out.csv
> 
> hm, i need to test that
> 
> Hans

I cut and pasted that from a Stack Overflow answer. Looking at it again, the 
double quotes are enough to stop the shell expanding the asterisk, but on Unix 
at least the shell still expands $variables and backticks inside them, so

  $ sqlite3 -header -csv my_db.db 'select * from my_table;' > out.csv

would be safer.
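
If you would rather not reason about shell quoting by hand, one option is to let a library produce the quoted form. The sketch below uses Python's standard shlex module, with the database and table names taken from the command above. It only builds and prints the command line and does not run sqlite3.

```python
import shlex

sql = "select * from my_table;"

# shlex.quote wraps the string in single quotes when it contains characters
# that are special to a POSIX shell (here: spaces, '*' and ';').
quoted = shlex.quote(sql)

# Build the full command line. The "> out.csv" redirection is left out on
# purpose: it is shell syntax, not an argument to sqlite3.
argv = ["sqlite3", "-header", "-csv", "my_db.db", sql]
command = shlex.join(argv)

print(quoted)   # 'select * from my_table;'
print(command)  # sqlite3 -header -csv my_db.db 'select * from my_table;'
```

Passing argv as a list to subprocess.run (without shell=True) would avoid the shell altogether, so nothing needs quoting in the first place.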

Regards,
—
Bruce Horrocks
Hampshire, UK

___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : [email protected] / 
https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive  : https://github.com/contextgarden/context
wiki : https://wiki.contextgarden.net
___


[NTG-context] Re: Deutschland-Stack

2026-03-23 Thread luigi scarso
On Mon, 23 Mar 2026 at 10:22, Hans Hagen via ntg-context 
wrote:

>
> >$ sqlite3 -header -csv my_db.db "select * from my_table;" > out.csv
>
> hm, i need to test that
>
>
.mode ascii is quite flexible

https://www.sqlite.org/cli.html#changing_output_formats
"""
To import data with arbitrary delimiters and no quoting, first set ascii
mode (".mode ascii"), then set the field and record delimiters using the
".separator" command. This will suppress dequoting. Upon ".import", the
data will be split into fields and records according to the delimiters so
specified.
"""
I think that ascii mode supports only 8-bit values; its default, from the 
documentation, is
ascii   Columns/rows delimited by 0x1F and 0x1E

With list mode you can set the .separator for field and record, e.g.
sqlite> .separator " ✨ " "🕺\r\n\r\n"
is ok.
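
The import behaviour described in the quote above (split on the delimiters, no dequoting) can be imitated outside the sqlite3 shell. This is a rough Python sketch using the standard sqlite3 module, not the shell's own implementation. The function name import_delimited, the table names and the "first record is the header" convention are assumptions made for the example.

```python
import sqlite3

US, RS = "\x1f", "\x1e"  # the ascii-mode defaults quoted above


def import_delimited(conn, table, text, field_sep=US, record_sep=RS):
    """Split text on the delimiters (no dequoting) and load it into table.

    Assumptions of this sketch: the first record holds the column names,
    the names are plain identifiers, and empty records are skipped.
    """
    records = [r.split(field_sep) for r in text.split(record_sep) if r]
    header, rows = records[0], records[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'create table "{table}" ({columns})')
    conn.executemany(f'insert into "{table}" values ({marks})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # throwaway database for the example

# Control-character delimiters: quotes and commas need no special handling.
import_delimited(conn, "a", f'id{US}note{RS}1{US}plain, "quoted"{RS}')

# Arbitrary multi-character delimiters, as in the list-mode example.
import_delimited(conn, "b", "id ✨ note🕺\r\n\r\n1 ✨ x🕺\r\n\r\n", " ✨ ", "🕺\r\n\r\n")
```

Because nothing is dequoted, the value stored in table a is exactly the text between the delimiters, quotes included.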

--
luigi


[NTG-context] Re: Deutschland-Stack

2026-03-23 Thread Hans Hagen via ntg-context

On 3/23/2026 1:05 AM, Bruce Horrocks wrote:
>> On 22 Mar 2026, at 18:11, luigi scarso  wrote:
>> 
>> While other "acronyms" seem reasonable, in
>> JSON, XML and CSV as data formats,
>> CSV should be deprecated. I understand that many use CSV and that, if 
>> implemented correctly, it doesn't cause problems,
>> but it's precisely the "correct" implementation that has become difficult to 
>> find.
> 
> CSV is a standard - RFC4180 - and therefore should not be deprecated.
> 
>> I would propose the ucsv format, Unicode Controls Separated Values,
>> where fields are separated by 001F (instead of ',') and records by 001E 
>> (instead of newline)
>> or alternatively 241F and 241E. These characters should not appear in the text, 
>> to make the parser very simple.
> 
> The CSV standard is a truly awful one in that it encodes common and historical 
> practice rather than starting with a clean set of requirements, so I'm definitely 
> in favour of *adding* ASCII/Unicode 28-31 (FS, GS, RS & US) as a separately 
> supported type.


and

0x2 : START OF TEXT
0x3 : END OF TEXT

instead of quotes (which we happen to support in luametatex's mplib 
variant so that embedded quotes and newlines work fine)



>   $ sqlite3 -header -csv my_db.db "select * from my_table;" > out.csv


hm, i need to test that

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


[NTG-context] Re: Deutschland-Stack

2026-03-23 Thread Hans Hagen via ntg-context

On 3/23/2026 10:01 AM, Hraban Ramm wrote:
> On 22.03.26 at 19:11, luigi scarso wrote:
>> While other "acronyms" seem reasonable, in
>> JSON, XML and CSV as data formats,
>> CSV should be deprecated. I understand that many use CSV and that, if 
>> implemented correctly, it doesn't cause problems,
>> but it's precisely the "correct" implementation that has become 
>> difficult to find.
>> 
>> I would propose the ucsv format, Unicode Controls Separated Values,
>> where fields are separated by 001F (instead of ',') and records by 
>> 001E (instead of newline)
>> or alternatively 241F and 241E. These characters should not appear in 
>> the text, to make the parser very simple.
>> 
>> (Even better would be Sqlite As An Application File Format
>> https://www.sqlite.org/appfileformat.html
>> but that's just my opinion.)
> 
> Binary formats are never a solution for simple requirements.


that's fluid ... ascii and utf are also binary .. as long as it is clear 
what the bytes mean all is fine


> CSV is simple and doesn’t need special input methods for uncommon 
> characters.
> 
> It’s not only about parsability / machine readability, but also about human 
> editability.


and then one lets humans loose on the content and inconsistency spoils 
everything (and applications need to fix things)


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


[NTG-context] Re: Deutschland-Stack

2026-03-23 Thread Hraban Ramm

On 22.03.26 at 19:11, luigi scarso wrote:
> While other "acronyms" seem reasonable, in
> JSON, XML and CSV as data formats,
> CSV should be deprecated. I understand that many use CSV and that, if 
> implemented correctly, it doesn't cause problems,
> but it's precisely the "correct" implementation that has become 
> difficult to find.
> 
> I would propose the ucsv format, Unicode Controls Separated Values,
> where fields are separated by 001F (instead of ',') and records by 
> 001E (instead of newline)
> or alternatively 241F and 241E. These characters should not appear in 
> the text, to make the parser very simple.
> 
> (Even better would be Sqlite As An Application File Format
> https://www.sqlite.org/appfileformat.html
> but that's just my opinion.)


Binary formats are never a solution for simple requirements.

CSV is simple and doesn’t need special input methods for uncommon 
characters.


It’s not only about parsability / machine readability, but also about human 
editability.


Hraban



[NTG-context] Re: Deutschland-Stack

2026-03-22 Thread Bruce Horrocks


> On 22 Mar 2026, at 18:11, luigi scarso  wrote:
> 
> 
> While other "acronyms" seem reasonable, in
> JSON, XML and CSV as data formats,
> CSV should be deprecated. I understand that many use CSV and that, if 
> implemented correctly, it doesn't cause problems, 
> but it's precisely the "correct" implementation that has become difficult to 
> find.

CSV is a standard - RFC4180 - and therefore should not be deprecated.

> I would propose the ucsv format, Unicode Controls Separated Values,
> where fields are separated by 001F (instead of ',') and records by 001E 
> (instead of newline)
> or alternatively 241F and 241E. These characters should not appear in the 
> text, to make the parser very simple.

The CSV standard is a truly awful one in that it encodes common and historical 
practice rather than starting with a clean set of requirements, so I'm 
definitely in favour of *adding* ASCII/Unicode 28-31 (FS, GS, RS & US) as a 
separately supported type.

> (Even better would be Sqlite As An Application File Format
> https://www.sqlite.org/appfileformat.html
> but that's just my opinion.)

It's easy enough to convert sqlite to CSV on the command line:

  $ sqlite3 -header -csv my_db.db "select * from my_table;" > out.csv

so you can use that in the interim. You can always use a Makefile to keep the 
.csv files updated.
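
For anyone who wants to do the same conversion from a script rather than the sqlite3 shell, here is a rough Python equivalent using only the standard sqlite3 and csv modules. The function name table_to_csv and the two-column sample table are invented for the example, and an in-memory database stands in for my_db.db. The csv module's default dialect produces comma-separated fields, double-quote quoting and CRLF line ends.

```python
import csv
import io
import sqlite3


def table_to_csv(conn, query):
    """Run query and return the result as CSV text with a header row."""
    cur = conn.execute(query)
    buf = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buf)       # default dialect: comma, '"' quoting, CRLF
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()


conn = sqlite3.connect(":memory:")  # stand-in for my_db.db
conn.execute("create table my_table (id integer, note text)")
conn.executemany(
    "insert into my_table values (?, ?)",
    [(1, 'a "quoted", value'), (2, "two\nlines")],
)
text = table_to_csv(conn, "select * from my_table order by id")

# Reading it back recovers the embedded quotes, comma and newline.
rows = list(csv.reader(io.StringIO(text, newline="")))
assert rows[1] == ["1", 'a "quoted", value']
assert rows[2] == ["2", "two\nlines"]
```

The two sample rows are exactly the awkward cases (embedded quotes, a comma, a newline) that a hand-rolled CSV writer tends to get wrong.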

Regards,
—
Bruce Horrocks
Hampshire, UK



[NTG-context] Re: Deutschland-Stack

2026-03-22 Thread luigi scarso
On Sun, 22 Mar 2026 at 11:46, Hans Hagen via ntg-context 
wrote:

> On 3/22/2026 10:00 AM, luigi scarso wrote:
> > https://deutschland-stack.gov.de/ 
>
> it's not like we suddenly have these tools, at least tex has been around
> for a while; so governments suddenly 'becoming aware' always makes me
> wonder (they yo-yo anyway between solutions)
>
> even if e.g. luatex is used in some places that doesn't mean that it
> translates into interest (or dev support) ... it's seen as a given
>
> that said: given the current political landscape and how it evolves (just
> look at various european elections) i wonder if i should even look at
> all these initiatives and policies .. in the end it's also (national or
> not) big-tech that benefits most from this, not those who
> invested their 'free' time in development etc on the long term
> tex-timescale
>
> > It would be nice to have at least an English translation..
> > Anyway , looking for some tex related thing
> >
> > https://deutschland-stack.gov.de/gesamtbild/
> > :
> > ODF and PDF/UA as document formats,
>
> just look at history (over your own lifetime) ... how long will that UA
> stuff last (it is already a mess)
>
>


While other "acronyms" seem reasonable, in
JSON, XML and CSV as data formats,
CSV should be deprecated. I understand that many use CSV and that, if
implemented correctly, it doesn't cause problems,
but it's precisely the "correct" implementation that has become difficult
to find.
I would propose the ucsv format, Unicode Controls Separated Values,
where fields are separated by 001F (instead of ',') and records by 001E
(instead of newline)
or alternatively 241F and 241E. These characters should not appear in the
text, to make the parser very simple.

(Even better would be Sqlite As An Application File Format
https://www.sqlite.org/appfileformat.html
but that's just my opinion.)

--
luigi


[NTG-context] Re: Deutschland-Stack

2026-03-22 Thread Hans Hagen via ntg-context

On 3/22/2026 10:00 AM, luigi scarso wrote:

> https://deutschland-stack.gov.de/


it's not like we suddenly have these tools, at least tex has been around 
for a while; so governments suddenly 'becoming aware' always makes me 
wonder (they yo-yo anyway between solutions)


even if e.g. luatex is used in some places that doesn't mean that it 
translates into interest (or dev support) ... it's seen as a given


that said: given the current political landscape and how it evolves (just 
look at various european elections) i wonder if i should even look at 
all these initiatives and policies .. in the end it's also (national or 
not) big-tech that benefits most from this, not those who 
invested their 'free' time in development etc on the long term 
tex-timescale



> It would be nice to have at least an English translation..
> Anyway, looking for some tex related thing
> 
> https://deutschland-stack.gov.de/gesamtbild/
> :
> ODF and PDF/UA as document formats,


just look at history (over your own lifetime) ... how long will that UA 
stuff last (it is already a mess)


think of it like this: if we could do something 10 years ago, and no one 
was interested, why take interest seriously now


using tex is very much about persistence, individual interest in control 
and quality over typesetting, to some extent (single users, few-person 
organizations) involvement, curiosity, experimenting, adapting, so that 
is what we should focus on



> not yet, but after a bit of searching
> https://gitlab.opencode.de/open-code/document-writing-tools/document-writing-ci-components/-/blob/v2/scripts/Dockerfile.latex?ref_type=heads
> :
> COPY ./install_texlive.sh /tmp/install_texlive.sh
> RUN chmod +x /tmp/install_texlive.sh
> RUN /tmp/install_texlive.sh
> RUN ln -s /usr/local/texlive/2025/bin/x86_64-linux/lualatex /usr/local/lualatex
> RUN /usr/local/texlive/2025/bin/x86_64-linux/tlmgr install tagpdf tocloft textpos
> 
> Better than nothing, I should say.


anyone could slap together something like that ... tex has been around 
for ages and always kind of adapted ... all that wrapper stuff and these 
fashions come and go ... different worlds imo


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-