Re: [gull] Tips and tricks: Reading a CSV in bash

Daniel Cordey Mon, 18 Oct 2021 02:02:59 -0700


On 18.10.21 09:42, felix wrote:

It's been a long time since I last posted anything...


The goal is to read a CSV file that conforms to
  RFC 4180 Common Format and MIME Type for Comma-Separated Values (CSV) Files

Well... There is an RFC, but it also leaves the door open to "implementations" that vary and therefore fall outside the definition of this standard. Among other things, RFC 4180 clearly states:


Interoperability considerations:

      Due to lack of a single specification, there are considerable
      differences among implementations.  Implementors should "be
      conservative in what you do, be liberal in what you accept from
      others" (RFC 793 <https://datatracker.ietf.org/doc/html/rfc793> [8
      <https://datatracker.ietf.org/doc/html/rfc4180#ref-8>]) when
      processing CSV files.  An attempt at a common definition can be
      found in Section 2
      <https://datatracker.ietf.org/doc/html/rfc4180#section-2>.

Likewise, the documentation for the bash CSV module says:

*This method is recommended only for simple CSV files* with no text fields containing extra comma |,| delimiter, or return lines. For more complex CSV support, see the next section to parse CSV with AWK <https://www.shell-tips.com/bash/how-to-parse-csv-file/#using-the-awk-command-line>.
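
To make the limitation quoted above concrete, here is a minimal Python sketch of my own (it is not taken from the shell-tips page). The sample data and the helper names naive_parse and csv_parse are hypothetical; the point is only to compare a plain split on commas with the standard csv module on a field that contains a comma and a line break.

```python
import csv
import io

# Hypothetical sample: one quoted field holds a comma, another a line break.
SAMPLE = 'name,comment\r\n"Doe, John","line one\nline two"\r\n'


def naive_parse(text):
    """Split on line breaks and commas, ignoring RFC 4180 quoting."""
    return [line.split(",") for line in text.splitlines()]


def csv_parse(text):
    """Parse with the standard csv module, which honors quoted fields."""
    # newline="" leaves line endings untouched so the csv module sees them.
    return list(csv.reader(io.StringIO(text, newline="")))


naive_rows = naive_parse(SAMPLE)
proper_rows = csv_parse(SAMPLE)

# The naive version yields three broken rows; the csv module yields two.
print(len(naive_rows), naive_rows[1])
print(len(proper_rows), proper_rows[1])
```

The naive split cuts the quoted fields at the embedded comma and line break, which is exactly the "simple CSV files only" restriction described above.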


In fact, I immediately thought of AWK when I saw your example :-)

Looking at Python's csv module in more detail, I found the following:


/class/ |csv.Dialect|

   The |Dialect| <https://docs.python.org/3/library/csv.html#csv.Dialect>
   class is a container class whose attributes contain information for
   how to handle doublequotes, whitespace, delimiters, etc. Due to the
   lack of a strict CSV specification, different applications produce
   subtly different CSV data. |Dialect| instances define how |reader|
   <https://docs.python.org/3/library/csv.html#csv.reader> and |writer|
   <https://docs.python.org/3/library/csv.html#csv.writer> instances
   behave.

   All available |Dialect| names are returned by |list_dialects()|
   <https://docs.python.org/3/library/csv.html#csv.list_dialects>, and
   they can be registered with specific |reader| and |writer| classes
   through their initializer (|__init__|) functions like this:

   import csv

   with open('students.csv', 'w', newline='') as csvfile:
       writer = csv.writer(csvfile, dialect='unix')
                                    ^^^^^^^^^^^^^^
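
As a small illustration of what the dialect argument changes, here is a sketch of my own that writes the same rows with two of the dialects the csv module registers by default. The helper dump and the sample rows are hypothetical; it writes to an in-memory buffer rather than to students.csv so that the result can be inspected directly.

```python
import csv
import io

# Names of the dialects currently registered in the csv module.
names = csv.list_dialects()


def dump(rows, dialect):
    """Serialize rows to a string using a registered dialect name."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, dialect=dialect).writerows(rows)
    return buffer.getvalue()


rows = [["id", "note"], [1, "a,b"]]

# 'unix' quotes every field and ends each line with "\n".
unix_text = dump(rows, "unix")
# 'excel' quotes only where needed and ends each line with "\r\n".
excel_text = dump(rows, "excel")

print(names)
print(repr(unix_text))
print(repr(excel_text))
```

The same data comes out with different quoting and line endings, which is the kind of variation the Dialect class is meant to capture.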

My conclusion is that a Python implementation will protect us better against CSV variants, while letting us write more readable code and avoid bloating it to handle special cases.
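
To show what I mean by handling a variant without special-case code, here is a hedged sketch that registers a custom dialect. The dialect name "semicolon", the class SemicolonDialect, the helper read_rows and the sample data are all hypothetical, chosen only for illustration; the variant itself (semicolon-separated fields with a space after the delimiter) is an assumption, not something from this thread.

```python
import csv
import io


class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect for semicolon-separated exports."""
    delimiter = ";"
    quotechar = '"'
    doublequote = True
    skipinitialspace = True  # ignore a space right after the delimiter
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


# Make the dialect available by name to csv.reader and csv.writer.
csv.register_dialect("semicolon", SemicolonDialect)


def read_rows(text, dialect):
    """Parse text into a list of rows using a registered dialect name."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect=dialect))


sample = 'tool;note\r\nawk; "fast; terse"\r\n'
rows = read_rows(sample, "semicolon")
print(rows)
```

The variant is described once, declaratively, and the parsing code itself stays unchanged.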

For some time now, bash has supported loadable modules. The source
distribution tree contains an examples directory with many
loadable modules.

Thanks for this info. Indeed, these commands run faster than the ones in /usr/bin because they avoid a fork/exec. That is a real performance gain, on top of providing commands that are not available as standard (such as csv).

dc

