Re: RFC: The future of Text::CSV_XS

H.Merijn Brand Fri, 25 May 2007 12:53:51 -0700

On Fri, 25 May 2007 15:22:02 -0400, "Richard Dice" <[EMAIL PROTECTED]>
wrote:


> Merijn,

Why no Cc: to the list?

> Thanks for asking, and for your work on this.  (Looks like you just took
> over maintainership recently...?)

Yes.

> My recent "I wish Text::CSV_XS could handle X..." experience was -
> 
>    - Save an Excel spreadsheet to CSV format

My Spreadsheet::Read module on CPAN includes a utility that does just that:

  # xlscat -c file.xls >file.csv

>    - But some of the cells in the Excel spreadsheet contained line breaks

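Shouldn't matter, as long as you let Text::CSV_XS read the records from
the file handle (getline () with binary => 1) instead of splitting the
file into lines yourself.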

>    - So iterating line-by-line through the file in order to have lines to
>      parse with Text::CSV_XS meant that any line derived from a row in Excel
>      containing a cell containing a line break would fail
> 
> That new feature idea regarding reading the whole file at once might be a
> good place to address this.

Don't think so, but feel free to enlighten me on your reasoning.
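
A minimal sketch of reading such an export, with a made-up sample in an
in-memory file handle standing in for the real file. The second record
spans two physical lines but comes back as a single row, because
getline () keeps reading until the quoted field is closed.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

# Made-up sample, as Excel might export it: the quoted "comment" field of
# the first data row contains a line break.
my $data = qq{id,comment\n1,"first line\nsecond line"\n2,plain\n};

# binary => 1 is needed to accept the embedded newline in a quoted field
my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "" . Text::CSV_XS->error_diag ();

# An in-memory handle stands in for the real file here
open my $fh, "<", \$data or die "Cannot open: $!";

my @rows;
# getline () reads as many physical lines as one CSV record needs
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}
$csv->eof or $csv->error_diag ();   # report a parse error, if any
close $fh;

printf "%d records\n", scalar @rows;   # header + 2 data records
```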

> Other features that could be nice -
> 
>    - given a file, tell it whether it has a header row and if so provide
>      a hash-key-style interface on each row per the names in columns of the
>      header row

Could be one of the options to the suggested parse_file () method:

   parse_file ($file, { cols => [ ...], has_header_row => 1 });

changing the returned structure from

   { fields => [ .... ],

to

   { fields => { Name => "...", Address => "...", ... },

but I think that would have a huge impact on memory use, and it is also
quite easy to build yourself with a map {} construct.
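
A minimal sketch of that map {} construct, using a made-up sample in an
in-memory file handle. The first record is taken as the header row, and
every following record becomes a hash keyed on those column names.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

# Made-up sample with a header row
my $data = qq{Name,Address\nAlice,Amsterdam\nBob,Utrecht\n};

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "" . Text::CSV_XS->error_diag ();
open my $fh, "<", \$data or die "Cannot open: $!";

# The first record supplies the column names
my $hdr = $csv->getline ($fh) or die "No header row\n";

my @records;
while (my $row = $csv->getline ($fh)) {
    # Pair each column name with the field at the same position
    push @records, { map { $hdr->[$_] => $row->[$_] } 0 .. $#$hdr };
}
close $fh;

print "$_->{Name}: $_->{Address}\n" for @records;
```

The hashes are built only by the caller that wants them, so the parser
itself keeps returning plain array refs.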

>    - have it return how many rows and columns there are in the file

  # xlscat -i file.csv

I don't think that kind of functionality belongs at the low level this
module lives at. Consider that a CSV stream has no defined way to jump
back, so once you've read the data, you cannot go back: unlike Excel, it
has no random-access structure. Knowing the row count means having read
the whole file first.
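
A minimal sketch of doing this in the caller instead: count the rows and
the widest row in one forward pass with getline (). The sample data and
in-memory handle are made up. The totals are only known after the
whole stream has been consumed.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

# Made-up sample with rows of uneven width
my $data = qq{a,b,c\n1,2\n3,4,5,6\n};

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "" . Text::CSV_XS->error_diag ();
open my $fh, "<", \$data or die "Cannot open: $!";

# One forward pass; nothing can be re-read afterwards
my ($rows, $max_cols) = (0, 0);
while (my $row = $csv->getline ($fh)) {
    $rows++;
    $max_cols = @$row if @$row > $max_cols;
}
$csv->eof or $csv->error_diag ();
close $fh;

print "$rows rows, at most $max_cols columns\n";
```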

>    - ability to automatically ignore trailing (and perhaps leading) empty
>      rows

Also an option in xlscat: trailing empty rows and columns are stripped
by default (see --noclip below).
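
For plain CSV read through Text::CSV_XS, a minimal sketch of dropping
leading and trailing empty rows while keeping the empty rows in between.
It assumes "empty" means every field is undef or the empty string, and
the sample data is made up. Since a trailing row can only be recognised
once the stream ends, empty rows are held back until a non-empty row
shows up.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

# Made-up sample: empty rows before, between and after the real data
my $data = qq{,,\na,b\n,\nc,d\n,,\n\n};

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "" . Text::CSV_XS->error_diag ();
open my $fh, "<", \$data or die "Cannot open: $!";

my @rows;       # rows kept so far
my @pending;    # empty rows that may still turn out to be trailing
while (my $row = $csv->getline ($fh)) {
    # A row is empty when no field holds a non-empty value
    my $empty = !grep { defined $_ && length $_ } @$row;
    if ($empty) {
        push @pending, $row if @rows;   # leading empty rows are dropped
        next;
    }
    # A non-empty row proves the pending empty rows were inner rows
    push @rows, @pending, $row;
    @pending = ();
}
$csv->eof or $csv->error_diag ();
close $fh;
# Anything still in @pending was trailing and is simply discarded

printf "%d rows kept\n", scalar @rows;
```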

>    - provide a "best guess" count of how many columns there _should_ be
>      in a row, based on the header row (if present) and/or general agreement
>      amongst the other rows in the file (if 99 have 14 columns in a row and 1
>      has 10 columns, that 1 is likely an outlier)

Nice example. I like that. It should not be in the module itself, but it
could be a file in the examples/ folder.
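
A minimal sketch of what such an example script could do, assuming the
rows have already been read with getline (). best_guess_cols () is a
hypothetical name; it takes a simple majority vote over the field counts
and does not give a header row any special weight.

```perl
use strict;
use warnings;

# Hypothetical helper: the field count shared by most rows.
# Ties are broken in favour of the wider row.
sub best_guess_cols {
    my @rows = @_;
    my %count;
    $count{ scalar @$_ }++ for @rows;
    my ($best) = sort { $count{$b} <=> $count{$a} || $b <=> $a } keys %count;
    return $best;
}

# Made-up rows, shaped like what getline () returns
my @rows = ([qw(a b c)], [1, 2, 3], [4, 5], [6, 7, 8], [9, 10, 11]);

my $n = best_guess_cols (@rows);
for my $i (0 .. $#rows) {
    my $w = @{ $rows[$i] };
    printf "row %d has %d columns, expected %d\n", $i + 1, $w, $n if $w != $n;
}
```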

>    - In the event of rows with fewer columns than the best-guess (or a
>      user-defined number of how many columns there should be) then provide
>      extra undef column (array) values

I would suggest using Spreadsheet::Read and doing it within that framework.
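
If you stay with plain Text::CSV_XS, padding is short in Perl. A
minimal sketch; pad_rows () is a hypothetical helper, and the target
width $n could come from a user setting or from a best-guess count.

```perl
use strict;
use warnings;

# Hypothetical helper: pad every row (in place) to at least $n fields.
# Growing $#$row fills the new slots with undef.
sub pad_rows {
    my ($n, @rows) = @_;
    for my $row (@rows) {
        $#$row = $n - 1 if @$row < $n;
    }
    return;
}

# Made-up rows, shaped like what getline () returns
my @rows = ([1, 2, 3], [4, 5], [6]);
pad_rows (3, @rows);

# Show undef explicitly
print join (",", map { defined $_ ? $_ : "undef" } @$_), "\n" for @rows;
```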

>    - ability to extract a row/column range, e.g. columns 2 through 7 in
>      rows 3 through 13

You definitely want xlscat :) Both are supported as options (-R and -C):

/home/merijn 101 > xlscat --help
usage: xlscat [-s <sep>] [-L] [-u] [ Selection ] file.xls
              [-c | -m]       [-u] [ Selection ] file.xls
               -i                  [ -S sheets ] file.xls
    Generic options:
       -v[#]       Set verbose level (xlscat)
       -d[#]       Set debug   level (Spreadsheet::Read)
       -u          Use unformatted values
       --noclip    Do not strip empty sheets and
                   trailing empty rows and columns
    Input CSV:
       --in-sep=c  Set input sep_char for CSV
    Output Text (default):
       -s <sep>    Use separator <sep>. Default '|', \n allowed
       -L          Line up the columns
    Output Index only:
       -i          Show sheet names and size only
    Output CSV:
       -c          Output CSV, separator = ','
       -m          Output CSV, separator = ';'
    Selection:
       -S <sheets> Only print sheets <sheets>. 'all' is a valid set
                   Default only prints the first sheet
       -R <rows>   Only print rows    <rows>. Default is 'all'
       -C <cols>   Only print columns <cols>. Default is 'all'
       -F <flds>   Only fields <flds> e.g. -FA3,B16
/home/merijn 102 >
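
For the same selection in your own code on top of Text::CSV_XS, a
minimal sketch using array slices. The sample data, the in-memory handle
and the variable names are made up; the bounds are 1-based and inclusive
to match "columns 2 through 7 in rows 3 through 13". Reading stops as
soon as the last wanted row has been seen.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

# Made-up sample: 15 rows of 10 columns, each cell named after its position
my $data = "";
for my $r (1 .. 15) {
    $data .= join (",", map { "r${r}c$_" } 1 .. 10) . "\n";
}

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "" . Text::CSV_XS->error_diag ();
open my $fh, "<", \$data or die "Cannot open: $!";

# Rows 3 through 13 and columns 2 through 7, 1-based and inclusive
my ($row_from, $row_to) = (3, 13);
my ($col_from, $col_to) = (2, 7);

my @range;
my $rownum = 0;
while (my $row = $csv->getline ($fh)) {
    $rownum++;
    next if $rownum < $row_from;
    last if $rownum > $row_to;          # no need to read any further
    # Array slice with 0-based indices; missing fields come back as undef
    push @range, [ @{$row}[$col_from - 1 .. $col_to - 1] ];
}
close $fh;

print join ("|", @$_), "\n" for @range;
```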

> You planning on being at YAPC::EU?  Maybe I'll run into you there.

Yes, and I'm planning to talk about another (new) module. I've
already registered.

-- 
H.Merijn Brand         Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.9.x   on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.0 & 10.2, AIX 4.3 & 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/            http://www.test-smoke.org
                        http://www.goldmark.org/jeff/stupid-disclaimers/
