On Fri, 25 May 2007 15:22:02 -0400, "Richard Dice" <[EMAIL PROTECTED]>
wrote:
> Merijn,
Why no Cc: to the list?
> Thanks for asking, and for your work on this. (Looks like you just took
> over maintainership recently...?)
Yes.
> My recent "I wish Text::CSV_XS could handle X..." experience was -
>
> - Save an Excel spreadsheet to CSV format
My Spreadsheet::Read module on CPAN includes a utility that does just that:
# xlscat -c file.xls >file.csv
> - But some of the cells in the Excel spreadsheet contained line breaks
Shouldn't matter
> - So iterating line-by-line through the file in order to have lines to
> parse with Text::CSV_XS meant that any line derived from a row in Excel
> containing a cell containing a line break would fail
>
> That new feature idea regarding reading the whole file at once might be a
> good place to address this.
Don't think so, but feel free to enlighten me on your reasoning.
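To illustrate why slurping the whole file should not be needed: a minimal sketch, assuming the object is created with binary => 1 and the records are read with getline () from a file handle instead of being split into lines first. The sample data and the in-memory file handle are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample: field 2 of the first record contains a line break.
my $data = qq{1,"first line\nsecond line",x\n2,plain,y\n};

# binary => 1 allows embedded newlines inside quoted fields.
my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot create Text::CSV_XS object";

# An in-memory file handle stands in for the real CSV file.
open my $fh, "<", \$data or die "open: $!";

# getline () reads one complete record, even if it spans several lines.
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}
close $fh;

my @lines = split /\n/, $rows[0][1];
printf "%d records, field 2 of record 1 spans %d lines\n",
    scalar @rows, scalar @lines;
```

Splitting the input on newlines yourself and feeding each piece to parse () is what breaks such records; letting getline () pull the data from the handle avoids that.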
> Other features that could be nice -
>
> - given a file, tell it whether it has a header row and if so provide
> a hash-key-style interface on each row per the names in columns of the
> header row
Could be one of the options to the suggested
parse_file ($file, { cols => [ ... ], has_header_row => 1 });
causing the construct of
{ fields => [ .... ],
to change to
{ fields => { Name => "...", Address => "...", ... },
but I think that would have a huge impact on memory use, and it is also
quite easy to create yourself in a map {} construct.
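A minimal sketch of that map {} construct, assuming the first record is the header row. The sample data and the names $hdr and @records are only for illustration, not part of the module.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample with a header row.
my $data = qq{Name,Address\n"Doe, John","1 Main St"\nJane,"2 Side Rd"\n};

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot create Text::CSV_XS object";
open my $fh, "<", \$data or die "open: $!";

# The first record supplies the hash keys.
my $hdr = $csv->getline ($fh) or die "No header row";

my @records;
while (my $row = $csv->getline ($fh)) {
    # Pair every header name with the field in the same position.
    my %rec = map { ($hdr->[$_] => $row->[$_]) } 0 .. $#$hdr;
    push @records, \%rec;
}
close $fh;

print "$_->{Name} lives at $_->{Address}\n" for @records;
```

Every record now carries its own copy of the keys, which is where the extra memory goes.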
> - have it return how many rows and columns there are in the file
# xlscat -i file.csv
I don't think that kind of functionality should be in the low level
that this module lives in. Consider that reading CSV has no defined
way to jump back in the data stream, so once you've read the data,
you cannot go back. It has no random access structure like Excel.
> - ability to automatically ignore trailing (and perhaps leading) empty
> rows
Also an option in xlscat
> - provide a "best guess" count of how many columns there _should_ be
> in a row, based on the header row (if present) and/or general agreement
> amongst the other rows in the file (if 99 have 14 columns in a row and 1
> has 10 columns, that 1 is likely an outlier)
Nice example. I like that. Should not be in the module itself, but could
be a file in the examples/ folder.
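Such an example could look roughly like this. It is a sketch only: guess_columns () is a hypothetical helper, not something the module provides, and it takes rows that have already been parsed into array refs.

```perl
use strict;
use warnings;

# Hypothetical helper: return the column count that occurs most often.
# Ties are resolved in favour of the larger count.
sub guess_columns {
    my @rows = @_;
    my %seen;
    $seen{scalar @$_}++ for @rows;
    my ($best) = sort { $seen{$b} <=> $seen{$a} || $b <=> $a } keys %seen;
    return $best;    # undef when no rows were passed
}

# Made-up rows: three have 3 columns, one has 2.
my @rows = (
    [qw( a b c )],
    [qw( d e f )],
    [qw( g h   )],
    [qw( i j k )],
);

my $guess    = guess_columns (@rows);
my @outliers = grep { @{$rows[$_]} != $guess } 0 .. $#rows;

print "Best guess: $guess columns\n";
print "Outlier row index: $_\n" for @outliers;
```

Note that this needs all rows before it can answer, so it only works after the stream has been read once.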
> - In the event of rows with fewer columns than the best-guess (or a
> user-defined number of how many columns there should be) then provide
> extra undef column (array) values
I would say you use Spreadsheet::Read and do it in that framework.
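If you would rather stay with plain array refs, the padding itself is short. A sketch, where pad_row () is a hypothetical helper and the wanted column count is supplied by the caller:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the row, extended with undef
# values up to $want columns. Longer rows are returned unchanged.
sub pad_row {
    my ($row, $want) = @_;
    my @padded = @$row;
    push @padded, (undef) x ($want - @padded) if @padded < $want;
    return \@padded;
}

my $short = pad_row ([ 1, 2 ],    4);    # two fields plus two undef
my $full  = pad_row ([ 1, 2, 3 ], 3);    # already complete

printf "short row now has %d columns\n", scalar @$short;
printf "full row still has %d columns\n", scalar @$full;
```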
> - ability to extract a row/column range, e.g. columns 2 through 7 in
> rows 3 through 13
You definitely want xlscat :) Both are supported as options.
/home/merijn 101 > xlscat --help
usage: xlscat [-s <sep>] [-L] [-u] [ Selection ] file.xls
              [-c | -m] [-u] [ Selection ] file.xls
              -i [ -S sheets ] file.xls
    Generic options:
       -v[#]       Set verbose level (xlscat)
       -d[#]       Set debug level (Spreadsheet::Read)
       -u          Use unformatted values
       --noclip    Do not strip empty sheets and
                   trailing empty rows and columns
    Input CSV:
       --in-sep=c  Set input sep_char for CSV
    Output Text (default):
       -s <sep>    Use separator <sep>. Default '|', \n allowed
       -L          Line up the columns
    Output Index only:
       -i          Show sheet names and size only
    Output CSV:
       -c          Output CSV, separator = ','
       -m          Output CSV, separator = ';'
    Selection:
       -S <sheets> Only print sheets <sheets>. 'all' is a valid set
                   Default only prints the first sheet
       -R <rows>   Only print rows <rows>. Default is 'all'
       -C <cols>   Only print columns <cols>. Default is 'all'
       -F <flds>   Only fields <flds> e.g. -FA3,B16
/home/merijn 102 >
> You planning on being at YAPC::EU? Maybe I'll run into you there.
Yes, and I am planning to talk about another (new) module. I have
already registered.
--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.9.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.0 & 10.2, AIX 4.3 & 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/