John Redford <[email protected]> writes:
...
> I didn't attend, and perhaps someone already made this point, but it
> often seems to be lost in Perl and Perl-like communities where
> sequential IO is considered the only kind of IO, and memory
> allocation is hidden from the developer.
>
> There are two generally significant "wins" for fixed-record formats.
>
> 1. Static allocation. Records can be read without reallocating or
> wasting memory. This is faster/better for obvious reasons.  Many
> languages make it possible to map a fixed structure over the memory
> used, so inner elements can be accessed conveniently -- and unions
> make it possible to access conditional content nicely.
>
> 2. Dynamic record access. When data is stored on a filesystem, mmap/
> CreateFileMapping can be used to map the record-data and offsets can
> index to an arbitrary record directly.
>
> There are many languages that can take advantage of these factors. 
> Perl is bad at both.
>
> Using a format like XML or JSON can be nice in other ways.  But it's
> going to be less efficient and it's going to require full parsing of
> an entire document (barring exceptional exit of a streaming parse)
> and dynamic allocation of memory.
>
> There is no "better" choice -- only "better for the situation".  (But
> if you define the "situation" as "I want to use Perl", then you're
> starting with a narrow viewpoint.)
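
(Since those two "wins" are easier to see in code than in prose, here
is a minimal sketch in C of what John is describing.  The record
layout, field names, and file name are all invented for illustration --
they're not from Uri's format or mine.)

    /* Win 1: a fixed structure mapped over the record bytes, with a
     * union for content that varies with a type tag. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    struct record {
        char type;              /* 'I' = invoice, 'P' = payment */
        char account[10];
        union {                 /* conditional content */
            char invoice_amount[9];
            char payment_ref[9];
        } body;
        char filler[12];        /* pad to a fixed 32-byte record */
    };

    int main(void)
    {
        int fd = open("records.dat", O_RDONLY);  /* hypothetical file */
        if (fd < 0)
            return 1;

        struct stat st;
        if (fstat(fd, &st) < 0)
            return 1;

        /* Win 2: mmap the file; an offset indexes straight to an
         * arbitrary record -- no sequential read, no per-record
         * allocation. */
        const struct record *recs =
            mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (recs == MAP_FAILED)
            return 1;

        size_t nrecs = st.st_size / sizeof(struct record);
        if (nrecs > 42) {
            const struct record *r = &recs[42];  /* record #42, directly */
            if (r->type == 'I')
                printf("invoice amount: %.9s\n", r->body.invoice_amount);
        }

        munmap((void *)recs, st.st_size);
        close(fd);
        return 0;
    }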

Maybe I should point out that Uri's wording of this preference expressed
no great love for XML.

Also, the fixed length format he dealt with, and the ones where I work
(though not as bad, I think), were strange in ways that went well
beyond just using fixed length records (e.g. embedded back references
to other records).

I'm rather less "in the Perl community" than I would like to be, and
the preference for XML over fixed length records here came from C++
programmers. I'm not disagreeing with your points on the advantages of
fixed record files (I'll leave to others whether Perl is inherently
unsuited to imitate COBOL); it's just that for these people XML had a
strong draw, even with potentially large amounts of data, because it
eased debugging bad files, was somewhat more self-documenting, and had
ways of doing some basic validation, without extra code, using DTDs and
later schemas (okay, also in those days "doing XML" was some kind of
weird marketing position). It's a royal pain to be given a bad file and
have to pull up some Excel spreadsheet, pick out the 17th tab, in the
5th subgrouping, in the 47th column, to read off "Invoice amount,
offset 137, length 9," and then type ctrl-u 137 ctrl-f in Emacs to get
to the right place and then mark off your region 9 places so the number
doesn't blur into the next one. But yeah, it's not small, and programs
reading it aren't so quick.
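
(Granted, once you've finally dug the layout out of that spreadsheet,
pulling the field out programmatically is short.  A sketch in C, using
the "offset 137, length 9" invoice amount from above; the file name,
and the assumption that the record starts at the beginning of the
file, are invented:)

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("bad_file.dat", "rb");   /* hypothetical file */
        if (!f)
            return 1;

        /* "Invoice amount, offset 137, length 9" -- the same jump as
         * ctrl-u 137 ctrl-f, but repeatable. */
        char amount[10] = {0};                   /* 9 bytes + NUL */
        if (fseek(f, 137, SEEK_SET) == 0 && fread(amount, 1, 9, f) == 9)
            printf("invoice amount: %s\n", amount);

        fclose(f);
        return 0;
    }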

> As for "standards"... There's no such thing.  Performance is real --
> standards are just agendas.  If you want to talk to X, then you'll
> speak what X speaks.  Period.  You can tell X that "French is the
> language of diplomacy" or "XML is the language of the future", but
> when X ignores your XML & French, you can either speak what X wants,
> or not talk to X.  Standards are just what the powerful call their
> own preferences.

I'm just wondering which powerful people hashed out the standards in
this particular case (my case, not Uri's; his fixed record file seemed
to have a standard). Because if it was the powerful people who buy the
software that needs to read the EDI providers' output files, they
should have standardized that instead of the EDI providers' input
(i.e. whatever flows across these proprietary networks). It would have
saved a lot of work. Then again, I work with someone whose whole job is
dealing with these things, and she probably enjoys getting paid, so...

-- 
Mike Small
[email protected]

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm